Posted to dev@calcite.apache.org by Julian Hyde <jh...@gmail.com> on 2021/12/03 00:45:57 UTC

Re: Snowflake VARIANT support

Work has not started, and I don’t know of anyone who is planning to work on this. I have logged https://issues.apache.org/jira/browse/CALCITE-4918 and https://issues.apache.org/jira/browse/CALCITE-4919. Let’s discuss there.

> On Nov 30, 2021, at 10:45 AM, Sandeep Nayak <ma...@gmail.com> wrote:
> 
> Hello,
> 
> I am a newbie to Calcite and still trying to figure out all the pieces of
> the project.
> 
> While looking for support in Calcite for parsing Snowflake expressions, I
> ran into a question asked earlier this month on the dev mailing list which
> is pretty much exactly what I am trying to do. Simon asked the question
> quite concisely; see
> https://www.mail-archive.com/dev@calcite.apache.org/msg17417.html.
> 
> I did not see a follow-up after Julian's response on that thread, and wanted
> to ask whether there is work in progress on the Snowflake integration.
> 
> I looked at the instructions Julian provided, but since I am new to this
> project I could not quite follow the pieces, so I figured I should ask a
> few questions to get a better understanding. Apologies if these seem
> obvious; if they are already covered, please point me to the relevant
> material.
> 
> (a) There is a FreeMarker template for the parser in core as well as in the
> babel sub-project. How are these two related?
> 
> (b) The field reference operator ':' in Snowflake (for JSON stored in a
> VARIANT column) differs from '->>' as used in PostgreSQL for jsonb. My
> thought is that such a reference operator should probably live in a
> database-specific extension, such as a Snowflake adapter. Is that thinking
> accurate?
> 
> (c) How are new data types plugged in? For example, VARIANT is
> Snowflake-specific and should probably stay in a Snowflake adapter,
> similar to (b).
> 
> For (b) and (c), how can a newbie like me understand how new operators or
> data types get introduced into the parser? Which adapter or example should
> I look at to get a better sense of how the above is done?
> 
> Thanks in advance for your help.
> 
> Sandeep


Re: Snowflake VARIANT support

Posted by Julian Hyde <jh...@gmail.com>.
One other thought. Snowflake uses the new ‘:’ operator to obtain fields of variants; Drill uses the existing ‘.’ operator. It’s possible that the ‘.’ operator will work just fine, in which case I would propose to implement ‘.’. Then we don’t need to change the parser.

(Though there would be no changes to the parser, we would need some non-trivial changes to the validator, desugaring “v.f” to “v[‘f’]” if v is a variant type.)
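
For illustration, here is a minimal, hypothetical sketch of that rewrite at
the row-expression level, reusing Calcite's existing ITEM operator (the one
behind v['f']); the class and method names below are invented, and the real
change would live in the validator / sql-to-rel layer:

  import org.apache.calcite.rex.RexBuilder;
  import org.apache.calcite.rex.RexNode;
  import org.apache.calcite.sql.fun.SqlStdOperatorTable;

  public final class VariantDesugar {
    private VariantDesugar() {}

    /** Rewrites a field access on a variant value, v.f, as ITEM(v, 'f'). */
    public static RexNode desugarFieldAccess(RexBuilder rexBuilder,
        RexNode variant, String fieldName) {
      // ITEM is the operator Calcite already uses for a[i] and m['k'].
      return rexBuilder.makeCall(SqlStdOperatorTable.ITEM,
          variant, rexBuilder.makeLiteral(fieldName));
    }
  }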

If someone from Snowflake can provide a justification why ‘:’ is needed I’d like to hear it.

In Calcite we usually try to follow the standard (the ISO standard and also the consensus among DB vendors). But this is just one vendor, and it’s possible that what we are proposing is superior.

Julian


> On Dec 6, 2021, at 5:26 AM, Stamatis Zampetakis <za...@gmail.com> wrote:

Re: Snowflake VARIANT support

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hello,

Just to be clear, I do think supporting VARIANT in Calcite will be great
and all the notes made by Julian under the JIRA make perfect sense.
Below, there is another viewpoint for those people who don't need the full
capabilities of a VARIANT type.

In some cases, the data stored in a VARIANT column (in Snowflake or
elsewhere) does have a fixed schema. When that's the case, the JSON/XML (or
other) data could be mapped in Calcite to the standard SQL type STRUCT,
which supports nesting.
If this holds, then a parsing function could turn the JSON into a STRUCT and
we could use the navigation expression RexFieldAccess (using the standard
dot notation) to navigate in the struct (JSON/XML).
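
As a rough illustration of that mapping, a fixed-schema JSON document could
be described as a nested STRUCT type; the field names and the use of
JavaTypeFactoryImpl below are just assumptions for the sketch:

  import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
  import org.apache.calcite.rel.type.RelDataType;
  import org.apache.calcite.rel.type.RelDataTypeFactory;
  import org.apache.calcite.sql.type.SqlTypeName;

  public final class VariantAsStruct {
    private VariantAsStruct() {}

    /** Models {"name": ..., "address": {"city": ..., "zip": ...}} as a STRUCT. */
    public static RelDataType jsonAsStruct(RelDataTypeFactory typeFactory) {
      RelDataType address = typeFactory.builder()
          .add("city", SqlTypeName.VARCHAR, 100)
          .add("zip", SqlTypeName.VARCHAR, 10)
          .build();
      return typeFactory.builder()
          .add("name", SqlTypeName.VARCHAR, 100)
          .add("address", address)
          .build();
    }

    public static void main(String[] args) {
      // Prints the struct type, e.g. RecordType(VARCHAR(100) name, ...).
      System.out.println(jsonAsStruct(new JavaTypeFactoryImpl()));
    }
  }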

Now a few more clarifications around the points Sandeep raised which may be
useful for other people as well:

(a) The core parser is (and will remain) compliant with the SQL standard.
The babel parser is more liberal and attempts to cover some syntactic
patterns which are not in the SQL standard but appear often in major DBMSs.
Depending on their use case, people choose one or the other as a starting
point, and if necessary use the FreeMarker templates to extend it.
You could have a look at Apache Drill to see how to use the templates
(commit [1] may be a good starting point).
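
For concreteness, here is a small sketch of switching between the two
generated parsers; it assumes the calcite-babel artifact is on the classpath:

  import org.apache.calcite.sql.SqlNode;
  import org.apache.calcite.sql.parser.SqlParseException;
  import org.apache.calcite.sql.parser.SqlParser;
  import org.apache.calcite.sql.parser.babel.SqlBabelParserImpl;

  public final class ParserChoice {
    private ParserChoice() {}

    public static SqlNode parseWithBabel(String sql) throws SqlParseException {
      // The default config uses the core (SQL standard) parser; here we swap
      // in the more permissive Babel parser built from the babel templates.
      SqlParser.Config config = SqlParser.config()
          .withParserFactory(SqlBabelParserImpl.FACTORY);
      return SqlParser.create(sql, config).parseQuery();
    }
  }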

(b) The support for ':' could possibly be divided into three parts:
* parse SQL expressions with the ':' field reference operator
* use the ':' field reference operator, or an equivalent, in the relational
algebra
* use the ':' operator when serializing relational algebra to SQL

Depending on the use case, you may not necessarily need to deal with
everything.
If, for instance, you are constructing the plan directly (RelNode), you may
be able to skip parsing the ':' operator.
If you don't need to parse expressions with ':' then you probably don't
need to worry about the different parser implementations and the FreeMarker
templates.
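
For example, a plan using the existing ITEM operator could be built directly
with RelBuilder and no ':' parsing at all; the schema setup (FrameworkConfig)
is left to the caller, and the table T / column V below are made up:

  import org.apache.calcite.rel.RelNode;
  import org.apache.calcite.sql.fun.SqlStdOperatorTable;
  import org.apache.calcite.tools.FrameworkConfig;
  import org.apache.calcite.tools.RelBuilder;

  public final class DirectPlan {
    private DirectPlan() {}

    /** Roughly: SELECT v['name'] FROM t. */
    public static RelNode buildProjection(FrameworkConfig config) {
      RelBuilder b = RelBuilder.create(config);
      return b.scan("T")
          .project(b.call(SqlStdOperatorTable.ITEM,
              b.field("V"), b.literal("name")))
          .build();
    }
  }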

About using ':' in relational algebra, Julian already outlined a few
ideas in CALCITE-4919, and RexFieldAccess could be another option.
If for some reason nothing works and you need to introduce a new operator,
you can check previous commits in the Calcite repository.
There you can find many examples of adding new operators/functions, both
internal (e.g., SEARCH [3]) and public (e.g., ARRAY_AGG, ARRAY_CONCAT [4]),
SQL-standard compliant or database specific.
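
If a new operator really turned out to be necessary, registering it is the
small part; the sketch below is purely illustrative (name, kind, precedence
and type inference are all placeholders, not Calcite decisions), and wiring
it into the parser and validator is the larger job:

  import org.apache.calcite.sql.SqlBinaryOperator;
  import org.apache.calcite.sql.SqlKind;
  import org.apache.calcite.sql.type.InferTypes;
  import org.apache.calcite.sql.type.OperandTypes;
  import org.apache.calcite.sql.type.ReturnTypes;

  public final class VariantOperators {
    private VariantOperators() {}

    /** Hypothetical ':' field-reference operator. */
    public static final SqlBinaryOperator VARIANT_FIELD_REF =
        new SqlBinaryOperator(
            ":",
            SqlKind.OTHER_FUNCTION,
            100,                       // precedence: placeholder value
            true,                      // left associative
            ReturnTypes.VARCHAR_2000,  // return type: placeholder
            InferTypes.ANY_NULLABLE,   // operand type inference
            OperandTypes.ANY_ANY);     // accepts any two operands
  }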

When it comes to serializing algebra to SQL, there are mainly two things
that come into play: RelToSqlConverter and SqlDialect.
You can have a look at RelToSqlConverterTest [2] to see how these play
together.
In a nutshell, you may be able to customize the serialization of the
ITEM/RexFieldAccess operators based on the dialect (e.g., Snowflake).
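
A hedged sketch of that kind of dialect-level customization is below; the
class is invented, and a real implementation would have to get quoting and
operator precedence right rather than just printing the operands:

  import org.apache.calcite.sql.SqlCall;
  import org.apache.calcite.sql.SqlDialect;
  import org.apache.calcite.sql.SqlWriter;
  import org.apache.calcite.sql.dialect.SnowflakeSqlDialect;

  /** Hypothetical dialect that prints ITEM(v, 'f') as v:'f' instead of v['f']. */
  public class SnowflakeVariantDialect extends SqlDialect {
    public SnowflakeVariantDialect() {
      super(SnowflakeSqlDialect.DEFAULT_CONTEXT);
    }

    @Override public void unparseCall(SqlWriter writer, SqlCall call,
        int leftPrec, int rightPrec) {
      if ("ITEM".equals(call.getOperator().getName())) {
        // Emit the Snowflake-style field reference; quoting is glossed over.
        call.operand(0).unparse(writer, leftPrec, rightPrec);
        writer.print(":");
        call.operand(1).unparse(writer, leftPrec, rightPrec);
      } else {
        super.unparseCall(writer, call, leftPrec, rightPrec);
      }
    }
  }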

(c) Regarding the VARIANT data type, as Julian pointed out it could make
sense to add this new type to the core and not tie it to a specific adapter.
You could look at the changes made in CALCITE-1968 [5], where the GEOMETRY
type was introduced.
Again, if in your use case you are not parsing SQL to convert it to a
RelNode, then you can skip the changes related to the parser.
In some cases you can work with "new" types via a custom RelDataTypeFactory
[6] and extensions in the RelDataType hierarchy.
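
Until such a first-class type exists in core, one pragmatic (and admittedly
lossy) stand-in at the RelDataType level is to model VARIANT as a nullable
ANY; that choice is my assumption here, not something decided in the JIRA
cases:

  import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
  import org.apache.calcite.rel.type.RelDataType;
  import org.apache.calcite.rel.type.RelDataTypeFactory;
  import org.apache.calcite.sql.type.SqlTypeName;

  public final class VariantTypeStandIn {
    private VariantTypeStandIn() {}

    /** A stand-in for VARIANT: a nullable ANY that accepts any value. */
    public static RelDataType variantType(RelDataTypeFactory typeFactory) {
      return typeFactory.createTypeWithNullability(
          typeFactory.createSqlType(SqlTypeName.ANY), true);
    }

    public static void main(String[] args) {
      System.out.println(variantType(new JavaTypeFactoryImpl()));
    }
  }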

Best,
Stamatis

[1]
https://github.com/apache/drill/commit/946bdb5cc1ef360318f9b29c072faff43f194ef6
[2]
https://github.com/apache/calcite/blob/master/core/src/test/java/org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java
[3]
https://github.com/apache/calcite/commit/850f0f4a04fc2399b8fd1c1fed532cd8e1e39514
[4]
https://github.com/apache/calcite/commit/3038fb7e87c51740ef733fd4f9bde41b0c25e5b0
[5]
https://github.com/apache/calcite/commit/cc20ca13db4d506d9d4d1b861dd1c7ac3944e56e
[6]
https://github.com/apache/calcite/blob/2317f4ea7779a38d128a756baa5ad21797f2eb6c/core/src/main/java/org/apache/calcite/rel/type/RelDataTypeFactory.java
