Posted to dev@beam.apache.org by Rui Wang <ru...@google.com> on 2019/08/02 17:12:01 UTC

Support ZetaSQL as a new SQL dialect in BeamSQL

Hi community,

I have been working on supporting ZetaSQL[1] as a SQL dialect in BeamSQL.
ZetaSQL is a SQL analyzer open sourced by Google. Here is ZetaSQL's
documentation[2].

Briefly, the design for integrating ZetaSQL with BeamSQL is this: I made a
pluggable query planner interface in BeamSQL, so we can easily plug in a
new planner[3] (in my case, the ZetaSQL planner). In fact, anyone can add new
planners this way (e.g. for a PostgreSQL dialect).
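
As a rough illustration of the pluggable idea, a planner can be modeled as an
interface that a registry selects by dialect name. This is only a sketch: the
interface, class and method names below are hypothetical and are not Beam's
actual QueryPlanner[3] API, and the "plans" are plain strings standing in for
real logical plans.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a pluggable planner. None of these names come from Beam.
final class PluggablePlannerSketch {

  /** A planner turns a query written in one SQL dialect into some plan. */
  interface Planner {
    String dialect();

    String plan(String sql);
  }

  /** Stand-in for a Calcite-backed planner (assumption: the existing default). */
  static final class CalciteLikePlanner implements Planner {
    public String dialect() {
      return "calcite";
    }

    public String plan(String sql) {
      return "CalcitePlan[" + sql + "]";
    }
  }

  /** Stand-in for a ZetaSQL-backed planner plugged in next to the default. */
  static final class ZetaSqlLikePlanner implements Planner {
    public String dialect() {
      return "zetasql";
    }

    public String plan(String sql) {
      return "ZetaSqlPlan[" + sql + "]";
    }
  }

  /** Picks a planner by dialect name; adding a dialect touches no existing planner. */
  static final class PlannerRegistry {
    private final Map<String, Planner> planners = new LinkedHashMap<>();

    void register(Planner planner) {
      planners.put(planner.dialect(), planner);
    }

    String plan(String dialect, String sql) {
      Planner planner = planners.get(dialect);
      if (planner == null) {
        throw new IllegalArgumentException("No planner registered for dialect: " + dialect);
      }
      return planner.plan(sql);
    }
  }

  /** Builds a registry holding both stand-in planners. */
  static PlannerRegistry defaultRegistry() {
    PlannerRegistry registry = new PlannerRegistry();
    registry.register(new CalciteLikePlanner());
    registry.register(new ZetaSqlLikePlanner());
    return registry;
  }

  public static void main(String[] args) {
    PlannerRegistry registry = defaultRegistry();
    System.out.println(registry.plan("calcite", "SELECT 1"));
    System.out.println(registry.plan("zetasql", "SELECT 1"));
  }
}
```

In this sketch, supporting another dialect only means registering one more
Planner implementation, which is why a change of this kind barely needs to
touch existing code.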

I want to contribute the ZetaSQL planner and its related code (~10k) to the Beam
repo (#9210 <https://github.com/apache/beam/pull/9210>). This contribution
barely touches existing Beam code (because the idea is a pluggable planner).


*Acknowledgement*
Thanks to all the people who provided help during Beam ZetaSQL development:
Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles, Anton Kedin
and Mikhail Gryzykhin. This list is not exhaustive; thanks also go to
contributions that are not listed.


[1]: https://github.com/google/zetasql
[2]: https://github.com/google/zetasql/tree/master/docs
[3]:
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java


-Rui

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Alex Van Boxel <al...@vanboxel.be>.
This is a very informative thread. I would love for a lot of this
information and reasoning to end up in the documentation.

 _/
_/ Alex Van Boxel


On Wed, Aug 21, 2019 at 9:17 PM Rui Wang <ru...@google.com> wrote:

> Thanks everyone! Now Beam ZetaSQL is merged into Beam repo!
>
>
> -Rui
>
> On Mon, Aug 19, 2019 at 8:36 AM Ahmet Altay <al...@google.com> wrote:
>
>> Thank you both!
>>
>> On Mon, Aug 19, 2019 at 8:01 AM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> The i.p. clearance is complete:
>>> https://lists.apache.org/thread.html/239be048e7748f079dc34b06020e0c8f094859cb4a558b361f6b8eb5@<general.incubator.apache.org>
>>>
>>> Kenn
>>>
>>> On Mon, Aug 12, 2019 at 4:25 PM Rui Wang <ru...@google.com> wrote:
>>>
>>>> Thanks Kenneth.
>>>>
>>>> I will start a vote for Beam ZetaSQL contribution.
>>>>
>>>> -Rui
>>>>
>>>> On Mon, Aug 12, 2019 at 4:11 PM Kenneth Knowles <ke...@apache.org>
>>>> wrote:
>>>>
>>>>> Nice explanations of the reasoning. I think two things will stay
>>>>> approximately the same even as the ecosystem develops: (1) ZetaSQL has
>>>>> pretty clear semantics so we will have a compliant parser, whether it is
>>>>> the official one or another like Calcite Babel, and (2) we will need a way
>>>>> to implement all the standard ZetaSQL functions and this will be the same
>>>>> no matter the frontend.
>>>>>
>>>>> For a contribution this large where i.p. clearance is necessary, a
>>>>> vote is appropriate. It can happen at the same time or even after i.p.
>>>>> clearance.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Wed, Aug 7, 2019 at 1:08 PM Mingmin Xu <mi...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for highlighting types/operators/functions/...; that
>>>>>> does make things more complicated. +1 that the proposal is reasonable as
>>>>>> a short/mid-term solution. We could follow up in the future to handle it in
>>>>>> Calcite Babel if possible.
>>>>>>
>>>>>> Mingmin
>>>>>>
>>>>>> On Tue, Aug 6, 2019 at 3:57 PM Rui Wang <ru...@google.com> wrote:
>>>>>>
>>>>>>> Hi Mingmin,
>>>>>>>
>>>>>>> Honestly I don't have an answer to it: a SQL dialect is complicated
>>>>>>> and I don't understand Calcite well enough (Calcite is a big repo).
>>>>>>> Based on my reading of CALCITE-2280
>>>>>>> <https://issues.apache.org/jira/browse/CALCITE-2280>, the closer a
>>>>>>> dialect is to standard SQL, the fewer blockers we will have in
>>>>>>> supporting it in the Calcite Babel parser.
>>>>>>>
>>>>>>> However, this is a good question, and it raises an aspect that I
>>>>>>> find people usually ignore: supporting a SQL dialect is not only about
>>>>>>> supporting a type of syntax. It also includes data types, built-in SQL
>>>>>>> functions, operators and many other things.
>>>>>>>
>>>>>>> In particular, I found the following incompatibilities between Calcite
>>>>>>> and ZetaSQL during development:
>>>>>>> 1. Calcite does not support the Struct/Row type well, because Calcite
>>>>>>> flattens Rows when reading from tables by adding an extra Projection on top
>>>>>>> of tables.
>>>>>>> 2. I had trouble supporting the DATETIME (or timestamp without
>>>>>>> time zone) type.
>>>>>>> 3. Huge incompatibilities in SQL functions. E.g. the return type is
>>>>>>> different for AVG(long), and there are many more.
>>>>>>> 4. I am not sure whether Calcite has the same set of type casting rules
>>>>>>> as BigQuery (my impression is that there are differences).
>>>>>>>
>>>>>>>
>>>>>>> I would say that in the short/mid term, it's much easier to use the
>>>>>>> logical plan as an IR to implement another SQL dialect for BeamSQL
>>>>>>> (LinkedIn has a similar practice, see their blog post
>>>>>>> <https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite>
>>>>>>> ).
>>>>>>>
>>>>>>> For the longer term, it would be interesting to see how we can add
>>>>>>> BigQuery syntax (plus its data types and SQL functions) to the Calcite
>>>>>>> Babel parser.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -Rui
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 6, 2019 at 2:49 PM Mingmin Xu <mi...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Just took a look at
>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-2280 which
>>>>>>>> introduced the Babel parser in Calcite to support varied dialects; this
>>>>>>>> may be an easier way to support BigQuery syntax. @Rui did you notice any
>>>>>>>> big differences between the Calcite engine and ZetaSQL, e.g. in parsing
>>>>>>>> or optimization? If that's the case, it makes sense to build the
>>>>>>>> alternative switch on the Beam side.
>>>>>>>>
>>>>>>>> On Sun, Aug 4, 2019 at 4:47 PM Rui Wang <ru...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Mingmin - it sounds like an awesome idea to translate from
>>>>>>>>> SparkSQL. It would be even more exciting if we could translate Spark
>>>>>>>>> Structured Streaming code in a similar way, which would enable existing
>>>>>>>>> Spark SQL/Structured Streaming pipelines to run on Beam.
>>>>>>>>>
>>>>>>>>> Reuven - Thanks for bringing it up. I tried to search dev@calcite
>>>>>>>>> and only found [1]. From that thread, I see that adding ZetaSQL to Calcite
>>>>>>>>> itself is still under discussion. I would also like to hear from anyone
>>>>>>>>> who knows more about progress on this work than what is in the thread.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1]:
>>>>>>>>> http://mail-archives.apache.org/mod_mbox/calcite-dev/201905.mbox/%3CCAMj=j=-sPWgxzAgusnx8OYvYDYDcDY=dupE6poytRXhjri9Q8w@mail.gmail.com%3E
>>>>>>>>>
>>>>>>>>> -Rui
>>>>>>>>>
>>>>>>>>> On Sun, Aug 4, 2019 at 3:54 PM Reuven Lax <re...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I hear rumours that the Calcite project is planning on adding a
>>>>>>>>>> zeta-SQL compatible parser to Calcite itself, in which case there will be a
>>>>>>>>>> Java parser we can use as well. Does anyone know if this work is still
>>>>>>>>>> going on?
>>>>>>>>>>
>>>>>>>>>> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <
>>>>>>>>>> owenzhang1990@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> A question to the community, does the size of the change require
>>>>>>>>>>>> any process besides the usual PR reviews?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think so. This is a big change and has come as kind of a
>>>>>>>>>>> surprise (sorry if I've missed previous discussions).
>>>>>>>>>>>
>>>>>>>>>>> Rui, could you explain more about how things will play out between
>>>>>>>>>>> BeamSQL and ZetaSQL? (A design doc including the pluggable interface
>>>>>>>>>>> would be perfect.) From GitHub, ZetaSQL is mainly in C++, so is what
>>>>>>>>>>> you are doing a port or a connector to ZetaSQL? Do we need to depend on
>>>>>>>>>>> https://github.com/google/zetasql ? ZetaSQL looks interesting
>>>>>>>>>>> but I could barely find any docs for end users.
>>>>>>>>>>>
>>>>>>>>>>> Also, I'd prefer the PR to be split into two, one for the
>>>>>>>>>>> pluggable interface and one for ZetaSQL.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Manu
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you Rui for the heads up.
>>>>>>>>>>>>
>>>>>>>>>>>> A question to the community, does the size of the change
>>>>>>>>>>>> require any process besides the usual PR reviews?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ru...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi community,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have been working on supporting ZetaSQL[1] as a SQL dialect
>>>>>>>>>>>>> in BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is
>>>>>>>>>>>>> ZetaSQL's documentation[2].
>>>>>>>>>>>>>
>>>>>>>>>>>>> Briefly, the design for integrating ZetaSQL with BeamSQL is this: I
>>>>>>>>>>>>> made a pluggable query planner interface in BeamSQL, so we can easily plug
>>>>>>>>>>>>> in a new planner[3] (in my case, the ZetaSQL planner). In fact, anyone can
>>>>>>>>>>>>> add new planners this way (e.g. for a PostgreSQL dialect).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I want to contribute the ZetaSQL planner and its related
>>>>>>>>>>>>> code (~10k) to the Beam repo (#9210
>>>>>>>>>>>>> <https://github.com/apache/beam/pull/9210>). This
>>>>>>>>>>>>> contribution barely touches existing Beam code (because the idea is a
>>>>>>>>>>>>> pluggable planner).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Acknowledgement*
>>>>>>>>>>>>> Thanks to all the people who provided help during Beam ZetaSQL
>>>>>>>>>>>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles,
>>>>>>>>>>>>> Anton Kedin and Mikhail Gryzykhin. This list is not exhaustive; thanks
>>>>>>>>>>>>> also go to contributions that are not listed.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]: https://github.com/google/zetasql
>>>>>>>>>>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>>>>>>>>>>> [3]:
>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Rui
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ----
>>>>>>>> Mingmin
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> ----
>>>>>> Mingmin
>>>>>>
>>>>>

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Rui Wang <ru...@google.com>.
Thanks everyone! Now Beam ZetaSQL is merged into Beam repo!


-Rui


Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Ahmet Altay <al...@google.com>.
Thank you both!


Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Kenneth Knowles <ke...@apache.org>.
The i.p. clearance is complete:
https://lists.apache.org/thread.html/239be048e7748f079dc34b06020e0c8f094859cb4a558b361f6b8eb5@<general.incubator.apache.org>

Kenn

On Mon, Aug 12, 2019 at 4:25 PM Rui Wang <ru...@google.com> wrote:

> Thanks Kenneth.
>
> I will start a vote for Beam ZetaSQL contribution.
>
> -Rui
>
> On Mon, Aug 12, 2019 at 4:11 PM Kenneth Knowles <ke...@apache.org> wrote:
>
>> Nice explanations of the reasoning. I think two things will stay
>> approximately the same even as the ecosystem develops: (1) ZetaSQL has
>> pretty clear semantics so we will have a compliant parser, whether it is
>> the official one or another like Calcite Babel, and (2) we will need a way
>> to implement all the standard ZetaSQL functions and this will be the same
>> no matter the frontend.
>>
>> For a contribution this large where i.p. clearance is necessary, a vote
>> is appropriate. It can happen at the same time or even after i.p. clearance.
>>
>> Kenn
>>
>> On Wed, Aug 7, 2019 at 1:08 PM Mingmin Xu <mi...@gmail.com> wrote:
>>
>>> Thanks for highlighting types/operators/functions/...; that
>>> does make things more complicated. +1 that the proposal is reasonable as a
>>> short/mid-term solution. We could follow up in the future to handle it in
>>> Calcite Babel if possible.
>>>
>>> Mingmin
>>>
>>> On Tue, Aug 6, 2019 at 3:57 PM Rui Wang <ru...@google.com> wrote:
>>>
>>>> Hi Mingmin,
>>>>
>>>> Honestly I don't have an answer to it: a SQL dialect is complicated and
>>>> I don't understand Calcite well enough (Calcite is a big repo).
>>>> Based on my reading of CALCITE-2280
>>>> <https://issues.apache.org/jira/browse/CALCITE-2280>, the closer a
>>>> dialect is to standard SQL, the fewer blockers we will have in
>>>> supporting it in the Calcite Babel parser.
>>>>
>>>> However, this is a good question, and it raises an aspect that I
>>>> find people usually ignore: supporting a SQL dialect is not only about
>>>> supporting a type of syntax. It also includes data types, built-in SQL
>>>> functions, operators and many other things.
>>>>
>>>> In particular, I found the following incompatibilities between Calcite and
>>>> ZetaSQL during development:
>>>> 1. Calcite does not support the Struct/Row type well, because Calcite
>>>> flattens Rows when reading from tables by adding an extra Projection on top
>>>> of tables.
>>>> 2. I had trouble supporting the DATETIME (or timestamp without time zone)
>>>> type.
>>>> 3. Huge incompatibilities in SQL functions. E.g. the return type is
>>>> different for AVG(long), and there are many more.
>>>> 4. I am not sure whether Calcite has the same set of type casting rules as
>>>> BigQuery (my impression is that there are differences).
>>>>
>>>>
>>>> I would say that in the short/mid term, it's much easier to use the logical
>>>> plan as an IR to implement another SQL dialect for BeamSQL (LinkedIn has
>>>> a similar practice, see their blog post
>>>> <https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite>
>>>> ).
>>>>
>>>> For the longer term, it would be interesting to see how we can add
>>>> BigQuery syntax (plus its data types and SQL functions) to the Calcite Babel
>>>> parser.
>>>>
>>>>
>>>>
>>>> -Rui
>>>>
>>>>
>>>> On Tue, Aug 6, 2019 at 2:49 PM Mingmin Xu <mi...@gmail.com> wrote:
>>>>
>>>>> Just took a look at https://issues.apache.org/jira/browse/CALCITE-2280
>>>>> which introduced the Babel parser in Calcite to support varied dialects;
>>>>> this may be an easier way to support BigQuery syntax. @Rui did you notice
>>>>> any big differences between the Calcite engine and ZetaSQL, e.g. in parsing
>>>>> or optimization? If that's the case, it makes sense to build the
>>>>> alternative switch on the Beam side.
>>>>>
>>>>> On Sun, Aug 4, 2019 at 4:47 PM Rui Wang <ru...@google.com> wrote:
>>>>>
>>>>>> Mingmin - it sounds like an awesome idea to translate from SparkSQL.
>>>>>> It would be even more exciting if we could translate Spark
>>>>>> Structured Streaming code in a similar way, which would enable existing
>>>>>> Spark SQL/Structured Streaming pipelines to run on Beam.
>>>>>>
>>>>>> Reuven - Thanks for bringing it up. I tried to search dev@calcite
>>>>>> and only found [1]. From that thread, I see that adding ZetaSQL to Calcite
>>>>>> itself is still under discussion. I would also like to hear from anyone
>>>>>> who knows more about progress on this work than what is in the thread.
>>>>>>
>>>>>>
>>>>>> [1]:
>>>>>> http://mail-archives.apache.org/mod_mbox/calcite-dev/201905.mbox/%3CCAMj=j=-sPWgxzAgusnx8OYvYDYDcDY=dupE6poytRXhjri9Q8w@mail.gmail.com%3E
>>>>>>
>>>>>> -Rui
>>>>>>
>>>>>> On Sun, Aug 4, 2019 at 3:54 PM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> I hear rumours that the Calcite project is planning on adding a
>>>>>>> zeta-SQL compatible parser to Calcite itself, in which case there will be a
>>>>>>> Java parser we can use as well. Does anyone know if this work is still
>>>>>>> going on?
>>>>>>>
>>>>>>> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <ow...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> A question to the community, does the size of the change require
>>>>>>>>> any process besides the usual PR reviews?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think so. This is a big change and has come as kind of a surprise
>>>>>>>> (sorry if I've missed previous discussions).
>>>>>>>>
>>>>>>>> Rui, could you explain more on how things will play out between
>>>>>>>> BeamSQL and ZetaSQL (A design doc including the pluggable interface would
>>>>>>>> be perfect). From GitHub, ZetaSQL is mainly in C++ so what you are doing is
>>>>>>>> a port or a connector to ZetaSQL ? Do we need to depend on
>>>>>>>> https://github.com/google/zetasql ? ZetaSQL looks interesting but
>>>>>>>> I could barely find any doc for end users.
>>>>>>>>
>>>>>>>> Also, I'd prefer the PR to be split into two, one for the pluggable
>>>>>>>> interface and one for the ZetaSQL.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Manu
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thank you Rui for the heads up.
>>>>>>>>>
>>>>>>>>> A question to the community, does the size of the change require
>>>>>>>>> any process besides the usual PR reviews?
>>>>>>>>>
>>>>>>>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ru...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi community,
>>>>>>>>>>
>>>>>>>>>> I have been working on supporting ZetaSQL[1] as a SQL dialect in
>>>>>>>>>> BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is
>>>>>>>>>> ZetaSQL's documentation[2].
>>>>>>>>>>
>>>>>>>>>> Birfely, the design of integrating ZetaSQL with BeamSQL is, I
>>>>>>>>>> made a plugable query planner interface in BeamSQL, and we can easily plug
>>>>>>>>>> in a new planner[3] (in my case, ZetaSQL planner). Actually anyone can add
>>>>>>>>>> new planners by this way (e.g. PostgreSQL dialect).
>>>>>>>>>>
>>>>>>>>>> I want to contribute ZetaSQL planner and its related code(~10k)
>>>>>>>>>> to Beam repo(#9210 <https://github.com/apache/beam/pull/9210>).
>>>>>>>>>> This contribution barely touch existing Beam code (because the idea is
>>>>>>>>>> plugable planner).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Acknowledgement*
>>>>>>>>>> Thanks to all the people who provided help during Beam ZetaSQL
>>>>>>>>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles,
>>>>>>>>>> Anton Kedin and Mikhail Gryzykhin. This list is not exhausted and also
>>>>>>>>>> thanks to contributions which are not listed.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1]: https://github.com/google/zetasql
>>>>>>>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>>>>>>>> [3]:
>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Rui
>>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>> --
>>>>> ----
>>>>> Mingmin
>>>>>
>>>>
>>>
>>> --
>>> ----
>>> Mingmin
>>>
>>

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Rui Wang <ru...@google.com>.
Thanks Kenneth.

I will start a vote on the Beam ZetaSQL contribution.

-Rui

On Mon, Aug 12, 2019 at 4:11 PM Kenneth Knowles <ke...@apache.org> wrote:

> Nice explanations of the reasoning. I think two things will stay
> approximately the same even as the ecosystem develops: (1) ZetaSQL has
> pretty clear semantics so we will have a compliant parser, whether it is
> the official one or another like Calcite Babel, and (2) we will need a way
> to implement all the standard ZetaSQL functions and this will be the same
> no matter the frontend.
>
> For a contribution this large where i.p. clearance is necessary, a vote is
> appropriate. It can happen at the same time or even after i.p. clearance.
>
> Kenn
>

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Kenneth Knowles <ke...@apache.org>.
Nice explanations of the reasoning. I think two things will stay
approximately the same even as the ecosystem develops: (1) ZetaSQL has
pretty clear semantics so we will have a compliant parser, whether it is
the official one or another like Calcite Babel, and (2) we will need a way
to implement all the standard ZetaSQL functions and this will be the same
no matter the frontend.

For a contribution this large where i.p. clearance is necessary, a vote is
appropriate. It can happen at the same time or even after i.p. clearance.

Kenn

On Wed, Aug 7, 2019 at 1:08 PM Mingmin Xu <mi...@gmail.com> wrote:

> Thanks for highlighting the types/operators/functions/... parts; that does
> make things more complicated. +1 that the proposal is reasonable as a
> short/mid-term solution. We could follow up in the future to handle it in
> Calcite Babel if possible.
>
> Mingmin
>

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Mingmin Xu <mi...@gmail.com>.
Thanks for highlighting the types/operators/functions/... parts; that does
make things more complicated. +1 that the proposal is reasonable as a
short/mid-term solution. We could follow up in the future to handle it in
Calcite Babel if possible.

Mingmin

On Tue, Aug 6, 2019 at 3:57 PM Rui Wang <ru...@google.com> wrote:

> Hi Mingmin,
>
> Honestly, I don't have an answer to that: a SQL dialect is complicated, and I
> don't understand Calcite well enough (Calcite is a big repo). Based
> on my reading of CALCITE-2280
> <https://issues.apache.org/jira/browse/CALCITE-2280>, the closer a dialect
> is to standard SQL, the fewer blockers we will have in supporting it in
> the Calcite Babel parser.
>
> However, this is a good question, and it raises an aspect that I have found
> people usually ignore: supporting a SQL dialect is not only about supporting
> a syntax. It also includes data types, built-in SQL functions, operators,
> and many other things.
>
> In particular, I found the following incompatibilities between Calcite and
> ZetaSQL during development:
> 1. Calcite does not support the Struct/Row type well, because Calcite flattens
> Rows when reading from tables by adding an extra Projection on top of the
> tables.
> 2. I had trouble supporting the DATETIME (timestamp without time zone)
> type.
> 3. There are huge incompatibilities in SQL functions. E.g., the return type
> of AVG(long) is different, and there are many more.
> 4. I am not sure whether Calcite has the same set of type casting rules as
> BigQuery (my impression is that there are differences).
>
>
> I would say that in the short/mid term, it's much easier to use the logical
> plan as the IR to implement another SQL dialect for BeamSQL (LinkedIn has a
> similar practice; see their blog post
> <https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite>
> ).
>
> For the longer term, it would be interesting to see how we can add
> BigQuery syntax (plus its data types and SQL functions) to the Calcite Babel
> parser.
>
>
>
> -Rui
>
>

-- 
----
Mingmin

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Rui Wang <ru...@google.com>.
Hi Mingmin,

Honestly, I don't have an answer to that: a SQL dialect is complicated, and I
don't understand Calcite well enough (Calcite is a big repo). Based
on my reading of CALCITE-2280
<https://issues.apache.org/jira/browse/CALCITE-2280>, the closer a dialect
is to standard SQL, the fewer blockers we will have in supporting it in
the Calcite Babel parser.

However, this is a good question, and it raises an aspect that I have found
people usually ignore: supporting a SQL dialect is not only about supporting
a syntax. It also includes data types, built-in SQL functions, operators,
and many other things.

In particular, I found the following incompatibilities between Calcite and
ZetaSQL during development:
1. Calcite does not support the Struct/Row type well, because Calcite flattens
Rows when reading from tables by adding an extra Projection on top of the
tables.
2. I had trouble supporting the DATETIME (timestamp without time zone)
type.
3. There are huge incompatibilities in SQL functions. E.g., the return type
of AVG(long) is different, and there are many more.
4. I am not sure whether Calcite has the same set of type casting rules as
BigQuery (my impression is that there are differences).
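
To make item 3 concrete, here is a minimal sketch in plain Java of why a
different AVG return type matters to users. It is an illustration only: the
class and method names are hypothetical and are not Calcite, ZetaSQL, or Beam
APIs, and it simply assumes that one engine keeps the integer argument type
while the other returns a floating-point value. Which engine does which
should be checked against each project's documentation.

```java
import java.util.Arrays;

// Illustration only: hypothetical helpers, not Calcite, ZetaSQL, or Beam code.
final class AvgReturnTypeDemo {

  // Models an AVG whose result keeps the integer argument type:
  // integer division drops the fractional part.
  static long avgKeepingArgumentType(long[] values) {
    return Arrays.stream(values).sum() / values.length;
  }

  // Models an AVG over integers whose result is a floating-point value.
  static double avgAsDouble(long[] values) {
    return (double) Arrays.stream(values).sum() / values.length;
  }

  public static void main(String[] args) {
    long[] values = {1L, 2L};
    System.out.println(avgKeepingArgumentType(values)); // prints 1
    System.out.println(avgAsDouble(values));            // prints 1.5
  }
}
```

The same query over the same rows gives 1 under one rule and 1.5 under the
other, so a planner for a new dialect has to carry its own function
signatures rather than reuse the existing ones. The sketch does not handle an
empty input, which a real AVG would have to define.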


I would say that in the short/mid term, it's much easier to use the logical
plan as the IR to implement another SQL dialect for BeamSQL (LinkedIn has a
similar practice; see their blog post
<https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite>
).
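
The "logical plan as IR" idea can be sketched in a few lines of plain Java.
This is a toy under stated assumptions, not Beam or Calcite code: the
LogicalPlan class, the Frontend interface, and the two mini dialects are all
hypothetical and far simpler than Calcite's RelNode tree. The point it shows
is that each dialect only needs a frontend that produces the shared plan,
while the execution backend stays the same.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustration only: every type here is hypothetical.
final class LogicalPlanIrDemo {

  // The shared intermediate representation: scan a table, then limit the rows.
  static final class LogicalPlan {
    final String table;
    final int limit; // a negative value means "no limit"

    LogicalPlan(String table, int limit) {
      this.table = table;
      this.limit = limit;
    }
  }

  // A dialect frontend only has to translate query text into the shared plan.
  interface Frontend {
    LogicalPlan toPlan(String query);
  }

  // Toy dialect A accepts exactly: SELECT * FROM <table> LIMIT <n>
  static final Frontend LIMIT_DIALECT = query -> {
    String[] tokens = query.trim().split("\\s+");
    return new LogicalPlan(tokens[3], Integer.parseInt(tokens[5]));
  };

  // Toy dialect B accepts exactly: SELECT TOP <n> * FROM <table>
  static final Frontend TOP_DIALECT = query -> {
    String[] tokens = query.trim().split("\\s+");
    return new LogicalPlan(tokens[5], Integer.parseInt(tokens[2]));
  };

  // The single backend executes the plan and never sees the dialect.
  static List<String> execute(LogicalPlan plan, Map<String, List<String>> tables) {
    List<String> rows = tables.get(plan.table);
    long limit = plan.limit < 0 ? rows.size() : plan.limit;
    return rows.stream().limit(limit).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<String, List<String>> tables =
        Collections.singletonMap("t", Arrays.asList("a", "b", "c"));
    // Both dialects produce an equivalent plan, so both print [a, b].
    System.out.println(execute(LIMIT_DIALECT.toPlan("SELECT * FROM t LIMIT 2"), tables));
    System.out.println(execute(TOP_DIALECT.toPlan("SELECT TOP 2 * FROM t"), tables));
  }
}
```

Differences in data types and function semantics would still have to be
resolved when the frontend builds the plan, which is where the
incompatibilities listed above show up.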

For the longer term, it would be interesting to see how we can add BigQuery
syntax (plus its data types and SQL functions) to the Calcite Babel parser.



-Rui


On Tue, Aug 6, 2019 at 2:49 PM Mingmin Xu <mi...@gmail.com> wrote:

> I just took a look at https://issues.apache.org/jira/browse/CALCITE-2280 ,
> which introduced the Babel parser in Calcite to support varied dialects; this
> may be an easier way to support BigQuery syntax. @Rui, did you notice any big
> differences between the Calcite engine and ZetaSQL, e.g. in parsing or
> optimization? If so, it makes sense to build the alternative switch on the
> Beam side.
>

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Mingmin Xu <mi...@gmail.com>.
I just took a look at https://issues.apache.org/jira/browse/CALCITE-2280 ,
which introduced the Babel parser in Calcite to support varied dialects; this
may be an easier way to support BigQuery syntax. @Rui, did you notice any big
differences between the Calcite engine and ZetaSQL, e.g. in parsing or
optimization? If so, it makes sense to build the alternative switch on the
Beam side.
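
As a rough sketch of what such a switch could look like, the following plain
Java selects a planner by name. All names here are hypothetical (this is not
the Beam QueryPlanner interface, and the "plans" are just strings); it only
illustrates the idea of registering planners and choosing one at runtime.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Illustration only: a hypothetical registry, not Beam's planner code.
final class PlannerSwitchDemo {

  // A planner turns query text into some plan; a String stands in for the plan.
  interface Planner {
    String plan(String sql);
  }

  // Planner factories keyed by the name a user would configure.
  private static final Map<String, Supplier<Planner>> PLANNERS = new HashMap<>();

  static {
    PLANNERS.put("calcite", () -> sql -> "calcite-plan(" + sql + ")");
    PLANNERS.put("zetasql", () -> sql -> "zetasql-plan(" + sql + ")");
  }

  // The "switch": look up the requested planner and fail loudly if unknown.
  static Planner select(String name) {
    Supplier<Planner> factory = PLANNERS.get(name);
    if (factory == null) {
      throw new IllegalArgumentException("Unknown planner: " + name);
    }
    return factory.get();
  }

  public static void main(String[] args) {
    System.out.println(select("calcite").plan("SELECT 1"));
    System.out.println(select("zetasql").plan("SELECT 1"));
  }
}
```

Adding another dialect then means registering one more factory, without
touching the callers that ask for a planner by name.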

On Sun, Aug 4, 2019 at 4:47 PM Rui Wang <ru...@google.com> wrote:

> Mingmin - it sounds like an awesome idea to translate from SparkSQL. It
> would be even more exciting if we could translate Spark Structured Streaming
> code in a similar way, which would enable existing Spark SQL/Structured
> Streaming pipelines to run on Beam.
>
> Reuven - Thanks for bringing it up. I tried searching dev@calcite and
> only found [1]. From that thread, I see that adding ZetaSQL to Calcite
> itself is still under discussion. I would also like to hear from anyone who
> knows of more progress on this work than what is in the thread.
>
>
> [1]:
> http://mail-archives.apache.org/mod_mbox/calcite-dev/201905.mbox/%3CCAMj=j=-sPWgxzAgusnx8OYvYDYDcDY=dupE6poytRXhjri9Q8w@mail.gmail.com%3E
>
> -Rui
>
> On Sun, Aug 4, 2019 at 3:54 PM Reuven Lax <re...@google.com> wrote:
>
>> I hear rumours that the Calcite project is planning on adding a zeta-SQL
>> compatible parser to Calcite itself, in which case there will be a Java
>> parser we can use as well. Does anyone know if this work is still going on?
>>
>> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <ow...@gmail.com>
>> wrote:
>>
>>> A question to the community, does the size of the change require any
>>>> process besides the usual PR reviews?
>>>>
>>>
>>> I think so. This is a big change and has come as kind of a surprise
>>> (sorry if I've missed previous discussions).
>>>
>>> Rui, could you explain more on how things will play out between BeamSQL
>>> and ZetaSQL (A design doc including the pluggable interface would be
>>> perfect). From GitHub, ZetaSQL is mainly in C++ so what you are doing is a
>>> port or a connector to ZetaSQL ? Do we need to depend on
>>> https://github.com/google/zetasql ? ZetaSQL looks interesting but I
>>> could barely find any doc for end users.
>>>
>>> Also, I'd prefer the PR to be split into two, one for the pluggable
>>> interface and one for the ZetaSQL.
>>>
>>> Thanks,
>>> Manu
>>>
>>>
>>>
>>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com> wrote:
>>>
>>>> Thank you Rui for the heads up.
>>>>
>>>> A question to the community, does the size of the change require any
>>>> process besides the usual PR reviews?
>>>>
>>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ru...@google.com> wrote:
>>>>
>>>>> Hi community,
>>>>>
>>>>> I have been working on supporting ZetaSQL[1] as a SQL dialect in
>>>>> BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is
>>>>> ZetaSQL's documentation[2].
>>>>>
>>>>> Birfely, the design of integrating ZetaSQL with BeamSQL is, I made a
>>>>> plugable query planner interface in BeamSQL, and we can easily plug in a
>>>>> new planner[3] (in my case, ZetaSQL planner). Actually anyone can add new
>>>>> planners by this way (e.g. PostgreSQL dialect).
>>>>>
>>>>> I want to contribute ZetaSQL planner and its related code(~10k) to
>>>>> Beam repo(#9210 <https://github.com/apache/beam/pull/9210>). This
>>>>> contribution barely touch existing Beam code (because the idea is plugable
>>>>> planner).
>>>>>
>>>>>
>>>>> *Acknowledgement*
>>>>> Thanks to all the people who provided help during Beam ZetaSQL
>>>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles,
>>>>> Anton Kedin and Mikhail Gryzykhin. This list is not exhausted and also
>>>>> thanks to contributions which are not listed.
>>>>>
>>>>>
>>>>> [1]: https://github.com/google/zetasql
>>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>>> [3]:
>>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>>
>>>>>
>>>>> -Rui
>>>>>
>>>>

-- 
----
Mingmin

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Rui Wang <ru...@google.com>.
Mingmin - it sounds like an awesome idea to translate from SparkSQL. It
would be even more exciting if we could translate Spark Structured Streaming
code in a similar way, which would let existing Spark SQL/Structured
Streaming pipelines run on Beam.

Reuven - Thanks for bringing it up. I searched dev@calcite and only found
[1]. From that thread, it looks like adding ZetaSQL to Calcite itself is
still under discussion. I would also like to hear from anyone who knows of
more progress on this work than the thread shows.


[1]:
http://mail-archives.apache.org/mod_mbox/calcite-dev/201905.mbox/%3CCAMj=j=-sPWgxzAgusnx8OYvYDYDcDY=dupE6poytRXhjri9Q8w@mail.gmail.com%3E

-Rui

On Sun, Aug 4, 2019 at 3:54 PM Reuven Lax <re...@google.com> wrote:

> I hear rumours that the Calcite project is planning on adding a zeta-SQL
> compatible parser to Calcite itself, in which case there will be a Java
> parser we can use as well. Does anyone know if this work is still going on?
>
> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <ow...@gmail.com> wrote:
>
>> A question to the community, does the size of the change require any
>>> process besides the usual PR reviews?
>>>
>>
>> I think so. This is a big change and has come as kind of a surprise
>> (sorry if I've missed previous discussions).
>>
>> Rui, could you explain more on how things will play out between BeamSQL
>> and ZetaSQL (A design doc including the pluggable interface would be
>> perfect). From GitHub, ZetaSQL is mainly in C++ so what you are doing is a
>> port or a connector to ZetaSQL ? Do we need to depend on
>> https://github.com/google/zetasql ? ZetaSQL looks interesting but I
>> could barely find any doc for end users.
>>
>> Also, I'd prefer the PR to be split into two, one for the pluggable
>> interface and one for the ZetaSQL.
>>
>> Thanks,
>> Manu
>>
>>
>>
>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com> wrote:
>>
>>> Thank you Rui for the heads up.
>>>
>>> A question to the community, does the size of the change require any
>>> process besides the usual PR reviews?
>>>
>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ru...@google.com> wrote:
>>>
>>>> Hi community,
>>>>
>>>> I have been working on supporting ZetaSQL[1] as a SQL dialect in
>>>> BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is
>>>> ZetaSQL's documentation[2].
>>>>
>>>> Birfely, the design of integrating ZetaSQL with BeamSQL is, I made a
>>>> plugable query planner interface in BeamSQL, and we can easily plug in a
>>>> new planner[3] (in my case, ZetaSQL planner). Actually anyone can add new
>>>> planners by this way (e.g. PostgreSQL dialect).
>>>>
>>>> I want to contribute ZetaSQL planner and its related code(~10k) to Beam
>>>> repo(#9210 <https://github.com/apache/beam/pull/9210>). This
>>>> contribution barely touch existing Beam code (because the idea is plugable
>>>> planner).
>>>>
>>>>
>>>> *Acknowledgement*
>>>> Thanks to all the people who provided help during Beam ZetaSQL
>>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles,
>>>> Anton Kedin and Mikhail Gryzykhin. This list is not exhausted and also
>>>> thanks to contributions which are not listed.
>>>>
>>>>
>>>> [1]: https://github.com/google/zetasql
>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>> [3]:
>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>
>>>>
>>>> -Rui
>>>>
>>>

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Reuven Lax <re...@google.com>.
I hear rumours that the Calcite project is planning on adding a
ZetaSQL-compatible parser to Calcite itself, in which case there will be a
Java parser we can use as well. Does anyone know if this work is still going
on?

On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <ow...@gmail.com> wrote:

> A question to the community, does the size of the change require any
>> process besides the usual PR reviews?
>>
>
> I think so. This is a big change and has come as kind of a surprise (sorry
> if I've missed previous discussions).
>
> Rui, could you explain more on how things will play out between BeamSQL
> and ZetaSQL (A design doc including the pluggable interface would be
> perfect). From GitHub, ZetaSQL is mainly in C++ so what you are doing is a
> port or a connector to ZetaSQL ? Do we need to depend on
> https://github.com/google/zetasql ? ZetaSQL looks interesting but I could
> barely find any doc for end users.
>
> Also, I'd prefer the PR to be split into two, one for the pluggable
> interface and one for the ZetaSQL.
>
> Thanks,
> Manu
>
>
>
> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com> wrote:
>
>> Thank you Rui for the heads up.
>>
>> A question to the community, does the size of the change require any
>> process besides the usual PR reviews?
>>
>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ru...@google.com> wrote:
>>
>>> Hi community,
>>>
>>> I have been working on supporting ZetaSQL[1] as a SQL dialect in
>>> BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is
>>> ZetaSQL's documentation[2].
>>>
>>> Birfely, the design of integrating ZetaSQL with BeamSQL is, I made a
>>> plugable query planner interface in BeamSQL, and we can easily plug in a
>>> new planner[3] (in my case, ZetaSQL planner). Actually anyone can add new
>>> planners by this way (e.g. PostgreSQL dialect).
>>>
>>> I want to contribute ZetaSQL planner and its related code(~10k) to Beam
>>> repo(#9210 <https://github.com/apache/beam/pull/9210>). This
>>> contribution barely touch existing Beam code (because the idea is plugable
>>> planner).
>>>
>>>
>>> *Acknowledgement*
>>> Thanks to all the people who provided help during Beam ZetaSQL
>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles,
>>> Anton Kedin and Mikhail Gryzykhin. This list is not exhausted and also
>>> thanks to contributions which are not listed.
>>>
>>>
>>> [1]: https://github.com/google/zetasql
>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>> [3]:
>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>
>>>
>>> -Rui
>>>
>>

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Mingmin Xu <mi...@gmail.com>.
Interesting feature, thanks Rui for bringing up the new option. Please keep me in the loop; I'll take a look when I'm back home tomorrow. It looks like a chance to support other dialects; we see a lot of demand for translating from dialects like SparkSQL.

Mingmin 
Sent from my iPhone

> On Aug 4, 2019, at 2:43 PM, Rui Wang <ru...@google.com> wrote:
> 
> Hi David,
> 
> That's a good point. I just add a section to discuss benefits in the doc (link).
> 
> 
> -Rui
> 
>> On Sun, Aug 4, 2019 at 2:01 PM David Morávek <da...@gmail.com> wrote:
>> Hi Rui,
>> 
>> This is definitely an interesting topic! Can you please elaborate little bit more about the benefits, that this will bring to the end user? All the documents only cover technical details and I'm still not sure what you're trying to achieve product-wise.
>> 
>> Best,
>> D.
>> 
>>> On Sun, Aug 4, 2019 at 8:07 PM Rui Wang <ru...@google.com> wrote:
>>> I created a google doc to explain basic design on Beam ZetaSQL: https://docs.google.com/document/d/14Yi4oEMzqS3n9-LfSNi6Q6kQpEP3gWTHzX0HxqUksdc/edit?usp=sharing
>>> 
>>> 
>>> 
>>> -Rui
>>> 
>>>> On Sun, Aug 4, 2019 at 10:02 AM Rui Wang <ru...@google.com> wrote:
>>>> Thanks Manu for you feedback! Some comments inlined:
>>>> 
>>>> 
>>>> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <ow...@gmail.com> wrote:
>>>>>> A question to the community, does the size of the change require any process besides the usual PR reviews?
>>>>> 
>>>>> I think so. This is a big change and has come as kind of a surprise (sorry if I've missed previous discussions). 
>>>>> 
>>>>> Rui, could you explain more on how things will play out between BeamSQL and ZetaSQL (A design doc including the pluggable interface would be perfect).
>>>> 
>>>> I see. I will have a document about some basic idea on Beam ZetaSQL (this is my way to call "ZetaSQL as a SQL dialect in BeamSQL", and I usually use Beam CalciteSQL to refer to Calcite's SQL dialect.).
>>>> 
>>>> At least from users perspective, it's simple to use: setup planner name in BeamSqlPipelineOptions and BeamSQL will initialize different planners: either Calcite or ZetaSQL is supported now.
>>>>  
>>>>> From GitHub, ZetaSQL is mainly in C++ so what you are doing is a port or a connector to ZetaSQL? Do we need to depend on https://github.com/google/zetasql ? ZetaSQL looks interesting but I could barely find any doc for end users.
>>>> 
>>>> ZetaSQL provides a Java interface which calls c++ binary through JNI. For using ZetaSQL in BeamSQL, we only need to depend on ZetaSQL jars in maven central (https://mvnrepository.com/search?q=zetasql). These jars contains all we need to call ZetaSQL analyzer by Java.
>>>> 
>>>>> 
>>>>> Also, I'd prefer the PR to be split into two, one for the pluggable interface and one for the ZetaSQL.
>>>>> 
>>>> Pluggable planner is already a separate PR merged before: https://github.com/apache/beam/pull/7745  
>>>> 
>>>> 
>>>> -Rui
>>>> 
>>>>  
>>>>> Thanks,
>>>>> Manu
>>>>> 
>>>>>  
>>>>> 
>>>>>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com> wrote:
>>>>>> Thank you Rui for the heads up.
>>>>>> 
>>>>>> A question to the community, does the size of the change require any process besides the usual PR reviews?
>>>>>> 
>>>>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ru...@google.com> wrote:
>>>>>>> Hi community,
>>>>>>> 
>>>>>>> I have been working on supporting ZetaSQL[1] as a SQL dialect in BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is ZetaSQL's documentation[2].
>>>>>>> 
>>>>>>> Birfely, the design of integrating ZetaSQL with BeamSQL is, I made a plugable query planner interface in BeamSQL, and we can easily plug in a new planner[3] (in my case, ZetaSQL planner). Actually anyone can add new planners by this way (e.g. PostgreSQL dialect).       
>>>>>>> 
>>>>>>> I want to contribute ZetaSQL planner and its related code(~10k) to Beam repo(#9210). This contribution barely touch existing Beam code (because the idea is plugable planner).
>>>>>>> 
>>>>>>> 
>>>>>>> Acknowledgement
>>>>>>> Thanks to all the people who provided help during Beam ZetaSQL development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles, Anton Kedin and Mikhail Gryzykhin. This list is not exhausted and also thanks to contributions which are not listed.
>>>>>>> 
>>>>>>> 
>>>>>>> [1]: https://github.com/google/zetasql
>>>>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>>>>> [3]: https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>>>> 
>>>>>>> 
>>>>>>> -Rui

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Rui Wang <ru...@google.com>.
Hi David,

That's a good point. I just added a section discussing the benefits to the doc (
link
<https://docs.google.com/document/d/14Yi4oEMzqS3n9-LfSNi6Q6kQpEP3gWTHzX0HxqUksdc/edit#heading=h.bkpdcuy05he>
).


-Rui

On Sun, Aug 4, 2019 at 2:01 PM David Morávek <da...@gmail.com>
wrote:

> Hi Rui,
>
> This is definitely an interesting topic! Can you please elaborate little
> bit more about the benefits, that this will bring to the end user? All the
> documents only cover technical details and I'm still not sure what you're
> trying to achieve product-wise.
>
> Best,
> D.
>
> On Sun, Aug 4, 2019 at 8:07 PM Rui Wang <ru...@google.com> wrote:
>
>> I created a google doc to explain basic design on Beam ZetaSQL:
>> https://docs.google.com/document/d/14Yi4oEMzqS3n9-LfSNi6Q6kQpEP3gWTHzX0HxqUksdc/edit?usp=sharing
>>
>>
>>
>> -Rui
>>
>> On Sun, Aug 4, 2019 at 10:02 AM Rui Wang <ru...@google.com> wrote:
>>
>>> Thanks Manu for you feedback! Some comments inlined:
>>>
>>>
>>> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <ow...@gmail.com>
>>> wrote:
>>>
>>>> A question to the community, does the size of the change require any
>>>>> process besides the usual PR reviews?
>>>>>
>>>>
>>>> I think so. This is a big change and has come as kind of a surprise
>>>> (sorry if I've missed previous discussions).
>>>>
>>>> Rui, could you explain more on how things will play out between BeamSQL
>>>> and ZetaSQL (A design doc including the pluggable interface would be
>>>> perfect).
>>>>
>>>
>>> I see. I will have a document about some basic idea on Beam ZetaSQL
>>> (this is my way to call "ZetaSQL as a SQL dialect in BeamSQL", and I
>>> usually use Beam CalciteSQL to refer to Calcite's SQL dialect.).
>>>
>>> At least from users perspective, it's simple to use: setup planner name
>>> in BeamSqlPipelineOptions
>>> <https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptions.java#L29> and
>>> BeamSQL will initialize different planners: either Calcite or ZetaSQL is
>>> supported now.
>>>
>>>
>>>> From GitHub, ZetaSQL is mainly in C++ so what you are doing is a port
>>>> or a connector to ZetaSQL? Do we need to depend on
>>>> https://github.com/google/zetasql ? ZetaSQL looks interesting but I
>>>> could barely find any doc for end users.
>>>>
>>>
>>> ZetaSQL provides a Java interface which calls c++ binary through JNI.
>>> For using ZetaSQL in BeamSQL, we only need to depend on ZetaSQL jars in
>>> maven central (https://mvnrepository.com/search?q=zetasql). These jars
>>> contains all we need to call ZetaSQL analyzer by Java.
>>>
>>>
>>>> Also, I'd prefer the PR to be split into two, one for the pluggable
>>>> interface and one for the ZetaSQL.
>>>>
>>>> Pluggable planner is already a separate PR merged before:
>>> https://github.com/apache/beam/pull/7745
>>>
>>>
>>> -Rui
>>>
>>>
>>>
>>>> Thanks,
>>>> Manu
>>>>
>>>>
>>>>
>>>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> Thank you Rui for the heads up.
>>>>>
>>>>> A question to the community, does the size of the change require any
>>>>> process besides the usual PR reviews?
>>>>>
>>>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ru...@google.com> wrote:
>>>>>
>>>>>> Hi community,
>>>>>>
>>>>>> I have been working on supporting ZetaSQL[1] as a SQL dialect in
>>>>>> BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is
>>>>>> ZetaSQL's documentation[2].
>>>>>>
>>>>>> Birfely, the design of integrating ZetaSQL with BeamSQL is, I made a
>>>>>> plugable query planner interface in BeamSQL, and we can easily plug in a
>>>>>> new planner[3] (in my case, ZetaSQL planner). Actually anyone can add new
>>>>>> planners by this way (e.g. PostgreSQL dialect).
>>>>>>
>>>>>> I want to contribute ZetaSQL planner and its related code(~10k) to
>>>>>> Beam repo(#9210 <https://github.com/apache/beam/pull/9210>). This
>>>>>> contribution barely touch existing Beam code (because the idea is plugable
>>>>>> planner).
>>>>>>
>>>>>>
>>>>>> *Acknowledgement*
>>>>>> Thanks to all the people who provided help during Beam ZetaSQL
>>>>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles,
>>>>>> Anton Kedin and Mikhail Gryzykhin. This list is not exhausted and also
>>>>>> thanks to contributions which are not listed.
>>>>>>
>>>>>>
>>>>>> [1]: https://github.com/google/zetasql
>>>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>>>> [3]:
>>>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>>>
>>>>>>
>>>>>> -Rui
>>>>>>
>>>>>

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by David Morávek <da...@gmail.com>.
Hi Rui,

This is definitely an interesting topic! Could you please elaborate a little
bit more on the benefits this will bring to the end user? All the
documents only cover technical details, and I'm still not sure what you're
trying to achieve product-wise.

Best,
D.

On Sun, Aug 4, 2019 at 8:07 PM Rui Wang <ru...@google.com> wrote:

> I created a google doc to explain basic design on Beam ZetaSQL:
> https://docs.google.com/document/d/14Yi4oEMzqS3n9-LfSNi6Q6kQpEP3gWTHzX0HxqUksdc/edit?usp=sharing
>
>
>
> -Rui
>
> On Sun, Aug 4, 2019 at 10:02 AM Rui Wang <ru...@google.com> wrote:
>
>> Thanks Manu for you feedback! Some comments inlined:
>>
>>
>> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <ow...@gmail.com>
>> wrote:
>>
>>> A question to the community, does the size of the change require any
>>>> process besides the usual PR reviews?
>>>>
>>>
>>> I think so. This is a big change and has come as kind of a surprise
>>> (sorry if I've missed previous discussions).
>>>
>>> Rui, could you explain more on how things will play out between BeamSQL
>>> and ZetaSQL (A design doc including the pluggable interface would be
>>> perfect).
>>>
>>
>> I see. I will have a document about some basic idea on Beam ZetaSQL (this
>> is my way to call "ZetaSQL as a SQL dialect in BeamSQL", and I usually use
>> Beam CalciteSQL to refer to Calcite's SQL dialect.).
>>
>> At least from users perspective, it's simple to use: setup planner name
>> in BeamSqlPipelineOptions
>> <https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptions.java#L29> and
>> BeamSQL will initialize different planners: either Calcite or ZetaSQL is
>> supported now.
>>
>>
>>> From GitHub, ZetaSQL is mainly in C++ so what you are doing is a port or
>>> a connector to ZetaSQL? Do we need to depend on
>>> https://github.com/google/zetasql ? ZetaSQL looks interesting but I
>>> could barely find any doc for end users.
>>>
>>
>> ZetaSQL provides a Java interface which calls c++ binary through JNI. For
>> using ZetaSQL in BeamSQL, we only need to depend on ZetaSQL jars in maven
>> central (https://mvnrepository.com/search?q=zetasql). These jars
>> contains all we need to call ZetaSQL analyzer by Java.
>>
>>
>>> Also, I'd prefer the PR to be split into two, one for the pluggable
>>> interface and one for the ZetaSQL.
>>>
>>> Pluggable planner is already a separate PR merged before:
>> https://github.com/apache/beam/pull/7745
>>
>>
>> -Rui
>>
>>
>>
>>> Thanks,
>>> Manu
>>>
>>>
>>>
>>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com> wrote:
>>>
>>>> Thank you Rui for the heads up.
>>>>
>>>> A question to the community, does the size of the change require any
>>>> process besides the usual PR reviews?
>>>>
>>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ru...@google.com> wrote:
>>>>
>>>>> Hi community,
>>>>>
>>>>> I have been working on supporting ZetaSQL[1] as a SQL dialect in
>>>>> BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is
>>>>> ZetaSQL's documentation[2].
>>>>>
>>>>> Birfely, the design of integrating ZetaSQL with BeamSQL is, I made a
>>>>> plugable query planner interface in BeamSQL, and we can easily plug in a
>>>>> new planner[3] (in my case, ZetaSQL planner). Actually anyone can add new
>>>>> planners by this way (e.g. PostgreSQL dialect).
>>>>>
>>>>> I want to contribute ZetaSQL planner and its related code(~10k) to
>>>>> Beam repo(#9210 <https://github.com/apache/beam/pull/9210>). This
>>>>> contribution barely touch existing Beam code (because the idea is plugable
>>>>> planner).
>>>>>
>>>>>
>>>>> *Acknowledgement*
>>>>> Thanks to all the people who provided help during Beam ZetaSQL
>>>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles,
>>>>> Anton Kedin and Mikhail Gryzykhin. This list is not exhausted and also
>>>>> thanks to contributions which are not listed.
>>>>>
>>>>>
>>>>> [1]: https://github.com/google/zetasql
>>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>>> [3]:
>>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>>
>>>>>
>>>>> -Rui
>>>>>
>>>>

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Rui Wang <ru...@google.com>.
I created a Google doc to explain the basic design of Beam ZetaSQL:
https://docs.google.com/document/d/14Yi4oEMzqS3n9-LfSNi6Q6kQpEP3gWTHzX0HxqUksdc/edit?usp=sharing



-Rui

On Sun, Aug 4, 2019 at 10:02 AM Rui Wang <ru...@google.com> wrote:

> Thanks Manu for you feedback! Some comments inlined:
>
>
> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <ow...@gmail.com> wrote:
>
>> A question to the community, does the size of the change require any
>>> process besides the usual PR reviews?
>>>
>>
>> I think so. This is a big change and has come as kind of a surprise
>> (sorry if I've missed previous discussions).
>>
>> Rui, could you explain more on how things will play out between BeamSQL
>> and ZetaSQL (A design doc including the pluggable interface would be
>> perfect).
>>
>
> I see. I will have a document about some basic idea on Beam ZetaSQL (this
> is my way to call "ZetaSQL as a SQL dialect in BeamSQL", and I usually use
> Beam CalciteSQL to refer to Calcite's SQL dialect.).
>
> At least from users perspective, it's simple to use: setup planner name in
> BeamSqlPipelineOptions
> <https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptions.java#L29> and
> BeamSQL will initialize different planners: either Calcite or ZetaSQL is
> supported now.
>
>
>> From GitHub, ZetaSQL is mainly in C++ so what you are doing is a port or
>> a connector to ZetaSQL? Do we need to depend on
>> https://github.com/google/zetasql ? ZetaSQL looks interesting but I
>> could barely find any doc for end users.
>>
>
> ZetaSQL provides a Java interface which calls c++ binary through JNI. For
> using ZetaSQL in BeamSQL, we only need to depend on ZetaSQL jars in maven
> central (https://mvnrepository.com/search?q=zetasql). These jars contains
> all we need to call ZetaSQL analyzer by Java.
>
>
>> Also, I'd prefer the PR to be split into two, one for the pluggable
>> interface and one for the ZetaSQL.
>>
>> Pluggable planner is already a separate PR merged before:
> https://github.com/apache/beam/pull/7745
>
>
> -Rui
>
>
>
>> Thanks,
>> Manu
>>
>>
>>
>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com> wrote:
>>
>>> Thank you Rui for the heads up.
>>>
>>> A question to the community, does the size of the change require any
>>> process besides the usual PR reviews?
>>>
>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ru...@google.com> wrote:
>>>
>>>> Hi community,
>>>>
>>>> I have been working on supporting ZetaSQL[1] as a SQL dialect in
>>>> BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is
>>>> ZetaSQL's documentation[2].
>>>>
>>>> Birfely, the design of integrating ZetaSQL with BeamSQL is, I made a
>>>> plugable query planner interface in BeamSQL, and we can easily plug in a
>>>> new planner[3] (in my case, ZetaSQL planner). Actually anyone can add new
>>>> planners by this way (e.g. PostgreSQL dialect).
>>>>
>>>> I want to contribute ZetaSQL planner and its related code(~10k) to Beam
>>>> repo(#9210 <https://github.com/apache/beam/pull/9210>). This
>>>> contribution barely touch existing Beam code (because the idea is plugable
>>>> planner).
>>>>
>>>>
>>>> *Acknowledgement*
>>>> Thanks to all the people who provided help during Beam ZetaSQL
>>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles,
>>>> Anton Kedin and Mikhail Gryzykhin. This list is not exhausted and also
>>>> thanks to contributions which are not listed.
>>>>
>>>>
>>>> [1]: https://github.com/google/zetasql
>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>> [3]:
>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>
>>>>
>>>> -Rui
>>>>
>>>

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Rui Wang <ru...@google.com>.
Thanks Manu for your feedback! Some comments inline:


On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <ow...@gmail.com> wrote:

> A question to the community, does the size of the change require any
>> process besides the usual PR reviews?
>>
>
> I think so. This is a big change and has come as kind of a surprise (sorry
> if I've missed previous discussions).
>
> Rui, could you explain more on how things will play out between BeamSQL
> and ZetaSQL (A design doc including the pluggable interface would be
> perfect).
>

I see. I will write a document about the basic ideas behind Beam ZetaSQL
(this is my name for "ZetaSQL as a SQL dialect in BeamSQL"; I usually use
Beam CalciteSQL to refer to Calcite's SQL dialect).

At least from the user's perspective, it's simple to use: set the planner
name in BeamSqlPipelineOptions
<https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptions.java#L29>
and BeamSQL will initialize the corresponding planner. Calcite and ZetaSQL
are the two planners supported now.
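
To picture the planner selection described above, here is a small,
self-contained Java sketch. It does not use Beam's actual classes: the
QueryPlanner interface below, the two planner stubs, the "calcite" and
"zetasql" keys, and the forName helper are all hypothetical names chosen
for illustration, and the real BeamSqlPipelineOptions may identify planners
differently (for example by class name).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class PlannerSelectionSketch {

  /** Hypothetical stand-in for a pluggable planner interface (not Beam's). */
  public interface QueryPlanner {
    String plan(String sql);
  }

  /** Stub planner; a real one would parse and optimize the query. */
  public static class CalcitePlannerStub implements QueryPlanner {
    @Override
    public String plan(String sql) {
      return "[calcite] " + sql;
    }
  }

  /** Stub planner; a real one would delegate to the ZetaSQL analyzer. */
  public static class ZetaSqlPlannerStub implements QueryPlanner {
    @Override
    public String plan(String sql) {
      return "[zetasql] " + sql;
    }
  }

  // Registry of known planners, keyed by an illustrative planner name.
  private static final Map<String, Supplier<QueryPlanner>> PLANNERS = new HashMap<>();

  static {
    PLANNERS.put("calcite", CalcitePlannerStub::new);
    PLANNERS.put("zetasql", ZetaSqlPlannerStub::new);
  }

  /** Returns a new planner for the given name, or throws if it is unknown. */
  public static QueryPlanner forName(String plannerName) {
    Supplier<QueryPlanner> factory = PLANNERS.get(plannerName);
    if (factory == null) {
      throw new IllegalArgumentException("Unknown planner: " + plannerName);
    }
    return factory.get();
  }

  public static void main(String[] args) {
    // The planner name would normally come from a pipeline option.
    String plannerName = args.length > 0 ? args[0] : "calcite";
    System.out.println(forName(plannerName).plan("SELECT 1"));
  }
}
```

Adding another dialect in this sketch only means registering one more
entry; the code that calls forName stays unchanged, which is the point of
keeping the planner pluggable.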


> From GitHub, ZetaSQL is mainly in C++ so what you are doing is a port or a
> connector to ZetaSQL? Do we need to depend on
> https://github.com/google/zetasql ? ZetaSQL looks interesting but I could
> barely find any doc for end users.
>

ZetaSQL provides a Java interface that calls the C++ binary through JNI. To
use ZetaSQL in BeamSQL, we only need to depend on the ZetaSQL jars in Maven
Central (https://mvnrepository.com/search?q=zetasql). These jars contain
everything we need to call the ZetaSQL analyzer from Java.


> Also, I'd prefer the PR to be split into two, one for the pluggable
> interface and one for the ZetaSQL.
>

The pluggable planner was already merged earlier as a separate PR:
https://github.com/apache/beam/pull/7745


-Rui



> Thanks,
> Manu
>
>
>
> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com> wrote:
>
>> Thank you Rui for the heads up.
>>
>> A question to the community, does the size of the change require any
>> process besides the usual PR reviews?
>>
>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ru...@google.com> wrote:
>>
>>> Hi community,
>>>
>>> I have been working on supporting ZetaSQL[1] as a SQL dialect in
>>> BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is
>>> ZetaSQL's documentation[2].
>>>
>>> Birfely, the design of integrating ZetaSQL with BeamSQL is, I made a
>>> plugable query planner interface in BeamSQL, and we can easily plug in a
>>> new planner[3] (in my case, ZetaSQL planner). Actually anyone can add new
>>> planners by this way (e.g. PostgreSQL dialect).
>>>
>>> I want to contribute ZetaSQL planner and its related code(~10k) to Beam
>>> repo(#9210 <https://github.com/apache/beam/pull/9210>). This
>>> contribution barely touch existing Beam code (because the idea is plugable
>>> planner).
>>>
>>>
>>> *Acknowledgement*
>>> Thanks to all the people who provided help during Beam ZetaSQL
>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles,
>>> Anton Kedin and Mikhail Gryzykhin. This list is not exhausted and also
>>> thanks to contributions which are not listed.
>>>
>>>
>>> [1]: https://github.com/google/zetasql
>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>> [3]:
>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>
>>>
>>> -Rui
>>>
>>

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Manu Zhang <ow...@gmail.com>.
>
> A question to the community, does the size of the change require any
> process besides the usual PR reviews?
>

I think so. This is a big change and has come as kind of a surprise (sorry
if I've missed previous discussions).

Rui, could you explain more about how things will play out between BeamSQL
and ZetaSQL? (A design doc including the pluggable interface would be
perfect.) From GitHub, ZetaSQL is mainly in C++, so is what you are doing a
port or a connector to ZetaSQL? Do we need to depend on
https://github.com/google/zetasql ? ZetaSQL looks interesting, but I could
barely find any docs for end users.

Also, I'd prefer the PR to be split into two, one for the pluggable
interface and one for ZetaSQL.

Thanks,
Manu



On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com> wrote:

> Thank you Rui for the heads up.
>
> A question to the community, does the size of the change require any
> process besides the usual PR reviews?
>
> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ru...@google.com> wrote:
>
>> Hi community,
>>
>> I have been working on supporting ZetaSQL[1] as a SQL dialect in BeamSQL.
>> ZetaSQL is a SQL analyzer open sourced by Google. Here is ZetaSQL's
>> documentation[2].
>>
>> Birfely, the design of integrating ZetaSQL with BeamSQL is, I made a
>> plugable query planner interface in BeamSQL, and we can easily plug in a
>> new planner[3] (in my case, ZetaSQL planner). Actually anyone can add new
>> planners by this way (e.g. PostgreSQL dialect).
>>
>> I want to contribute the ZetaSQL planner and its related code (~10k) to the
>> Beam repo (#9210 <https://github.com/apache/beam/pull/9210>). This
>> contribution barely touches existing Beam code (because the idea is a
>> pluggable planner).
>>
>>
>> *Acknowledgement*
>> Thanks to all the people who provided help during Beam ZetaSQL
>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles,
>> Anton Kedin and Mikhail Gryzykhin. This list is not exhaustive; thanks also
>> to those whose contributions are not listed.
>>
>>
>> [1]: https://github.com/google/zetasql
>> [2]: https://github.com/google/zetasql/tree/master/docs
>> [3]:
>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>
>>
>> -Rui
>>
>

Re: Support ZetaSQL as a new SQL dialect in BeamSQL

Posted by Ahmet Altay <al...@google.com>.
Thank you Rui for the heads up.

A question to the community: does the size of the change require any
process besides the usual PR reviews?

On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ru...@google.com> wrote:

> Hi community,
>
> I have been working on supporting ZetaSQL[1] as a SQL dialect in BeamSQL.
> ZetaSQL is a SQL analyzer open sourced by Google. Here is ZetaSQL's
> documentation[2].
>
> Briefly, the design for integrating ZetaSQL with BeamSQL is this: I made a
> pluggable query planner interface in BeamSQL, so we can easily plug in a
> new planner[3] (in my case, the ZetaSQL planner). In fact, anyone can add
> new planners this way (e.g. for a PostgreSQL dialect).
>
> I want to contribute the ZetaSQL planner and its related code (~10k) to the
> Beam repo (#9210 <https://github.com/apache/beam/pull/9210>). This
> contribution barely touches existing Beam code (because the idea is a
> pluggable planner).
>
>
> *Acknowledgement*
> Thanks to all the people who provided help during Beam ZetaSQL
> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth Knowles,
> Anton Kedin and Mikhail Gryzykhin. This list is not exhaustive; thanks also
> to those whose contributions are not listed.
>
>
> [1]: https://github.com/google/zetasql
> [2]: https://github.com/google/zetasql/tree/master/docs
> [3]:
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>
>
> -Rui
>