Posted to dev@beam.apache.org by Steve Niemitz <sn...@apache.org> on 2020/03/26 19:40:59 UTC

ZetaSQL to Calcite translation layer

The ZetaSQL to Calcite translation layer that is bundled with Beam seems
generally useful outside of Beam as well.  In fact, we're using what is
essentially a fork of it internally, outside of Beam, right now (and I've
fixed a bunch of bugs in it).

Has there ever been any thought about splitting it into a separate library
without any Beam dependencies?

Re: ZetaSQL to Calcite translation layer

Posted by Steve Niemitz <sn...@apache.org>.
It doesn't; I want to use the translation layer outside of Beam.  A good
chunk of the code in the library is Beam-agnostic, but it also has a lot
of Beam dependencies that I don't want to pull in.

I think if ZetaSQLPlannerImpl and its dependency graph were separated out,
that'd be all that's needed for it to stand alone.  From what I can tell (I
ended up extracting the code into my own repo), there are very few Beam
dependencies involved there.

On Thu, Mar 26, 2020 at 5:23 PM Rui Wang <ru...@google.com> wrote:

> Hi Steve,
>
> Could you clarify a bit: could you use [1] directly to solve your case? If
> not, why?
>
>
> [1]:
> https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-extensions-sql-zetasql/2.17.0
>
>
>
>
> -Rui

Re: ZetaSQL to Calcite translation layer

Posted by Rui Wang <ru...@google.com>.
Hi Steve,

Could you clarify a bit: could you use [1] directly to solve your case? If
not, why?

[1]: https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-extensions-sql-zetasql/2.17.0

-Rui

On Thu, Mar 26, 2020 at 1:23 PM Steve Niemitz <sn...@apache.org> wrote:

> Oh, I think I actually remember seeing that email on the Calcite list. :)
>
> I agree that having it as an alternate parser implementation in Calcite
> itself would be ideal, but I also agree (sadly) that that'll probably be a
> very slow process.
>
> Splitting it into its own library in Beam seems ideal; the only problem I
> can see is that Beam is using a vendored version of Calcite.  I think that,
> to be useful, the library itself would need to use a "stock" version of
> Calcite.
>
> I think I'd have some time to spend on this as well, if we can figure out
> a good way forward and agree on splitting it out.

Re: ZetaSQL to Calcite translation layer

Posted by Steve Niemitz <sn...@apache.org>.
Oh, I think I actually remember seeing that email on the Calcite list. :)

I agree that having it as an alternate parser implementation in Calcite
itself would be ideal, but I also agree (sadly) that that'll probably be a
very slow process.

Splitting it into its own library in Beam seems ideal; the only problem I
can see is that Beam is using a vendored version of Calcite.  I think that,
to be useful, the library itself would need to use a "stock" version of
Calcite.

I think I'd have some time to spend on this as well, if we can figure out a
good way forward and agree on splitting it out.

On Thu, Mar 26, 2020 at 4:04 PM Andrew Pilloud <ap...@google.com> wrote:

> I think it makes sense for the ZetaSQL to Calcite translation layer to
> live in Calcite itself, and I did suggest it at one point on their dev list
> (see:
> https://lists.apache.org/thread.html/38942fcb4775ed71f9b2ab8880ab68a4238166ea5e904111ca184a12%40%3Cdev.calcite.apache.org%3E).
> I don't think there is a quick way to get there, but we could split up the
> interfaces within Beam so they are cleaner.
>
> A good next step would be to split up the packages within Beam.
> We could add a set of core SQL interfaces that depend only on Calcite, and
> then split our ZetaSQL translator into a piece that depends only on those
> interfaces, Calcite, and ZetaSQL.
>
> Andrew

Re: ZetaSQL to Calcite translation layer

Posted by Andrew Pilloud <ap...@google.com>.
I think it makes sense for the ZetaSQL to Calcite translation layer to live
in Calcite itself, and I did suggest it at one point on their dev list (see:
https://lists.apache.org/thread.html/38942fcb4775ed71f9b2ab8880ab68a4238166ea5e904111ca184a12%40%3Cdev.calcite.apache.org%3E).
I don't think there is a quick way to get there, but we could split up the
interfaces within Beam so they are cleaner.

A good next step would be to split up the packages within Beam.
We could add a set of core SQL interfaces that depend only on Calcite, and
then split our ZetaSQL translator into a piece that depends only on those
interfaces, Calcite, and ZetaSQL.
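
To make the layering concrete, here is a rough, dependency-free sketch of
how the pieces could relate.  Every name in it (LogicalPlan,
QueryTranslator, ZetaSqlTranslator, BeamPlannerAdapter) is hypothetical and
is not an existing Beam, Calcite, or ZetaSQL class; LogicalPlan is only a
stand-in for a Calcite relational node so that the example compiles on its
own.

```java
// Layer 1: "core SQL interfaces". In a real split these would depend only
// on Calcite. LogicalPlan is a hypothetical stand-in for a Calcite plan node.
interface LogicalPlan {
    String explain();
}

// Hypothetical contract between a SQL front end and whatever consumes the plan.
interface QueryTranslator {
    LogicalPlan translate(String sql);
}

// Layer 2: the translator. In a real split it would depend only on layer 1,
// Calcite, and ZetaSQL. Here it just wraps the trimmed query text instead of
// running an actual analyzer.
final class ZetaSqlTranslator implements QueryTranslator {
    @Override
    public LogicalPlan translate(String sql) {
        String normalized = sql.trim();
        return () -> "LogicalPlan[" + normalized + "]";
    }
}

// Layer 3: Beam-specific code. It sees only the QueryTranslator interface,
// so the translator never needs a dependency on this class.
final class BeamPlannerAdapter {
    private final QueryTranslator translator;

    BeamPlannerAdapter(QueryTranslator translator) {
        this.translator = translator;
    }

    String planForBeam(String sql) {
        return "BeamRel(" + translator.translate(sql).explain() + ")";
    }
}
```

The point of the sketch is the direction of the dependencies: the adapter
depends on the interface, and the translator depends on nothing from the
adapter's layer, so the translator could be published on its own.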

Andrew

On Thu, Mar 26, 2020 at 12:41 PM Steve Niemitz <sn...@apache.org> wrote:

> The ZetaSQL to Calcite translation layer that is bundled with Beam seems
> generally useful outside of Beam as well.  In fact, we're using what is
> essentially a fork of it internally, outside of Beam, right now (and I've
> fixed a bunch of bugs in it).
>
> Has there ever been any thought about splitting it into a separate library
> without any Beam dependencies?
>