Posted to dev@calcite.apache.org by Michael Mior <mm...@apache.org> on 2021/04/07 19:30:22 UTC

Apache Arrow adapter

Hi all,

I wanted to share some work one of my (now former) students, Karshit
Shah, has done with integrating Apache Arrow into Calcite. Karshit has
written an Arrow adapter that's able to perform filtering and
projections natively on Arrow data using Gandiva so these expressions
can be JITed using LLVM. The pull request[0] needs some cleanup, but
the code is in relatively good shape.
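
For readers unfamiliar with the idea, here is a minimal plain-Java sketch of
what "filter, then project" means on columnar data. It is not the adapter's
code and does not use the Gandiva or Calcite APIs; the class name, method
names, and sample columns are all hypothetical, and the loops here are
ordinary interpreted Java rather than LLVM-compiled expressions.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

/** Illustration only: filter and projection over column arrays. */
final class ColumnarFilterSketch {

  /** Returns the indices of the rows that satisfy the predicate (a selection vector). */
  static int[] filter(int rowCount, IntPredicate predicate) {
    int[] selected = new int[rowCount];
    int n = 0;
    for (int row = 0; row < rowCount; row++) {
      if (predicate.test(row)) {
        selected[n++] = row;
      }
    }
    return Arrays.copyOf(selected, n);
  }

  /** Copies the selected rows of one column into a new, dense column. */
  static long[] project(long[] column, int[] selection) {
    long[] out = new long[selection.length];
    for (int i = 0; i < selection.length; i++) {
      out[i] = column[selection[i]];
    }
    return out;
  }

  public static void main(String[] args) {
    // Two hypothetical columns of a three-row table.
    int[] quantity = {5, 12, 30};
    long[] price = {100L, 250L, 75L};

    // Roughly: SELECT price FROM t WHERE quantity > 10
    int[] selection = filter(quantity.length, row -> quantity[row] > 10);
    long[] result = project(price, selection);

    System.out.println(Arrays.toString(result)); // prints [250, 75]
  }
}
```

The point of the sketch is only the shape of the work: the predicate is
evaluated once per row against a column, and the projection then reads just
the surviving rows.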

Right now, the adapter only reads from files, but I think there are a
number of exciting extensions to this that are possible. For example,
Arrow has a client-server framework, Flight, which could be connected
with Calcite, perhaps via Avatica. (Andy Grove was doing some work on
this last year[1], although I'm not sure of the progress.)

The biggest blocker on this is actually not the Calcite code, but the
availability of a suitably built Arrow dependency with Gandiva along
with the appropriate CI configuration. I opened a JIRA on the Arrow
project with some more details[2].

I'd love some thoughts on the approach and some help in pushing this
over the finish line.

[0] https://github.com/apache/calcite/pull/2133
[1] https://mail-archives.apache.org/mod_mbox/calcite-dev/202002.mbox/%3cCAJEf=X5xvXLQpJkX_VjJk=TnNRwT52v0=p28SCzmid1tycE4mQ@mail.gmail.com%3e
[2] https://issues.apache.org/jira/browse/ARROW-11135
--
Michael Mior
mmior@apache.org

Re: Apache Arrow adapter

Posted by Michael Mior <mm...@apache.org>.
Yes, it was a bit of a challenge to get working in the Linux and macOS
development environments we've been using. This is why I temporarily
checked in the jar, but this should certainly be removed before the PR
is merged.

--
Michael Mior
mmior@apache.org

On Sat, Apr 10, 2021 at 17:05, Julian Hyde <jh...@apache.org> wrote:
>
> I've been trying to switch over to use the official Apache Arrow
> Gandiva 3.0.0 jar at Maven central. (Which means we can remove the
> 3.0.0-SNAPSHOT.jar that you had checked into arrow/libs.) That jar is
> built for macOS, and is a little more tricky to get running than the
> previous jar, which was built for Linux. I'll post to
> https://issues.apache.org/jira/browse/ARROW-11135 as I discover
> things.
>
> (Makes me glad we don't have any C++ code in Calcite. Making artifacts
> that work on multiple operating systems seems to be really
> challenging.)
>
> Julian

Re: Apache Arrow adapter

Posted by Julian Hyde <jh...@apache.org>.
I've been trying to switch over to use the official Apache Arrow
Gandiva 3.0.0 jar at Maven central. (Which means we can remove the
3.0.0-SNAPSHOT.jar that you had checked into arrow/libs.) That jar is
built for macOS, and is a little more tricky to get running than the
previous jar, which was built for Linux. I'll post to
https://issues.apache.org/jira/browse/ARROW-11135 as I discover
things.
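
When debugging this kind of mismatch, it can help to print which platform
the JVM reports before trying a jar that bundles native code. The following
is a minimal sketch using only standard system properties; the class name,
method name, and platform labels are hypothetical, and it makes no claim
about how Gandiva itself locates or loads its native library.

```java
import java.util.Locale;

/** Illustration only: report the platform the current JVM is running on. */
final class NativePlatformSketch {

  /** Maps an os.name value to a coarse, hypothetical platform label. */
  static String platformOf(String osName) {
    String os = osName.toLowerCase(Locale.ROOT);
    // Check macOS first: "darwin" also contains the substring "win".
    if (os.contains("mac") || os.contains("darwin")) {
      return "macos";
    }
    if (os.contains("linux")) {
      return "linux";
    }
    if (os.contains("win")) {
      return "windows";
    }
    return "unknown";
  }

  public static void main(String[] args) {
    String platform = platformOf(System.getProperty("os.name"));
    String arch = System.getProperty("os.arch");
    System.out.println("JVM platform: " + platform + " / " + arch);
  }
}
```

If the reported platform differs from the one the jar was built for, a
failure to load the native library would be expected rather than surprising.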

(Makes me glad we don't have any C++ code in Calcite. Making artifacts
that work on multiple operating systems seems to be really
challenging.)

Julian

On Sat, Apr 10, 2021 at 6:31 AM Michael Mior <mm...@apache.org> wrote:
>
> Thanks Julian! I really appreciate the help. I think beta would be
> accurate here but it would be great to have this pushed so people can
> start trying it out.
>
> --
> Michael Mior
> mmior@apache.org

Re: Apache Arrow adapter

Posted by Michael Mior <mm...@apache.org>.
Thanks Julian! I really appreciate the help. I think beta would be
accurate here but it would be great to have this pushed so people can
start trying it out.

--
Michael Mior
mmior@apache.org

On Fri, Apr 9, 2021 at 20:37, Julian Hyde <jh...@apache.org> wrote:
>
> Yes, thanks to Michael and Karshit for their great work.
>
> I am reviewing now, and doing some fix up (e.g. lint, repositories) so
> that we could get it into master as a "beta" component. I'll add
> updates in https://issues.apache.org/jira/browse/CALCITE-2040.

Re: Apache Arrow adapter

Posted by Julian Hyde <jh...@apache.org>.
Yes, thanks to Michael and Karshit for their great work.

I am reviewing now, and doing some fix up (e.g. lint, repositories) so
that we could get it into master as a "beta" component. I'll add
updates in https://issues.apache.org/jira/browse/CALCITE-2040.

On Wed, Apr 7, 2021 at 9:37 PM Fan Liya <li...@gmail.com> wrote:
>
> Hi Michael,
>
> Thanks for sharing the great work.
> I believe it is important work for both communities.
>
> Best,
> Liya Fan

Re: Apache Arrow adapter

Posted by Fan Liya <li...@gmail.com>.
Hi Michael,

Thanks for sharing the great work.
I believe it is important work for both communities.

Best,
Liya Fan


On Thu, Apr 8, 2021 at 3:30 AM Michael Mior <mm...@apache.org> wrote:

> Hi all,
>
> I wanted to share some work one of my (now former) students, Karshit
> Shah, has done with integrating Apache Arrow into Calcite. Karshit has
> written an Arrow adapter that's able to perform filtering and
> projections natively on Arrow data using Gandiva so these expressions
> can be JITed using LLVM. The pull request[0] needs some cleanup, but
> the code is in relatively good shape.
>
> Right now, the adapter only reads from files, but I think there are a
> number of exciting extensions to this that are possible. For example,
> Arrow has a client-server framework Flight which could be connected
> with Calcite, perhaps via Avatica. (Andy Grove was doing some work on
> this last year[1] although I'm not sure of the progress.)
>
> The biggest blocker on this is actually not the Calcite code, but the
> availability of a suitably built Arrow dependency with Gandiva along
> with the appropriate CI configuration. I opened a JIRA on the Arrow
> project with some more details[2].
>
> I'd love some thoughts on the approach and some help in pushing this
> over the finish line.
>
> [0] https://github.com/apache/calcite/pull/2133
> [1]
> https://mail-archives.apache.org/mod_mbox/calcite-dev/202002.mbox/%3cCAJEf=X5xvXLQpJkX_VjJk=TnNRwT52v0=p28SCzmid1tycE4mQ@mail.gmail.com%3e
> [2] https://issues.apache.org/jira/browse/ARROW-11135
> --
> Michael Mior
> mmior@apache.org
>