Posted to dev@calcite.apache.org by Julian Hyde <jh...@apache.org> on 2018/06/22 18:20:16 UTC

Gandiva

There is a discussion on dev@arrow about Gandiva, a kernel for Arrow[1].

I think it would be an interesting library on which to build our Arrow engine. (Without a kernel, Arrow is just a data format, but with Gandiva it becomes an engine upon which we can implement all relational operations, albeit on a multi-threaded single node. Potentially this approach can process each row in a few machine cycles, i.e. billions of records per second. Therefore a single node would be sufficient for many queries.)
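The "billions of records per second" estimate is simple arithmetic. As a rough worked example, assuming (illustrative numbers, not measurements) a 3 GHz core that spends about 3 cycles per row, and 8 such cores on one node:

```latex
\[
\frac{3 \times 10^{9}\ \text{cycles/s}}{3\ \text{cycles/row}} = 10^{9}\ \text{rows/s per core},
\qquad
8\ \text{cores} \times 10^{9}\ \text{rows/s per core} = 8 \times 10^{9}\ \text{rows/s}.
\]
```

This is a CPU-only upper bound; it ignores the cost of moving the data through memory.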

Masayuki Takahashi has started to develop an Arrow adapter for Calcite[2], but a lot of work remains to implement all SQL built-in functions and basic relational operators. Building on top of Gandiva we could save a lot of this effort.

Julian

[1] https://lists.apache.org/thread.html/f099b3d1e2aaf9803c5c756f872a594baf17e9f25974e3496c9706d9@%3Cdev.arrow.apache.org%3E

[2] https://issues.apache.org/jira/browse/CALCITE-2173

Re: Gandiva

Posted by Masayuki Takahashi <ma...@gmail.com>.
Sorry for the very late reply.

I am watching the Gandiva repository on GitHub. "Filter" was
implemented just a few days ago, but "Aggregation" does not seem to be
implemented yet.

Once "Aggregation" has been implemented, I want to start implementing
the Arrow adapter using Gandiva.

Currently, I am thinking of implementing the Filter of the Arrow
adapter as follows:

1. Extract the field types and comparison operators from the condition
of the Calcite Filter.
2. Using them, the Arrow Filter generates Java code that calls the
Gandiva API to generate LLVM code.
3. From the results of executing that code, pass the selection vector
generated on the Gandiva side back to the Java side.
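The three steps above can be sketched in plain Java with no Calcite, Arrow, or Gandiva dependency. Everything here is hypothetical: Comparison stands in for a Calcite comparison condition (step 1), toGandivaFunction reduces step 2 to choosing a Gandiva-style function name (the names are modeled on Gandiva's comparison functions such as less_than and should be checked against its function registry), and evaluate plays the role of the Gandiva filter in step 3, returning a selection vector holding the indices of matching rows. Code generation, LLVM, and Arrow buffers are all left out.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a Calcite comparison such as "$0 < 10". */
final class Comparison {
  enum Op { EQUALS, NOT_EQUALS, LESS_THAN, LESS_THAN_OR_EQUAL, GREATER_THAN, GREATER_THAN_OR_EQUAL }

  final int fieldIndex; // which input column the condition reads (step 1: field)
  final Op op;          // extracted comparison operator (step 1: operator)
  final int literal;    // right-hand literal; INTEGER only in this sketch

  Comparison(int fieldIndex, Op op, int literal) {
    this.fieldIndex = fieldIndex;
    this.op = op;
    this.literal = literal;
  }
}

/** Hypothetical helper; not part of Calcite or Gandiva. */
final class ArrowFilterSketch {
  private ArrowFilterSketch() {}

  /** Step 2, reduced: map the operator to a Gandiva-style function name (assumed names). */
  static String toGandivaFunction(Comparison.Op op) {
    switch (op) {
      case EQUALS: return "equal";
      case NOT_EQUALS: return "not_equal";
      case LESS_THAN: return "less_than";
      case LESS_THAN_OR_EQUAL: return "less_than_or_equal_to";
      case GREATER_THAN: return "greater_than";
      case GREATER_THAN_OR_EQUAL: return "greater_than_or_equal_to";
      default: throw new IllegalArgumentException("unsupported: " + op);
    }
  }

  /** Step 3: evaluate the condition over one batch and return a selection
   *  vector, i.e. the indices of the rows that pass. */
  static int[] evaluate(Comparison c, int[][] columns) {
    int[] column = columns[c.fieldIndex];
    int[] selected = new int[column.length];
    int count = 0;
    for (int row = 0; row < column.length; row++) {
      if (test(c.op, column[row], c.literal)) {
        selected[count++] = row;
      }
    }
    return Arrays.copyOf(selected, count);
  }

  private static boolean test(Comparison.Op op, int left, int right) {
    switch (op) {
      case EQUALS: return left == right;
      case NOT_EQUALS: return left != right;
      case LESS_THAN: return left < right;
      case LESS_THAN_OR_EQUAL: return left <= right;
      case GREATER_THAN: return left > right;
      case GREATER_THAN_OR_EQUAL: return left >= right;
      default: throw new IllegalArgumentException("unsupported: " + op);
    }
  }
}
```

For example, filtering the column {5, 12, 7, 30} with "$0 < 10", i.e. evaluate(new Comparison(0, Comparison.Op.LESS_THAN, 10), new int[][] {{5, 12, 7, 30}}), returns the selection vector {0, 2}.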

Finally, I want to remove ArrowRexToLixTranslator from the Arrow adapter.

thanks.
On Sun, Jul 1, 2018 at 6:35 Walaa Eldin Moustafa <wa...@gmail.com> wrote:
>
> Hi Julian and Masayuki,
>
> This indeed sounds quite important. Masayuki, thanks for taking the
> initiative. I would like to do what I can to help. I can help with
> writing some of the operators, UDFs/UDF APIs, and integration with Calcite.
>
> Thanks,
> Walaa.



-- 
高橋 真之

Re: Gandiva

Posted by Walaa Eldin Moustafa <wa...@gmail.com>.
Hi Julian and Masayuki,

This indeed sounds quite important. Masayuki, thanks for taking the
initiative. I would like to do what I can to help. I can help with
writing some of the operators, UDFs/UDF APIs, and integration with Calcite.

Thanks,
Walaa.


On Fri, Jun 29, 2018 at 11:40 AM Julian Hyde <jh...@apache.org> wrote:

> We already have two JIRA cases for Arrow integration:
> https://issues.apache.org/jira/browse/CALCITE-2040 and
> https://issues.apache.org/jira/browse/CALCITE-2173.
>
> I think this is an extremely important area of work for the Calcite
> project, because it helps us realize the vision of a deconstructed
> database[1]. There is a lot of work to do, much of it very interesting
> (e.g. writing a thread scheduler, IPC mechanisms, and algorithms for
> sort, join and aggregation that work effectively on Arrow data
> structures).
>
> If you want to help Masayuki, please step up!
>
> Julian
>
> [1]
> https://www.slideshare.net/julienledem/from-flat-files-to-deconstructed-database

Re: Gandiva

Posted by Julian Hyde <jh...@apache.org>.
We already have two JIRA cases for Arrow integration:
https://issues.apache.org/jira/browse/CALCITE-2040 and
https://issues.apache.org/jira/browse/CALCITE-2173.

I think this is an extremely important area of work for the Calcite
project, because it helps us realize the vision of a deconstructed
database[1]. There is a lot of work to do, much of it very interesting
(e.g. writing a thread scheduler, IPC mechanisms, and algorithms for
sort, join and aggregation that work effectively on Arrow data
structures).
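As one example of the aggregation work mentioned above, here is a sketch of "SELECT k, SUM(v) ... GROUP BY k" over column-oriented batches on a single multi-threaded node: each batch is aggregated into a partial hash table on a fixed thread pool, and the partial results are merged at the end. It uses only the JDK. The batch layout (two parallel arrays) and the class names are assumptions for illustration, not Calcite or Arrow APIs, and a plain ExecutorService stands in for the thread scheduler mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical batch: two equal-length columns, stored column-oriented. */
final class KeyValueBatch {
  final int[] keys;
  final long[] values;

  KeyValueBatch(int[] keys, long[] values) {
    if (keys.length != values.length) {
      throw new IllegalArgumentException("columns must have the same length");
    }
    this.keys = keys;
    this.values = values;
  }
}

/** Hypothetical two-phase (partial, then merge) hash aggregation. */
final class ParallelHashAggregate {
  private ParallelHashAggregate() {}

  /** Partial aggregate for one batch: a tight loop over the two columns. */
  static Map<Integer, Long> aggregateBatch(KeyValueBatch batch) {
    Map<Integer, Long> partial = new HashMap<>();
    for (int i = 0; i < batch.keys.length; i++) {
      partial.merge(batch.keys[i], batch.values[i], Long::sum);
    }
    return partial;
  }

  /** Aggregates every batch on a fixed thread pool, then merges the partials. */
  static Map<Integer, Long> sumByKey(List<KeyValueBatch> batches, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Map<Integer, Long>>> futures = new ArrayList<>();
      for (KeyValueBatch batch : batches) {
        Future<Map<Integer, Long>> future = pool.submit(() -> aggregateBatch(batch));
        futures.add(future);
      }
      Map<Integer, Long> result = new HashMap<>();
      for (Future<Map<Integer, Long>> future : futures) {
        // Merge phase: combine per-batch sums for the same key.
        future.get().forEach((k, v) -> result.merge(k, v, Long::sum));
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }
}
```

For example, the batches (keys {1, 2, 1}, values {10, 5, 7}) and (keys {2, 3}, values {1, 100}) give the sums 1 -> 17, 2 -> 6 and 3 -> 100.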

If you want to help Masayuki, please step up!

Julian

[1] https://www.slideshare.net/julienledem/from-flat-files-to-deconstructed-database

On Thu, Jun 28, 2018 at 2:24 PM, Michael Mior <mm...@apache.org> wrote:
> That's great! If you could create a JIRA case to track your progress, that
> would be helpful for others who might want to follow along or contribute.
> Thanks!
>
> --
> Michael Mior
> mmior@apache.org

Re: Gandiva

Posted by Michael Mior <mm...@apache.org>.
That's great! If you could create a JIRA case to track your progress, that
would be helpful for others who might want to follow along or contribute.
Thanks!

--
Michael Mior
mmior@apache.org



On Tue, Jun 26, 2018 at 10:36 Masayuki Takahashi <ma...@gmail.com> wrote:

> Hi Julian,
>
> > Masayuki Takahashi has started to develop an Arrow adapter for
> Calcite[2], but a lot of work remains to implement all SQL built-in
> functions and basic relational operators. Building on top of Gandiva we
> could save a lot of this effort.
>
> I will start building a Gandiva development environment and consider
> a way to incorporate it.
>
> thanks.

Re: Gandiva

Posted by Masayuki Takahashi <ma...@gmail.com>.
Hi Julian,

> Masayuki Takahashi has started to develop an Arrow adapter for Calcite[2], but a lot of work remains to implement all SQL built-in functions and basic relational operators. Building on top of Gandiva we could save a lot of this effort.

I will start building a Gandiva development environment and consider
a way to incorporate it.

thanks.



On Sat, Jun 23, 2018 at 3:54 Julian Hyde <jh...@apache.org> wrote:
>
> Suppose a company wishes to build a graph database using their own innovative graph index data structure. They nevertheless need to implement core relational algebra, core data types, and core built-in functions (+, CASE, SUM, SUBSTRING). And they want to implement these on a memory-efficient data structure (tens of thousands of rows, stored column-oriented, per memory block). This is a massive effort.
>
> With Calcite+Gandiva+Arrow they just need to create a sequence of relational operators (using RelBuilder, say) and efficient machine code is generated. They can then start adding their own data types, built-in functions, and relational operators, using the same architecture.
>
> Julian


-- 
高橋 真之

Re: Gandiva

Posted by Julian Hyde <jh...@apache.org>.
Suppose a company wishes to build a graph database using their own innovative graph index data structure. They nevertheless need to implement core relational algebra, core data types, and core built-in functions (+, CASE, SUM, SUBSTRING). And they want to implement these on a memory-efficient data structure (tens of thousands of rows, stored column-oriented, per memory block). This is a massive effort.
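To make "stored column-oriented, per memory block" concrete, here is a sketch of one nullable INTEGER column laid out in the style of Arrow's fixed-width vectors: a contiguous value buffer plus a validity bitmap with one bit per row, least-significant bit first. SUM skips null rows and returns SQL NULL (here Java null) when no row is valid. The class name and its methods are hypothetical; Arrow's Java library keeps such buffers off-heap, while this sketch uses plain arrays.

```java
/** Hypothetical nullable INTEGER column: values plus an LSB-first validity bitmap. */
final class NullableIntColumn {
  private final int[] values;    // one slot per row; contents of null slots are ignored
  private final byte[] validity; // bit i set => row i is non-null
  private final int rowCount;

  NullableIntColumn(int rowCount) {
    this.rowCount = rowCount;
    this.values = new int[rowCount];
    this.validity = new byte[(rowCount + 7) / 8]; // all bits 0: every row starts null
  }

  void set(int row, int value) {
    values[row] = value;
    validity[row >> 3] |= (byte) (1 << (row & 7));
  }

  void setNull(int row) {
    validity[row >> 3] &= (byte) ~(1 << (row & 7));
  }

  boolean isNull(int row) {
    return (validity[row >> 3] & (1 << (row & 7))) == 0;
  }

  int rowCount() {
    return rowCount;
  }

  /** SQL SUM: ignores nulls; returns null if no row is non-null. */
  Long sum() {
    long total = 0;
    boolean sawValue = false;
    for (int row = 0; row < rowCount; row++) {
      if (!isNull(row)) {
        total += values[row];
        sawValue = true;
      }
    }
    return sawValue ? total : null;
  }
}
```

For example, a 10-row column with only rows 0, 7 and 9 set to 5, 10 and -3 sums to 12; after setNull(7) it sums to 2, and a column with no non-null rows sums to null.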

With Calcite+Gandiva+Arrow they just need to create a sequence of relational operators (using RelBuilder, say) and efficient machine code is generated. They can then start adding their own data types, built-in functions, and relational operators, using the same architecture.
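The idea of describing a query as a sequence of relational operators and handing it to an engine can be sketched without Calcite. The hypothetical MiniPipeline below records Filter and Project steps fluently, loosely in the spirit of RelBuilder, and then applies all steps to each row in a single pass. It is an interpreter, not a code generator: Gandiva instead generates LLVM code for the expressions. All names and semantics here are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

/** Hypothetical fluent pipeline over a single INTEGER column. */
final class MiniPipeline {
  /** One step: exactly one of the two fields is non-null. */
  private static final class Step {
    final IntPredicate filter;
    final IntUnaryOperator project;

    Step(IntPredicate filter, IntUnaryOperator project) {
      this.filter = filter;
      this.project = project;
    }
  }

  private final int[] input;
  private final List<Step> steps = new ArrayList<>();

  private MiniPipeline(int[] input) {
    this.input = input;
  }

  /** Like a table scan: the pipeline reads this column. */
  static MiniPipeline scan(int[] column) {
    return new MiniPipeline(column);
  }

  /** Keeps rows whose current value satisfies the predicate. */
  MiniPipeline filter(IntPredicate predicate) {
    steps.add(new Step(predicate, null));
    return this;
  }

  /** Replaces each row's current value with f(value). */
  MiniPipeline project(IntUnaryOperator f) {
    steps.add(new Step(null, f));
    return this;
  }

  /** Applies the steps in the order they were added, one row at a time. */
  int[] execute() {
    int[] out = new int[input.length];
    int n = 0;
    rows:
    for (int value : input) {
      int current = value;
      for (Step step : steps) {
        if (step.filter != null) {
          if (!step.filter.test(current)) {
            continue rows; // row rejected; skip the remaining steps
          }
        } else {
          current = step.project.applyAsInt(current);
        }
      }
      out[n++] = current;
    }
    return Arrays.copyOf(out, n);
  }
}
```

For example, MiniPipeline.scan(new int[] {1, 2, 3, 4, 5, 6}).filter(v -> v % 2 == 0).project(v -> v * 10).filter(v -> v > 20).execute() returns {40, 60}; the second filter sees projected values because steps run in the order they were added.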

Julian


> On Jun 22, 2018, at 11:33 AM, Xiening Dai <xn...@live.com> wrote:
> 
> I attended a talk about Gandiva yesterday. Impressive work!
> 
> But I am not sure why Calcite would want to integrate with it. To me, Gandiva is on the execution side; in which scenarios would a query planner need an Arrow engine? I read the original JIRA case about implementing a file enumerator, but the intent is still not clear to me. I would appreciate it if you could elaborate. Thanks.


Re: Gandiva

Posted by Xiening Dai <xn...@live.com>.
I attended a talk about Gandiva yesterday. Impressive work!

But I am not sure why Calcite would want to integrate with it. To me, Gandiva is on the execution side; in which scenarios would a query planner need an Arrow engine? I read the original JIRA case about implementing a file enumerator, but the intent is still not clear to me. I would appreciate it if you could elaborate. Thanks.


> On Jun 22, 2018, at 11:20 AM, Julian Hyde <jh...@apache.org> wrote:
> 
> There is a discussion on dev@arrow about Gandiva, a kernel for Arrow[1].
> 
> I think it would be an interesting library on which to build our Arrow engine. (Without a kernel, Arrow is just a data format, but with Gandiva it becomes an engine upon which we can implement all relational operations, albeit on a multi-threaded single node. Potentially this approach can process each row in a few machine cycles, i.e. billions of records per second. Therefore a single node would be sufficient for many queries.)
> 
> Masayuki Takahashi has started to develop an Arrow adapter for Calcite[2], but a lot of work remains to implement all SQL built-in functions and basic relational operators. Building on top of Gandiva we could save a lot of this effort.
> 
> Julian
> 
> [1] https://lists.apache.org/thread.html/f099b3d1e2aaf9803c5c756f872a594baf17e9f25974e3496c9706d9@%3Cdev.arrow.apache.org%3E
> 
> [2] https://issues.apache.org/jira/browse/CALCITE-2173