Posted to dev@arrow.apache.org by Yibo Cai <yi...@arm.com> on 2019/11/01 05:02:24 UTC

Re: questions about Gandiva

Thanks, Wes. Arrow is a very exciting project.
I'm from Arm. We are interested in Arrow and would like to study it and help improve it.

Yibo

On 11/1/19 1:25 AM, Wes McKinney wrote:
> hi
> 
> On Thu, Oct 31, 2019 at 12:11 AM Yibo Cai <yi...@arm.com> wrote:
>>
>> Hi,
>>
>> Arrow C++ integrates Gandiva to provide low-level operations on Arrow buffers.[1][2]
>> I have some questions; any help is appreciated:
>> - Arrow C++ already has compute kernels[3]; do they duplicate what Gandiva provides? I see a Jira issue discussing this.[4]
> 
> No. There are some cases of functional overlap, but we are servicing a
> spectrum of use cases beyond the scope of Gandiva. Additionally, it is
> unclear to me that an LLVM JIT compilation step should be required to
> evaluate simple expressions such as "a > 5": in addition to
> introducing latency (due to the compilation step), requiring the LLVM
> runtime in all applications is a heavy dependency.
> 
> Personally I'm interested in supporting a wide gamut of analytics
> workloads, from data frame / data science type libraries to SQL-like
> systems. Gandiva is designed for the needs of a SQL-based execution
> engine where chunks of data are fed into Projection or Filter nodes in
> a computation graph; Gandiva generates a specialized kernel to
> perform a unit of work inside those nodes. Realistically, I expect
> many real-world applications will contain a mixture of pre-compiled
> analytic kernels and JIT-compiled kernels.
> 
> Rome wasn't built in a day, so I'm expecting several years of work
> ahead of us at the present rate. We need more help in this domain.
> 
>> - Is Gandiva only for Arrow C++? What about other languages (Go, Rust, ...)?
> 
> It's being used in Java via JNI. The same approach could be applied
> to the other languages, since each has its own C FFI mechanism.
> 
>> - Gandiva leverages SIMD for vectorized operations[1], but I didn't see any related code. Am I missing something?
> 
> My understanding is that LLVM inserts many SIMD instructions
> automatically based on the host CPU architecture version. Gandiva
> developers may have some comments or pointers about this.
> 
>>
>> [1] https://www.dremio.com/announcing-gandiva-initiative-for-apache-arrow/
>> [2] https://github.com/apache/arrow/tree/master/cpp/src/gandiva
>> [3] https://github.com/apache/arrow/tree/master/cpp/src/arrow/compute
>> [4] https://issues.apache.org/jira/browse/ARROW-7017
>>
>> Thanks,
>> Yibo
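To make Wes's "a > 5" example concrete: a pre-compiled kernel for such an expression is an ordinary compiled loop with no JIT step at run time. The sketch below is illustrative only; `GreaterThan` is a hypothetical function, not part of Arrow's compute API. It assumes int32 input without nulls and writes a bit-packed boolean result, least-significant bit first, which is how Arrow lays out boolean arrays. The comparison loop is also the kind of simple, dependency-free loop that an optimizing compiler (LLVM included) may auto-vectorize into SIMD instructions with no intrinsics in the source, consistent with Wes's understanding above; whether it actually does so depends on the compiler, flags, and target CPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pre-compiled kernel for the expression "a > threshold".
// Assumes non-null int32 input; this is a sketch, not Arrow's API.
std::vector<uint8_t> GreaterThan(const int32_t* a, std::size_t length,
                                 int32_t threshold) {
  // Phase 1: branch-free comparison, one byte per element. There are no
  // cross-iteration dependencies, so a compiler may auto-vectorize this
  // loop (not guaranteed; depends on compiler, flags and target).
  std::vector<uint8_t> mask(length);
  for (std::size_t i = 0; i < length; ++i) {
    mask[i] = static_cast<uint8_t>(a[i] > threshold);
  }

  // Phase 2: pack the bytes into a bitmap, least-significant bit first,
  // matching Arrow's layout for boolean values.
  std::vector<uint8_t> bitmap((length + 7) / 8, 0);
  for (std::size_t i = 0; i < length; ++i) {
    bitmap[i / 8] |= static_cast<uint8_t>(mask[i] << (i % 8));
  }
  return bitmap;
}
```

For example, with a = {1, 7, 5, 9, 6, 0, 10, 3, 8} and a threshold of 5, the result is two bytes, 0x5A and 0x01 (bits for elements 1, 3, 4, 6, and 8 are set).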

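Wes's answer about other languages relies on the fact that a library exposing a plain C ABI can be called from Java (through a JNI glue layer), Go (cgo), Rust (`extern "C"` declarations), and so on. A minimal sketch of that pattern, assuming a C++ library: an object that is expensive to build once and reuse (standing in for something like a compiled filter) is hidden behind an opaque handle, and only C types cross the boundary. All names here (`GreaterThanFilter`, `example_filter_*`) are hypothetical and are not Gandiva's actual JNI interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical C++ object standing in for something built once and reused
// across many batches (for example, a compiled filter).
class GreaterThanFilter {
 public:
  explicit GreaterThanFilter(std::int32_t threshold) : threshold_(threshold) {}

  std::int64_t Count(const std::int32_t* values, std::size_t length) const {
    std::int64_t count = 0;
    for (std::size_t i = 0; i < length; ++i) count += values[i] > threshold_;
    return count;
  }

 private:
  std::int32_t threshold_;
};

extern "C" {
// Opaque handle: callers in other languages only ever see a pointer.
typedef struct example_filter example_filter;

// Returns nullptr on allocation failure instead of throwing, so no C++
// exception crosses the C boundary.
example_filter* example_filter_new(std::int32_t threshold) {
  return reinterpret_cast<example_filter*>(
      new (std::nothrow) GreaterThanFilter(threshold));
}

// Returns the number of values greater than the threshold, or -1 on bad input.
std::int64_t example_filter_count(const example_filter* f,
                                  const std::int32_t* values,
                                  std::size_t length) {
  if (f == nullptr || (values == nullptr && length != 0)) return -1;
  return reinterpret_cast<const GreaterThanFilter*>(f)->Count(values, length);
}

void example_filter_free(example_filter* f) {
  delete reinterpret_cast<GreaterThanFilter*>(f);
}
}  // extern "C"
```

A caller in another language would create the handle once, call `example_filter_count` for each batch, and release it with `example_filter_free`. Keeping C++ types and exceptions on the C++ side of the boundary is what lets the same entry points serve every FFI.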
Posted by Ravindra Pindikura <ra...@dremio.com>.
On Fri, Nov 1, 2019 at 10:41 AM Yibo Cai <yi...@arm.com> wrote:

> Thanks, Wes. Arrow is a very exciting project.
> I'm from Arm. We are interested in Arrow and would like to study it and
> help improve it.
>

If you are familiar with LLVM/JIT, you could help us improve the
optimisation passes in Gandiva (tweaking existing ones, adding new ones,
or any other tricks).
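For readers new to the idea, an optimisation pass takes a program representation and rewrites it into an equivalent but cheaper one. In Gandiva the passes run over LLVM IR using LLVM's pass infrastructure; the sketch below is only a toy illustration of the concept on a hypothetical expression tree (`Expr`, `Fold`, and the node kinds are invented for this example). It performs constant folding and removes the identities `x + 0` and `x * 1`, bottom-up.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Toy expression tree (hypothetical; not Gandiva's node types).
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
  enum class Kind { kLiteral, kColumn, kAdd, kMul };
  Kind kind;
  std::int64_t value;  // meaningful for kLiteral
  std::string name;    // meaningful for kColumn
  ExprPtr lhs, rhs;    // meaningful for kAdd and kMul
};

ExprPtr Lit(std::int64_t v) {
  return std::make_shared<Expr>(Expr{Expr::Kind::kLiteral, v, "", nullptr, nullptr});
}
ExprPtr Col(const std::string& name) {
  return std::make_shared<Expr>(Expr{Expr::Kind::kColumn, 0, name, nullptr, nullptr});
}
ExprPtr Bin(Expr::Kind kind, ExprPtr lhs, ExprPtr rhs) {
  return std::make_shared<Expr>(Expr{kind, 0, "", std::move(lhs), std::move(rhs)});
}

bool IsLit(const ExprPtr& e, std::int64_t v) {
  return e->kind == Expr::Kind::kLiteral && e->value == v;
}

// A tiny rewrite pass, applied bottom-up:
//   constant folding: literal op literal -> literal (no overflow checks; toy)
//   identities:       x + 0 -> x, x * 1 -> x (either operand order)
ExprPtr Fold(const ExprPtr& e) {
  if (e->kind == Expr::Kind::kLiteral || e->kind == Expr::Kind::kColumn) return e;
  ExprPtr l = Fold(e->lhs);
  ExprPtr r = Fold(e->rhs);
  const bool add = e->kind == Expr::Kind::kAdd;
  if (l->kind == Expr::Kind::kLiteral && r->kind == Expr::Kind::kLiteral) {
    return Lit(add ? l->value + r->value : l->value * r->value);
  }
  const std::int64_t identity = add ? 0 : 1;
  if (IsLit(r, identity)) return l;
  if (IsLit(l, identity)) return r;
  return Bin(e->kind, l, r);
}

// Prints a fully parenthesized form of the expression.
std::string ToString(const ExprPtr& e) {
  switch (e->kind) {
    case Expr::Kind::kLiteral: return std::to_string(e->value);
    case Expr::Kind::kColumn: return e->name;
    case Expr::Kind::kAdd: return "(" + ToString(e->lhs) + " + " + ToString(e->rhs) + ")";
    case Expr::Kind::kMul: return "(" + ToString(e->lhs) + " * " + ToString(e->rhs) + ")";
  }
  return "";
}
```

For example, folding `a + (2 * 3)` yields `(a + 6)`, and `(a * 1) + 0` reduces to `a`. Real LLVM passes apply the same "rewrite to an equivalent, cheaper form" idea to IR instructions rather than to a source-level tree.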

-- 
Thanks and regards,
Ravindra.