Posted to dev@arrow.apache.org by Micah Kornfield <em...@gmail.com> on 2021/02/05 17:01:11 UTC

Re: Computational Kernels: the project overview

Welcome Aldrin,
This sounds like a very reasonable way to start contributing.

-Micah

On Fri, Jan 29, 2021 at 1:53 PM Aldrin <ak...@ucsc.edu.invalid> wrote:

> Hello!
>
> I am trying to use the expression and compute APIs for query processing,
> and in my searches so far, this thread seems to be the most relevant.
>
> A lot of the operators and functions that I need in the short term appear
> to be implemented, but the documentation seems sparse, or at least not all
> in the same place. The document that Micah linked has been useful, and I've
> been perusing the source, but I was wondering whether a good initial
> contribution would be to document the existing design and then propose
> further changes or designs afterwards.
>
> Is anyone already working on (or has anyone completed) consolidating or
> expanding the documentation on the compute and dataset/expression APIs and
> how they interact, etc.?
>
> Thanks!
>
> Aldrin Montana
> Computer Science PhD Student
> UC Santa Cruz
>
>
> On Mon, Nov 30, 2020 at 7:40 AM Wes McKinney <we...@gmail.com> wrote:
>
> > One objective of the precompiled kernels project is to have meaningful
> > computational functionality in a package that does not need to include
> > the LLVM runtime -- to require the LLVM dependency even for simple
> > functions would more than double the size of our Python packages, for
> > example.
> >
> > There is currently little code sharing between functions that do
> > identical work in arrow::compute:: versus gandiva:: -- this has been
> > discussed, but it needs a champion to do something about it. When I
> > was working on the new function framework earlier this year, I spent a
> > day or so perusing src/gandiva/precompiled and concluded that it would be
> > a prohibitive amount of refactoring for me to undertake at that time. In
> > principle many of these functions (e.g. string functions) can be
> > incrementally refactored into reusable inline functions / templates
> > for improved code reuse. We could also explore common infrastructure
> > for unit testing and benchmarking. Anything is possible if enough
> > engineering time is invested.
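To make the refactoring idea above concrete, here is a minimal C++ sketch, not taken from the Arrow code base: the per-value logic lives in one inline function, and two thin entry points call it, one roughly in the one-value-per-call, C-linkage style of precompiled Gandiva functions and one that loops over a batch the way a compute kernel would. All names (`shared::AsciiUpper`, `ascii_upper_sketch`, `UpperBatch`) are hypothetical, `std::vector<std::string>` stands in for Arrow arrays, and C++17 is assumed.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shared helper: the per-value logic is defined once, inline,
// so both engines could call it instead of each keeping its own copy.
namespace shared {
inline void AsciiUpper(const char* in, std::int32_t len, char* out) {
  assert(len >= 0);
  for (std::int32_t i = 0; i < len; ++i) {
    // Cast to unsigned char first: std::toupper is undefined for negative values.
    out[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(in[i])));
  }
}
}  // namespace shared

// Hypothetical precompiled-style entry point: C linkage, one value per call.
extern "C" void ascii_upper_sketch(const char* in, std::int32_t len, char* out) {
  shared::AsciiUpper(in, len, out);
}

// Hypothetical kernel-style entry point: processes a whole batch of values.
std::vector<std::string> UpperBatch(const std::vector<std::string>& values) {
  std::vector<std::string> result;
  result.reserve(values.size());
  for (const std::string& v : values) {
    std::string out(v.size(), '\0');
    // Non-const std::string::data() requires C++17.
    shared::AsciiUpper(v.data(), static_cast<std::int32_t>(v.size()), out.data());
    result.push_back(std::move(out));
  }
  return result;
}
```

Under this arrangement a fix to `shared::AsciiUpper` would reach both entry points at once, which is the kind of reuse the paragraph has in mind; how the real precompiled and kernel code would be split is left open here.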
> >
> > I would hope in the future to see a generalized expression API as part
> > of a logical query plan-type system (for query processing) that has
> > the ability to use Gandiva (if it's available) to compile
> > subexpressions for better performance. I had hoped to spend some time
> > on this myself earlier this year, but I've gotten busy with other
> > things and won't be able to devote much development time to it.
> >
> > - Wes
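The idea of a generalized expression API that hands subexpressions to Gandiva when it is available can be sketched as a tree evaluator with an optional compiler hook. This is a minimal illustration under stated assumptions, not Arrow's or Gandiva's actual API: `Expr`, `SubexpressionCompiler`, `TryCompile`, `Evaluate`, and `Interpret` are hypothetical names, columns are plain `std::vector<double>`, and only addition and multiplication are supported.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <vector>

using Column = std::vector<double>;
using Batch = std::map<std::string, Column>;  // field name -> column

// Hypothetical logical expression node: a field reference or a binary op.
struct Expr {
  std::string op;                  // "field", "add", or "mul"
  std::string field;               // used only when op == "field"
  std::shared_ptr<Expr> lhs, rhs;  // used only for "add" and "mul"
};

// Hypothetical hook for an optional compiler (e.g. a JIT backend): returns a
// callable for a subexpression, or std::nullopt if it declines to compile it.
using CompiledFn = std::function<Column(const Batch&)>;
struct SubexpressionCompiler {
  virtual ~SubexpressionCompiler() = default;
  virtual std::optional<CompiledFn> TryCompile(const Expr& e) = 0;
};

Column Interpret(const Expr& e, const Batch& batch, SubexpressionCompiler* jit);

// Evaluate a node: use a compiled version when one is offered,
// otherwise fall back to the interpreter.
Column Evaluate(const Expr& e, const Batch& batch, SubexpressionCompiler* jit) {
  if (jit != nullptr) {
    if (std::optional<CompiledFn> fn = jit->TryCompile(e)) {
      return (*fn)(batch);
    }
  }
  return Interpret(e, batch, jit);
}

// Interpreted fallback: children go back through Evaluate, so the compiler
// still gets a chance at each nested subexpression.
Column Interpret(const Expr& e, const Batch& batch, SubexpressionCompiler* jit) {
  if (e.op == "field") return batch.at(e.field);
  Column a = Evaluate(*e.lhs, batch, jit);
  Column b = Evaluate(*e.rhs, batch, jit);
  Column out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    // In this sketch any op other than "add" is treated as "mul".
    out[i] = (e.op == "add") ? a[i] + b[i] : a[i] * b[i];
  }
  return out;
}
```

Because `Interpret` routes children back through `Evaluate`, a compiler can accept only the subexpressions it supports and leave the rest to the interpreter; with `jit == nullptr` the same tree still evaluates, in keeping with the compiled path being optional.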
> >
> > On Sun, Nov 29, 2020 at 11:18 PM Micah Kornfield <em...@gmail.com> wrote:
> > >
> > > >
> > > > There are some computation kernels in Arrow, and it looks like this part
> > > > is in active development right now. I wonder if there is a document or
> > > > some emails describing the goals and use cases for this part of the code
> > > > base. It would be very interesting to know a bit more, and I would like to
> > > > contribute at some point.
> > >
> > > https://docs.google.com/document/d/1LFk3WRfWGQbJ9uitWwucjiJsZMqLh8lC1vAUOscLtj8/edit
> > > talks about some of the goals of the compute module.
> > >
> > > > I'm interested because I am developing a proof of concept for a
> > > > declarative language to perform statistical computations on top of Gandiva.
> > >
> > >
> > > I think that, upon cursory examination, someone (maybe Wes) thought Gandiva
> > > and the compute kernels might not play nicely together, but I can't find a
> > > reference to that at the moment.
> > >
> > >
> > > On Sat, Nov 21, 2020 at 3:09 AM Kirill Lykov <ly...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > There are some computation kernels in Arrow, and it looks like this part
> > > > is in active development right now. I wonder if there is a document or
> > > > some emails describing the goals and use cases for this part of the code
> > > > base. It would be very interesting to know a bit more, and I would like to
> > > > contribute at some point.
> > > > I'm interested because I am developing a proof of concept for a
> > > > declarative language to perform statistical computations on top of Gandiva.
> > > >
> > > > --
> > > > Best regards,
> > > > Kirill Lykov
> > > >
> >
>