Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2020/04/18 21:41:30 UTC

[C++] Revamping approach to Arrow compute kernel development

hi folks,

This e-mail comes in the context of two C++ data processing
subprojects we have discussed in the past:

* Data Frame API
https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/edit
* In-memory Query Engine
https://docs.google.com/document/d/10RoUZmiMQRi_J1FcPeVAUAMJ6d_ZuiEbaM2Y33sNPu4/edit

One important area for both of these projects is the evaluation of
array expressions that are found inside projections (i.e. "SELECT" --
without any aggregate functions), filters (i.e. "WHERE"), and join
clauses. In summary, these are mostly elementwise functions that do
not alter the size of their input arrays. For conciseness, I'll call
these kinds of expressions "array-exprs".

Gandiva does LLVM code generation for evaluating array-exprs, but I
think it makes sense to have both codegen'd expressions as well as
X100-style [1] interpreted array-expr evaluation. These modes of query
evaluation ideally should share as much common implementation code as
possible (taking into account Gandiva's need to compile to LLVM IR).

I have reviewed the contents of our arrow/compute/kernels directory
and made the following spreadsheet:

https://docs.google.com/spreadsheets/d/1oXVy29GT1apM4e3isNpzk497Re3OqM8APvlBZCDc8B8/edit?usp=sharing

There are some problems with our current collection of kernels in the
context of array-expr evaluation in query processing:

* For efficiency, kernels used for array-expr evaluation should write
into preallocated memory as their default mode. This enables the
interpreter to avoid temporary memory allocations and improve CPU
cache utilization. Almost none of our kernels are implemented this way
currently.
* The current approach for expr-type kernels, in which each kernel has
its own top-level memory-allocating function, does not scale for
binding developers. I believe instead that kernels should be selected
and invoked generically by the string name of the kernel.

On this last point, what I am suggesting is that we do something more like

ASSIGN_OR_RAISE(auto kernel, compute::GetKernel("greater", {type0, type1}));
ArrayData* out = ... ;
RETURN_NOT_OK(kernel->Call({arg0, arg1}, &out));

In particular, when we reason that successive kernel invocations can
reuse memory, we can have code that is doing in essence

k1->Call({arg0, arg1}, &out)
k2->Call({out}, &out)
k3->Call({arg2, out}, &out)
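
To make the in-place chaining above concrete, here is a minimal sketch with stand-in types (ArrayData as a plain vector; AddKernel, NegateKernel, and EvalNegSum are hypothetical names, not the actual Arrow API):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: ArrayData is just a vector of doubles here, and a
// kernel is any function that writes into caller-preallocated output.
using ArrayData = std::vector<double>;

// Elementwise "add" kernel: writes into preallocated *out, allocates nothing.
inline void AddKernel(const ArrayData& a, const ArrayData& b, ArrayData* out) {
  for (std::size_t i = 0; i < out->size(); ++i) (*out)[i] = a[i] + b[i];
}

// Elementwise "negate" kernel: safe to run in place (input may alias output,
// since each element is read before it is overwritten).
inline void NegateKernel(const ArrayData& a, ArrayData* out) {
  for (std::size_t i = 0; i < out->size(); ++i) (*out)[i] = -a[i];
}

// Evaluate -(x + y) with a single allocation: the buffer written by AddKernel
// is reused as both the input and the output of NegateKernel.
inline ArrayData EvalNegSum(const ArrayData& x, const ArrayData& y) {
  ArrayData out(x.size());   // one preallocation for the whole expression
  AddKernel(x, y, &out);     // out = x + y
  NegateKernel(out, &out);   // out = -out, no temporary needed
  return out;
}
```

The point is that only the kernel signatures need to change; the interpreter decides when an output buffer can double as a later input.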

Of course, if you want the kernel to allocate memory then you can do
something like

ASSIGN_OR_RAISE(out, kernel->Call({arg0, arg1}));

And if you want to avoid the "GetKernel" business you should be able to do

ASSIGN_OR_RAISE(auto result, ExecKernel(name, {arg0, arg1}));

I think this "ExecKernel" function should likely replace the public
APIs for running specific kernels that we currently have.

One detail that isn't addressed above is what to do with
kernel-specific configuration options. One way to address that is to
have a common base type for all options so that we can do

struct MyKernelOptions : public KernelOptions {
  ...
};

and then

MyKernelOptions options = ...;
out = ExecKernel(name, args, options);
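
As a rough sketch of how that common base type could work (all names here, RoundOptions and CallRound included, are illustrative rather than a real API):

```cpp
#include <cmath>
#include <stdexcept>

// A common base type for kernel-specific options, as sketched above.
struct KernelOptions {
  virtual ~KernelOptions() = default;
};

// Options for a hypothetical "round" kernel.
struct RoundOptions : public KernelOptions {
  int ndigits = 0;
};

// The kernel receives the generic base pointer and downcasts to the options
// type it expects, failing loudly on a mismatch.
inline double CallRound(double value, const KernelOptions* options) {
  const auto* round_opts = dynamic_cast<const RoundOptions*>(options);
  if (round_opts == nullptr) {
    throw std::invalid_argument("round expects RoundOptions");
  }
  const double scale = std::pow(10.0, round_opts->ndigits);
  return std::round(value * scale) / scale;
}
```

This keeps ExecKernel's signature uniform while letting each kernel define whatever options it needs.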

Maybe there are some other ideas.

At some point we will need to have an implementation blitz where we go
from our current 20 or so non-codegen'd kernels for array-exprs to
several hundred, so I wanted to discuss these issues so we can all get
on the same page. I'd like to take a crack at an initial iteration of
the above API proposal with a centralized kernel registry (which will
also need to support UDFs) so we can begin porting the existing
array-expr kernels to the new API and then have a clear path forward
for expanding our set of supported functions, ideally sharing
implementation code with Gandiva, particularly its "precompiled"
directory.

Thanks,
Wes

[1]: http://cidrdb.org/cidr2005/papers/P19.pdf

Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Wes McKinney <we...@gmail.com>.
Hi Sven,

On Mon, Apr 20, 2020 at 11:49 PM Sven Wagner-Boysen
<sv...@signavio.com> wrote:
>
> Hi Wes,
>
> I think reducing temporary memory allocation is a great effort that will
> show great benefit in compute-intensive scenarios.
> As we are mainly working with the Rust and DataFusion parts of the Arrow
> project, I was wondering how we could best align the concepts and
> implementations at that level.
> Is the approach that compute kernels have to be implemented separately
> for every language, similar to the format implementations (which is my
> current understanding), or do you see options for reuse and a
> common/shared implementation?
> Happy to get your thoughts on that.

Since we have the C data interface now, it would be relatively easy to
build a bridge for Rust to call functions written in C++ or vice
versa.
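
For reference, the C data interface referred to here consists of two small C structs defined by the Arrow specification; any runtime that can speak the C ABI can produce or consume them without copying data:

```cpp
#include <cstdint>

// The two structs making up the Arrow C data interface, as defined in the
// Arrow specification: a producer in one language fills them in, a consumer
// in another language reads them, and release() frees the producer's data.
struct ArrowSchema {
  const char* format;     // type encoded as a format string, e.g. "l" = int64
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;   // validity bitmap, data, offsets, etc.
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
```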

However, being a volunteer-driven project, there is only so much that
I personally have agency over, and I intend to do development in C++.
If someone wants to coordinate cross-language code reuse they are more
than welcome to do that, but I do not personally plan to at this time.

> Thanks,
> Sven

Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Sven Wagner-Boysen <sv...@signavio.com>.
Hi Wes,

I think reducing temporary memory allocation is a great effort that will
show great benefit in compute-intensive scenarios.
As we are mainly working with the Rust and DataFusion parts of the Arrow
project, I was wondering how we could best align the concepts and
implementations at that level.
Is the approach that compute kernels have to be implemented separately
for every language, similar to the format implementations (which is my
current understanding), or do you see options for reuse and a
common/shared implementation?
Happy to get your thoughts on that.

Thanks,
Sven


-- 

*Sven Wagner-Boysen* | Lead Software Engineer

T: +49 160 930 70 6 70 | www.signavio.com
Kurfürstenstraße 111, 10787 Berlin, Germany


HRB 121584 B Charlottenburg District Court, VAT ID: DE265675123
Managing Directors: Dr. Gero Decker, Daniel Rosenthal

Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Wes McKinney <we...@gmail.com>.
I started a brain dump of some issues that come to mind around kernel
implementation and array expression evaluation. I'll try to fill this
out, and it would be helpful to add supporting citations to other
projects about what kinds of issues come up and what implementation
strategies may be helpful. If anyone would like edit access, please let
me know; otherwise, feel free to add comments.

https://docs.google.com/document/d/1LFk3WRfWGQbJ9uitWwucjiJsZMqLh8lC1vAUOscLtj8/edit?usp=sharing


Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Wes McKinney <we...@gmail.com>.
I have spent some time going through the JIRA backlog and have
created an umbrella JIRA with about 75 issues under it to help
organize building out further compute kernels and kernel execution
functionality:

https://issues.apache.org/jira/browse/ARROW-8894

On Sun, May 24, 2020 at 9:36 AM Wes McKinney <we...@gmail.com> wrote:
>
> I have merged the patch but left the PR open for additional code review.
>
> On Sat, May 23, 2020 at 3:24 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > To be clear given the scope of code affected I think we should merge it today and address further feedback in a follow up patch. I will be diligent about responding to additional comments in the PR
> >
> > On Sat, May 23, 2020, 3:19 PM Wes McKinney <we...@gmail.com> wrote:
> >>
> >> Yes you should still be able to comment. I will reopen the PR after it is merged
> >>
> >> On Sat, May 23, 2020, 2:52 PM Micah Kornfield <em...@gmail.com> wrote:
> >>>
> >>> Hi Wes,
> >>> Will we still be able to comment on the PR once it is closed?
> >>>
> >>>
> >>> If we want to be inclusive on feedback it might pay to wait until Tuesday evening US time to merge since it is a long weekend here.
> >>>
> >>> Thanks,
> >>> Micah
> >>>
> >>> On Saturday, May 23, 2020, Wes McKinney <we...@gmail.com> wrote:
> >>>>
> >>>> Hi folks -- I've addressed a good deal of feedback and added a lot of
> >>>> comments and, with Kou's help, have got the build passing. It would be
> >>>> great if this could be merged soon to unblock follow up PRs
> >>>>
> >>>> On Wed, May 20, 2020 at 11:55 PM Wes McKinney <we...@gmail.com> wrote:
> >>>> >
> >>>> > I just opened the PR https://github.com/apache/arrow/pull/7240
> >>>> >
> >>>> > I'm sorry it's so big. I really think this is the best way. The only
> >>>> > further work I plan to do on it is to get the CI passing.
> >>>> >
> >>>> > On Wed, May 20, 2020 at 12:26 PM Wes McKinney <we...@gmail.com> wrote:
> >>>> > >
> >>>> > > I'd guess I'm < 24 hours away from putting up my initial PR for this
> >>>> > > work. Since the work is large and (for all practical purposes) nearly
> >>>> > > impossible to separate into separately merge-ready PRs, I'll start a
> >>>> > > new e-mail thread describing what I've done in more detail and
> >>>> > > proposing a path for merging the PR (so as to unblock PRs into
> >>>> > > arrow/compute and avoid rebasing headaches) and addressing review
> >>>> > > feedback that will likely take several weeks to fully accumulate.
> >>>> > >
> >>>> > > On Mon, May 11, 2020 at 5:17 PM Wes McKinney <we...@gmail.com> wrote:
> >>>> > > >
> >>>> > > > I'm working actively on this but perhaps as expected it has ballooned
> >>>> > > > into a very large project -- it's unclear at the moment whether I'll
> >>>> > > > be able to break the work into smaller patches that are easier to
> >>>> > > > digest. I'm working as fast as I can to have an initial
> >>>> > > > feature-preserving PR up, but the changes to the src/arrow/compute
> >>>> > > > directory are extensive, so I would please ask that we do not merge
> >>>> > > > non-essential patches into cpp/src/arrow/compute until I get the PR up
> >>>> > > > (estimated later this week at present rate) so we can see where things
> >>>> > > > stand.
> >>>> > > >
> >>>> > > > On Wed, Apr 22, 2020 at 8:06 AM Wes McKinney <we...@gmail.com> wrote:
> >>>> > > > >
> >>>> > > > > On Wed, Apr 22, 2020 at 12:41 AM Micah Kornfield <em...@gmail.com> wrote:
> >>>> > > > > >
> >>>> > > > > > Hi Wes,
> >>>> > > > > > I haven't had time to read the doc, but wanted to ask some questions on
> >>>> > > > > > points raised on the thread.
> >>>> > > > > >
> >>>> > > > > > * For efficiency, kernels used for array-expr evaluation should write
> >>>> > > > > > > into preallocated memory as their default mode. This enables the
> >>>> > > > > > > interpreter to avoid temporary memory allocations and improve CPU
> >>>> > > > > > > cache utilization. Almost none of our kernels are implemented this way
> >>>> > > > > > > currently.
> >>>> > > > > >
> >>>> > > > > > Did something change, I was pretty sure I submitted a patch a while ago for
> >>>> > > > > > boolean kernels, that separated out memory allocation from computation.
> >>>> > > > > > Which should allow for writing to the same memory.  Is this a concern with
> >>>> > > > > > the public Function APIs for the Kernel APIs themselves, or a lower level
> >>>> > > > > > implementation concern?
> >>>> > > > >
> >>>> > > > > Yes, you did in the internal implementation [1]. The concern is the
> >>>> > > > > public API and the general approach to implementing new kernels.
> >>>> > > > >
> >>>> > > > > I'm working on this right now (it's a large project so it will take me
> >>>> > > > > a little while to produce something to be reviewed) so bear with me =)
> >>>> > > > >
> >>>> > > > > [1]: https://github.com/apache/arrow/commit/4910fbf4fda05b864daaba820db08291e4afdcb6#diff-561ea05d36150eb15842f452e3f07c76
> >>>> > > > >
> >>>> > > > > > * Sorting is generally handled by different data processing nodes from
> >>>> > > > > > > Projections, Aggregations / Hash Aggregations, Filters, and Joins.
> >>>> > > > > > > Projections and Filters use expressions, they do not sort.
> >>>> > > > > >
> >>>> > > > > > Would sorting the list-column elements per row be an array-expr?
> >>>> > > > >
> >>>> > > > > Yes, as that's an element-wise function. When I said sorting I was
> >>>> > > > > referring to ORDER BY. The functions we have that do sorting do so in
> >>>> > > > > the context of a single array [2].
> >>>> > > > >
> >>>> > > > > A query engine must be able to sort a (potentially very large) stream
> >>>> > > > > of record batches. One approach is for the Sort operator to exhaust
> >>>> > > > > its child input, accumulating all of the record batches in memory
> >>>> > > > > (spilling to disk as needed) and then sorting and emitting record
> >>>> > > > > batches from the sorted records/tuples. See e.g. Impala's sorting code
> >>>> > > > > [3] [4]
> >>>> > > > >
> >>>> > > > > [2]: https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/sort_to_indices.h#L34
> >>>> > > > > [3]: https://github.com/apache/impala/blob/master/be/src/runtime/sorter.h
> >>>> > > > > [4]: https://github.com/apache/impala/blob/master/be/src/exec/sort-node.h
> >>>> > > > >
> >>>> > > > > >
> >>>> > > > > > On Tue, Apr 21, 2020 at 5:35 AM Wes McKinney <we...@gmail.com> wrote:
> >>>> > > > > >
> >>>> > > > > > > On Tue, Apr 21, 2020 at 7:32 AM Antoine Pitrou <an...@python.org> wrote:
> >>>> > > > > > > >
> >>>> > > > > > > >
> >>>> > > > > > > > Le 21/04/2020 à 13:53, Wes McKinney a écrit :
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> That said, in the SortToIndices case, this wouldn't be a problem,
> >>>> > > > > > > since
> >>>> > > > > > > > >> only the second pass writes to the output.
> >>>> > > > > > > > >
> >>>> > > > > > > > > This kernel is not valid for normal array-exprs (see the spreadsheet I
> >>>> > > > > > > > > linked), such as what you can write in SQL
> >>>> > > > > > > > >
> >>>> > > > > > > > > Kernels like SortToIndices are a different type of function (in other
> >>>> > > > > > > > > words, "not a SQL function") and so if we choose to allow such a
> >>>> > > > > > > > > "non-SQL-like" functions in the expression evaluator then different
> >>>> > > > > > > > > logic must be used.
> >>>> > > > > > > >
> >>>> > > > > > > > Hmm, I think that maybe I'm misunderstanding at which level we're
> >>>> > > > > > > > talking here.  SortToIndices() may not be a "SQL function", but it looks
> >>>> > > > > > > > like an important basic block for a query engine (since, after all,
> >>>> > > > > > > > sorting results is an often used feature in SQL and other languages).
> >>>> > > > > > > > So it should be usable *inside* the expression engine, even though it's
> >>>> > > > > > > > not part of the exposed vocabulary, no?
> >>>> > > > > > >
> >>>> > > > > > > No, not as part of "expressions" as they are defined in the context of
> >>>> > > > > > > SQL engines.
> >>>> > > > > > >
> >>>> > > > > > > Sorting is generally handled by different data processing nodes from
> >>>> > > > > > > Projections, Aggregations / Hash Aggregations, Filters, and Joins.
> >>>> > > > > > > Projections and Filters use expressions, they do not sort.
> >>>> > > > > > >
> >>>> > > > > > > > Regards
> >>>> > > > > > > >
> >>>> > > > > > > > Antoine.
> >>>> > > > > > >

Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Wes McKinney <we...@gmail.com>.
I have merged the patch but left the PR open for additional code review.

On Sat, May 23, 2020 at 3:24 PM Wes McKinney <we...@gmail.com> wrote:
>
> To be clear given the scope of code affected I think we should merge it today and address further feedback in a follow up patch. I will be diligent about responding to additional comments in the PR
>
> On Sat, May 23, 2020, 3:19 PM Wes McKinney <we...@gmail.com> wrote:
>>
>> Yes you should still be able to comment. I will reopen the PR after it is merged
>>
>> On Sat, May 23, 2020, 2:52 PM Micah Kornfield <em...@gmail.com> wrote:
>>>
>>> Hi Wes,
>>> Will we still be able to comment on the PR once it is closed?
>>>
>>>
>>> If we want to be inclusive on feedback it might pay to wait until Tuesday evening US time to merge since it is a long weekend here.
>>>
>>> Thanks,
>>> Micah
>>>
>>> On Saturday, May 23, 2020, Wes McKinney <we...@gmail.com> wrote:
>>>>
>>>> Hi folks -- I've addressed a good deal of feedback and added a lot of
>>>> comments, and with Kou's help have got the build passing. It would be
>>>> great if this could be merged soon to unblock follow-up PRs
>>>>
>>>> On Wed, May 20, 2020 at 11:55 PM Wes McKinney <we...@gmail.com> wrote:
>>>> >
>>>> > I just opened the PR https://github.com/apache/arrow/pull/7240
>>>> >
>>>> > I'm sorry it's so big. I really think this is the best way. The only
>>>> > further work I plan to do on it is to get the CI passing.
>>>> >
>>>> > On Wed, May 20, 2020 at 12:26 PM Wes McKinney <we...@gmail.com> wrote:
>>>> > >
>>>> > > I'd guess I'm < 24 hours away from putting up my initial PR for this
>>>> > > work. Since the work is large and (for all practical purposes) nearly
>>>> > > impossible to separate into separately merge-ready PRs, I'll start a
>>>> > > new e-mail thread describing what I've done in more detail and
>>>> > > proposing a path for merging the PR (so as to unblock PRs into
>>>> > > arrow/compute and avoid rebasing headaches) and addressing review
>>>> > > feedback that will likely take several weeks to fully accumulate.
>>>> > >
>>>> > > On Mon, May 11, 2020 at 5:17 PM Wes McKinney <we...@gmail.com> wrote:
>>>> > > >
>>>> > > > I'm working actively on this but perhaps as expected it has ballooned
>>>> > > > into a very large project -- it's unclear at the moment whether I'll
>>>> > > > be able to break the work into smaller patches that are easier to
>>>> > > > digest. I'm working as fast as I can to have an initial
>>>> > > > feature-preserving PR up, but the changes to the src/arrow/compute
>>>> > > > directory are extensive, so I would please ask that we do not merge
>>>> > > > non-essential patches into cpp/src/arrow/compute until I get the PR up
>>>> > > > (estimated later this week at present rate) so we can see where things
>>>> > > > stand.
>>>> > > >
>>>> > > > On Wed, Apr 22, 2020 at 8:06 AM Wes McKinney <we...@gmail.com> wrote:
>>>> > > > >
>>>> > > > > On Wed, Apr 22, 2020 at 12:41 AM Micah Kornfield <em...@gmail.com> wrote:
>>>> > > > > >
>>>> > > > > > Hi Wes,
>>>> > > > > > I haven't had time to read the doc, but wanted to ask some questions on
>>>> > > > > > points raised on the thread.
>>>> > > > > >
>>>> > > > > > * For efficiency, kernels used for array-expr evaluation should write
>>>> > > > > > > into preallocated memory as their default mode. This enables the
>>>> > > > > > > interpreter to avoid temporary memory allocations and improve CPU
>>>> > > > > > > cache utilization. Almost none of our kernels are implemented this way
>>>> > > > > > > currently.
>>>> > > > > >
> >>>> > > > > > Did something change? I was pretty sure I submitted a patch a while ago
> >>>> > > > > > for boolean kernels that separated out memory allocation from computation,
> >>>> > > > > > which should allow for writing to the same memory.  Is this a concern with
> >>>> > > > > > the public Function APIs or the Kernel APIs themselves, or a lower-level
> >>>> > > > > > implementation concern?
>>>> > > > >
>>>> > > > > Yes, you did in the internal implementation [1]. The concern is the
>>>> > > > > public API and the general approach to implementing new kernels.
>>>> > > > >
>>>> > > > > I'm working on this right now (it's a large project so it will take me
>>>> > > > > a little while to produce something to be reviewed) so bear with me =)
>>>> > > > >
>>>> > > > > [1]: https://github.com/apache/arrow/commit/4910fbf4fda05b864daaba820db08291e4afdcb6#diff-561ea05d36150eb15842f452e3f07c76
>>>> > > > >
>>>> > > > > > * Sorting is generally handled by different data processing nodes from
>>>> > > > > > > Projections, Aggregations / Hash Aggregations, Filters, and Joins.
>>>> > > > > > > Projections and Filters use expressions, they do not sort.
>>>> > > > > >
>>>> > > > > > Would sorting the list-column elements per row be an array-expr?
>>>> > > > >
>>>> > > > > Yes, as that's an element-wise function. When I said sorting I was
>>>> > > > > referring to ORDER BY. The functions we have that do sorting do so in
>>>> > > > > the context of a single array [2].
>>>> > > > >
>>>> > > > > A query engine must be able to sort a (potentially very large) stream
>>>> > > > > of record batches. One approach is for the Sort operator to exhaust
>>>> > > > > its child input, accumulating all of the record batches in memory
>>>> > > > > (spilling to disk as needed) and then sorting and emitting record
>>>> > > > > batches from the sorted records/tuples. See e.g. Impala's sorting code
>>>> > > > > [3] [4]
>>>> > > > >
>>>> > > > > [2]: https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/sort_to_indices.h#L34
>>>> > > > > [3]: https://github.com/apache/impala/blob/master/be/src/runtime/sorter.h
>>>> > > > > [4]: https://github.com/apache/impala/blob/master/be/src/exec/sort-node.h
>>>> > > > >
>>>> > > > > >
>>>> > > > > > On Tue, Apr 21, 2020 at 5:35 AM Wes McKinney <we...@gmail.com> wrote:
>>>> > > > > >
>>>> > > > > > > On Tue, Apr 21, 2020 at 7:32 AM Antoine Pitrou <an...@python.org> wrote:
>>>> > > > > > > >
>>>> > > > > > > >
>>>> > > > > > > > Le 21/04/2020 à 13:53, Wes McKinney a écrit :
>>>> > > > > > > > >>
>>>> > > > > > > > >> That said, in the SortToIndices case, this wouldn't be a problem,
>>>> > > > > > > since
>>>> > > > > > > > >> only the second pass writes to the output.
>>>> > > > > > > > >
>>>> > > > > > > > > This kernel is not valid for normal array-exprs (see the spreadsheet I
>>>> > > > > > > > > linked), such as what you can write in SQL
>>>> > > > > > > > >
>>>> > > > > > > > > Kernels like SortToIndices are a different type of function (in other
>>>> > > > > > > > > words, "not a SQL function") and so if we choose to allow such a
>>>> > > > > > > > > "non-SQL-like" functions in the expression evaluator then different
>>>> > > > > > > > > logic must be used.
>>>> > > > > > > >
>>>> > > > > > > > Hmm, I think that maybe I'm misunderstanding at which level we're
>>>> > > > > > > > talking here.  SortToIndices() may not be a "SQL function", but it looks
>>>> > > > > > > > like an important basic block for a query engine (since, after all,
>>>> > > > > > > > sorting results is an often used feature in SQL and other languages).
>>>> > > > > > > > So it should be usable *inside* the expression engine, even though it's
>>>> > > > > > > > not part of the exposed vocabulary, no?
>>>> > > > > > >
>>>> > > > > > > No, not as part of "expressions" as they are defined in the context of
>>>> > > > > > > SQL engines.
>>>> > > > > > >
>>>> > > > > > > Sorting is generally handled by different data processing nodes from
>>>> > > > > > > Projections, Aggregations / Hash Aggregations, Filters, and Joins.
>>>> > > > > > > Projections and Filters use expressions, they do not sort.
>>>> > > > > > >
>>>> > > > > > > > Regards
>>>> > > > > > > >
>>>> > > > > > > > Antoine.
>>>> > > > > > >

Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Wes McKinney <we...@gmail.com>.
To be clear, given the scope of code affected, I think we should merge it
today and address further feedback in a follow-up patch. I will be diligent
about responding to additional comments in the PR


Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Wes McKinney <we...@gmail.com>.
Yes, you should still be able to comment. I will reopen the PR after it is
merged


Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Micah Kornfield <em...@gmail.com>.
Hi Wes,
Will we still be able to comment on the PR once it is closed?


If we want to be inclusive on feedback it might pay to wait until Tuesday
evening US time to merge since it is a long weekend here.

Thanks,
Micah


Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Wes McKinney <we...@gmail.com>.
Hi folks -- I've addressed a good deal of feedback and added a lot of
comments, and with Kou's help have got the build passing. It would be
great if this could be merged soon to unblock follow-up PRs

On Wed, May 20, 2020 at 11:55 PM Wes McKinney <we...@gmail.com> wrote:
>
> I just opened the PR https://github.com/apache/arrow/pull/7240
>
> I'm sorry it's so big. I really think this is the best way. The only
> further work I plan to do on it is to get the CI passing.
>
> On Wed, May 20, 2020 at 12:26 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > I'd guess I'm < 24 hours away from putting up my initial PR for this
> > work. Since the work is large and (for all practical purposes) nearly
> > impossible to separate into separately merge-ready PRs, I'll start a
> > new e-mail thread describing what I've done in more detail and
> > proposing a path for merging the PR (so as to unblock PRs into
> > arrow/compute and avoid rebasing headaches) and addressing review
> > feedback that will likely take several weeks to fully accumulate.

Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Wes McKinney <we...@gmail.com>.
I just opened the PR https://github.com/apache/arrow/pull/7240

I'm sorry it's so big. I really think this is the best way. The only
further work I plan to do on it is to get the CI passing.


Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Wes McKinney <we...@gmail.com>.
I'd guess I'm < 24 hours away from putting up my initial PR for this
work. Since the work is large and (for all practical purposes) nearly
impossible to separate into separately merge-ready PRs, I'll start a
new e-mail thread describing what I've done in more detail and
proposing a path for merging the PR (so as to unblock PRs into
arrow/compute and avoid rebasing headaches) and addressing review
feedback that will likely take several weeks to fully accumulate.


Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Wes McKinney <we...@gmail.com>.
I'm working actively on this but perhaps as expected it has ballooned
into a very large project -- it's unclear at the moment whether I'll
be able to break the work into smaller patches that are easier to
digest. I'm working as fast as I can to have an initial
feature-preserving PR up, but the changes to the src/arrow/compute
directory are extensive, so I would please ask that we do not merge
non-essential patches into cpp/src/arrow/compute until I get the PR up
(estimated later this week at present rate) so we can see where things
stand.


Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Wes McKinney <we...@gmail.com>.
On Wed, Apr 22, 2020 at 12:41 AM Micah Kornfield <em...@gmail.com> wrote:
>
> Hi Wes,
> I haven't had time to read the doc, but wanted to ask some questions on
> points raised on the thread.
>
> * For efficiency, kernels used for array-expr evaluation should write
> > into preallocated memory as their default mode. This enables the
> > interpreter to avoid temporary memory allocations and improve CPU
> > cache utilization. Almost none of our kernels are implemented this way
> > currently.
>
> Did something change? I was pretty sure I submitted a patch a while ago for
> boolean kernels that separated memory allocation from computation, which
> should allow writing to the same memory.  Is this a concern with the public
> Function APIs, with the Kernel APIs themselves, or a lower-level
> implementation concern?

Yes, you did in the internal implementation [1]. The concern is the
public API and the general approach to implementing new kernels.

I'm working on this right now (it's a large project so it will take me
a little while to produce something to be reviewed) so bear with me =)

[1]: https://github.com/apache/arrow/commit/4910fbf4fda05b864daaba820db08291e4afdcb6#diff-561ea05d36150eb15842f452e3f07c76

> * Sorting is generally handled by different data processing nodes from
> > Projections, Aggregations / Hash Aggregations, Filters, and Joins.
> > Projections and Filters use expressions, they do not sort.
>
> Would sorting the list-column elements per row be an array-expr?

Yes, as that's an element-wise function. When I said sorting I was
referring to ORDER BY. The functions we have that do sorting do so in
the context of a single array [2].
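To make the distinction concrete, here is a minimal sketch of the element-wise case: sorting each row's list independently. The types and the `SortListsPerRow` name are toy stand-ins for illustration, not Arrow's actual list-array API.

```cpp
#include <algorithm>
#include <vector>

// Toy list column: one std::vector<int> per row. Sorting each row's
// list is element-wise (output row i depends only on input row i), so
// it qualifies as an array-expr; ORDER BY over whole batches does not.
std::vector<std::vector<int>> SortListsPerRow(
    std::vector<std::vector<int>> col) {
  for (auto& row : col) std::sort(row.begin(), row.end());
  return col;
}
```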

A query engine must be able to sort a (potentially very large) stream
of record batches. One approach is for the Sort operator to exhaust
its child input, accumulating all of the record batches in memory
(spilling to disk as needed) and then sorting and emitting record
batches from the sorted records/tuples. See e.g. Impala's sorting code
[3] [4]

[2]: https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/sort_to_indices.h#L34
[3]: https://github.com/apache/impala/blob/master/be/src/runtime/sorter.h
[4]: https://github.com/apache/impala/blob/master/be/src/exec/sort-node.h
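The accumulate-then-sort approach described above can be sketched as follows. `Batch` and `SortNode` are illustrative stand-ins (a single-column batch of ints, no spilling to disk), not Arrow's or Impala's actual APIs.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy stand-in for a stream of single-column record batches.
using Batch = std::vector<int>;

// A Sort operator: exhaust the child input, accumulate every row in
// memory, then sort and re-emit fixed-size batches.
std::vector<Batch> SortNode(const std::vector<Batch>& input,
                            std::size_t batch_size) {
  // 1. Accumulate all rows from the child input.
  std::vector<int> rows;
  for (const Batch& b : input) rows.insert(rows.end(), b.begin(), b.end());
  // 2. Sort the accumulated rows (the ORDER BY step).
  std::sort(rows.begin(), rows.end());
  // 3. Emit sorted batches of at most batch_size rows.
  std::vector<Batch> out;
  for (std::size_t i = 0; i < rows.size(); i += batch_size) {
    std::size_t n = std::min(batch_size, rows.size() - i);
    out.emplace_back(rows.begin() + i, rows.begin() + i + n);
  }
  return out;
}
```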


Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Micah Kornfield <em...@gmail.com>.
Hi Wes,
I haven't had time to read the doc, but wanted to ask some questions on
points raised on the thread.

* For efficiency, kernels used for array-expr evaluation should write
> into preallocated memory as their default mode. This enables the
> interpreter to avoid temporary memory allocations and improve CPU
> cache utilization. Almost none of our kernels are implemented this way
> currently.

Did something change? I was pretty sure I submitted a patch a while ago for
boolean kernels that separated memory allocation from computation, which
should allow writing to the same memory.  Is this a concern with the public
Function APIs, with the Kernel APIs themselves, or a lower-level
implementation concern?

* Sorting is generally handled by different data processing nodes from
> Projections, Aggregations / Hash Aggregations, Filters, and Joins.
> Projections and Filters use expressions, they do not sort.

Would sorting the list-column elements per row be an array-expr?


Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Wes McKinney <we...@gmail.com>.
On Tue, Apr 21, 2020 at 7:32 AM Antoine Pitrou <an...@python.org> wrote:
> Hmm, I think that maybe I'm misunderstanding at which level we're
> talking here.  SortToIndices() may not be a "SQL function", but it looks
> like an important basic building block for a query engine (since, after all,
> sorting results is an often-used feature in SQL and other languages).
> So it should be usable *inside* the expression engine, even though it's
> not part of the exposed vocabulary, no?

No, not as part of "expressions" as they are defined in the context of
SQL engines.

Sorting is generally handled by different data processing nodes from
Projections, Aggregations / Hash Aggregations, Filters, and Joins.
Projections and Filters use expressions, they do not sort.

> Regards
>
> Antoine.

Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Antoine Pitrou <an...@python.org>.
On 21/04/2020 at 13:53, Wes McKinney wrote:
>>
>> That said, in the SortToIndices case, this wouldn't be a problem, since
>> only the second pass writes to the output.
> 
> This kernel is not valid for normal array-exprs (see the spreadsheet I
> linked), such as what you can write in SQL
> 
> Kernels like SortToIndices are a different type of function (in other
> words, "not a SQL function"), so if we choose to allow such
> "non-SQL-like" functions in the expression evaluator, then different
> logic must be used.

Hmm, I think that maybe I'm misunderstanding at which level we're
talking here.  SortToIndices() may not be a "SQL function", but it looks
like an important basic building block for a query engine (since, after all,
sorting results is an often-used feature in SQL and other languages).
So it should be usable *inside* the expression engine, even though it's
not part of the exposed vocabulary, no?

Regards

Antoine.

Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Wes McKinney <we...@gmail.com>.
hi Antoine,

On Tue, Apr 21, 2020 at 4:54 AM Antoine Pitrou <an...@python.org> wrote:
>
>
> On 21/04/2020 at 11:13, Antoine Pitrou wrote:
> >

> It would be interesting to know how costly repeated
> allocation/deallocation is.  Modern allocators like jemalloc do their
> own caching instead of always returning memory to the system.  We could
> also have our own caching layer.

I'm sure that we can do our own experiments. The more significant
issue based on my understanding of the academic research is actually
memory bandwidth relating to CPU memory hierarchies. This is discussed
at length in the X100 paper so I'll direct you to the research rather
than debate it here. Since that paper was written in 2005, you might
also look for more recent papers that cite it to see whether there has
been follow-on research as CPU architectures have evolved.

> > This assumes that all these kernels can safely write into one of their
> > inputs.  This should be true for trivial ones, but not if e.g. a kernel
> > makes two passes over its input.  For example, the SortToIndices kernel
> > first scans the input for min and max values, and then switches on two
> > different sorting algorithms depending on those statistics (using a O(n)
> > counting sort if the values are in a small enough range).
>
> That said, in the SortToIndices case, this wouldn't be a problem, since
> only the second pass writes to the output.

This kernel is not valid for normal array-exprs (see the spreadsheet I
linked), such as what you can write in SQL

Kernels like SortToIndices are a different type of function (in other
words, "not a SQL function"), so if we choose to allow such
"non-SQL-like" functions in the expression evaluator, then different
logic must be used.

> Regards
>
> Antoine.

Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Antoine Pitrou <an...@python.org>.
On 21/04/2020 at 11:13, Antoine Pitrou wrote:
> 
> This assumes that all these kernels can safely write into one of their
> inputs.  This should be true for trivial ones, but not if e.g. a kernel
> makes two passes over its input.  For example, the SortToIndices kernel
> first scans the input for min and max values, and then switches on two
> different sorting algorithms depending on those statistics (using a O(n)
> counting sort if the values are in a small enough range).

That said, in the SortToIndices case, this wouldn't be a problem, since
only the second pass writes to the output.

Regards

Antoine.

Re: [C++] Revamping approach to Arrow compute kernel development

Posted by Antoine Pitrou <an...@python.org>.
Hi Wes,

On 18/04/2020 at 23:41, Wes McKinney wrote:
> 
> There are some problems with our current collection of kernels in the
> context of array-expr evaluation in query processing:
> 
> * For efficiency, kernels used for array-expr evaluation should write
> into preallocated memory as their default mode. This enables the
> interpreter to avoid temporary memory allocations and improve CPU
> cache utilization. Almost none of our kernels are implemented this way
> currently.
> * The current approach for expr-type kernels of having a top-level
> memory-allocating function is not scalable for binding developers. I
> believe instead that kernels should be selected and invoked
> generically by using the string name of the kernel
> 
> On this last point, what I am suggesting is that we do something more like
> 
> ASSIGN_OR_RAISE(auto kernel, compute::GetKernel("greater", {type0, type1}));
> ArrayData* out = ... ;
> RETURN_NOT_OK(kernel->Call({arg0, arg1}, &out));

Sounds good to me.
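A minimal model of that name-based lookup with caller-preallocated outputs might look like the sketch below. All names and signatures here (`Array`, `Kernel`, `GetKernel`, the registry) are illustrative, not the actual arrow::compute API.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Array = std::vector<double>;
// A kernel writes into caller-provided, preallocated output memory.
using Kernel = std::function<void(const std::vector<const Array*>&, Array*)>;

std::map<std::string, Kernel>& Registry() {
  static std::map<std::string, Kernel> reg = {
      {"add",
       [](const std::vector<const Array*>& in, Array* out) {
         for (std::size_t i = 0; i < out->size(); ++i)
           (*out)[i] = (*in[0])[i] + (*in[1])[i];
       }},
      {"negate",
       [](const std::vector<const Array*>& in, Array* out) {
         for (std::size_t i = 0; i < out->size(); ++i)
           (*out)[i] = -(*in[0])[i];
       }}};
  return reg;
}

// Select a kernel by its string name, as a binding developer would.
const Kernel& GetKernel(const std::string& name) {
  auto it = Registry().find(name);
  if (it == Registry().end()) throw std::runtime_error("no kernel: " + name);
  return it->second;
}
```

Because every kernel writes into caller-owned memory, the interpreter can thread one scratch buffer through successive calls, including passing the same array as both input and output for elementwise kernels.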

> In particular, when we reason that successive kernel invocations can
> reuse memory, we can have code that is doing in essence
> 
> k1->Call({arg0, arg1}, &out)
> k2->Call({out}, &out))
> k3->Call({arg2, out}, &out)

This assumes that all these kernels can safely write into one of their
inputs.  This should be true for trivial ones, but not if e.g. a kernel
makes two passes over its input.  For example, the SortToIndices kernel
first scans the input for min and max values, and then switches on two
different sorting algorithms depending on those statistics (using a O(n)
counting sort if the values are in a small enough range).
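A simplified SortToIndices shows the hazard: the first pass over the values chooses the algorithm, and the second pass still reads the values while producing output, so values and output indices must live in distinct buffers. This is a sketch of the technique, not Arrow's implementation.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Return indices that would sort `values`. Pass 1 scans min/max to
// pick an algorithm; pass 2 writes the output. Writing indices over
// `values` in place would corrupt data the second pass still reads.
std::vector<int64_t> SortToIndices(const std::vector<int32_t>& values) {
  std::vector<int64_t> out(values.size());
  std::iota(out.begin(), out.end(), 0);
  if (values.empty()) return out;
  auto [lo, hi] = std::minmax_element(values.begin(), values.end());
  int64_t range = int64_t(*hi) - int64_t(*lo) + 1;
  if (range <= int64_t(values.size()) * 2) {
    // Small value range: O(n) counting sort into per-value buckets.
    std::vector<std::vector<int64_t>> buckets(range);
    for (int64_t i = 0; i < int64_t(values.size()); ++i)
      buckets[values[i] - *lo].push_back(i);
    int64_t k = 0;
    for (auto& b : buckets)
      for (int64_t i : b) out[k++] = i;
  } else {
    // General case: stable comparison sort on the indices.
    std::stable_sort(out.begin(), out.end(), [&](int64_t a, int64_t b) {
      return values[a] < values[b];
    });
  }
  return out;
}
```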

(I'm also not sure how C++ handles this: if you have a `const T*` input
and a `T*` output, does C++ reason that the two pointers can point to
the same memory?)
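For what it's worth, a C++ compiler must generally assume that a `const T*` and a `T*` of the same type may alias: `const` only forbids writes through that particular pointer, not changes to the pointee via another path. A trivial elementwise kernel (illustrative, not an Arrow kernel) is nevertheless safe to run in place, because each element is read before it is overwritten:

```cpp
// `src` is const-qualified, but the compiler still assumes src and dst
// may point at the same memory. Correct in place: element i is read
// before element i is written.
void negate(const double* src, double* dst, int n) {
  for (int i = 0; i < n; ++i) dst[i] = -src[i];
}
```

To actually promise non-aliasing (and enable the corresponding optimizations), C has `restrict`; in C++ one falls back on compiler extensions such as `__restrict`.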

It would be interesting to know how costly repeated
allocation/deallocation is.  Modern allocators like jemalloc do their
own caching instead of always returning memory to the system.  We could
also have our own caching layer.

Regards

Antoine.