You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@arrow.apache.org by Weston Pace <we...@gmail.com> on 2022/06/01 22:02:38 UTC

[C++] Kernel function registry evolution

We've had some evidence for a while now that the kernel functions
suffer from an overhead problem that prevents us from effectively
utilizing cache.  The latest and greatest evidence of this might be
[1].  A number of people have made some very interesting suggestions
that I think could really cut down on the overhead (e.g. preallocated
buffers).  However, whenever we start a discussion on implementation
it ends up getting bogged down because there is a lot of existing code
here and a massive refactor would be too difficult.

I'd like to propose we add a second kernel function registry.  There
doesn't need to be any user facing API change.  We could probably use
an approach like [2] to proxy to the old function registry when the
newer registry doesn't contain the asked-for function.  This would
allow us to focus on creating an efficient function registry without
having to worry about refactoring the existing kernels all at once.

Once we are happy with the new registry we can start to migrate the
existing kernel functions over to the new registry.  I don't expect
there will need to be a lot of change to the existing kernel functions
but whatever change is there can be done incrementally.

Does this seem like a good approach?  Am I missing something or does
anyone know of a better way to fix the existing implementation?

The main risk I can see is that we don't end up completing the
migration and end up maintaining two registries forever.  However, we
have enough interested people here at Voltron Data that I feel
confident we can get this pushed through.

[1] https://github.com/apache/arrow/pull/13179
[2] https://github.com/apache/arrow/pull/13252

Re: [C++] Kernel function registry evolution

Posted by Wes McKinney <we...@gmail.com>.

Right, the situations where array and scalar inputs are mixed (e.g.
Array + Scalar -> Array) are unaffected. The change will make more
sense when you see in the PR how much code I've been able to delete.

On Wed, Jun 29, 2022 at 1:03 PM Weston Pace <we...@gmail.com> wrote:
>
> This is only for the situation where ALL inputs and outputs are
> scalar.  Scalars, at the kernel level, do not have length.  So in this
> case there is nothing to repeat.  It does build a buffer, but just
> with a single value, so it is all O(1).
>
> On Wed, Jun 29, 2022 at 9:49 AM Antoine Pitrou <an...@python.org> wrote:
> >
> >
> > Does boxing a scalar into an array actually build a buffer with the
> > repeated value, or is it more efficient than that?
> >
> >
> > Le 29/06/2022 à 17:57, Wes McKinney a écrit :
> > > I'm working on my next PR which addresses the "scalar output modality"
> > > and removes usages of ValueDescr and output shapes in general from the
> > > kernel execution machinery. The ability to provide all scalar input
> > > and scalar output of course is preserved — it's just that the "boxing"
> > > and "unboxing" of the inputs and outputs is happening at a higher
> > > level (i.e. there is only one place that boxes scalars into arrays and
> > > unboxes the output into a scalar) rather than being implemented on a
> > > piecemeal basis in the kernel implementations.
> > >
> > > Please keep an eye out for any non-trivial PRs that affect
> > > arrow/compute/kernels since it could cause some rebasing headache. I
> > > am working as fast as I can to complete this next PR.
> > >
> > > After this, the next thing is porting the aggregate kernels to use
> > > ExecSpan. Then I think the mass refactoring will cool off and we can
> > > address follow-on improvements like rewriting expression evaluation to
> > > utilize the span data structures to yield performance gains.
> > >
> > > On Mon, Jun 13, 2022 at 12:37 PM Wes McKinney <we...@gmail.com> wrote:
> > >>
> > >> I merged the PR a little while ago — thanks for David, Sasha for
> > >> helping review. If you have more comments or things you would like to
> > >> fix from that PR, please comment there and I can help address them in
> > >> follow up PRs.
> > >>
> > >> I'm going to work next on migrating the rest of the kernels to use
> > >> ExecSpan (and thus ExecBatchIterator can be removed in favor of
> > >> ExecSpanIterator — I'll add benchmarks so we can see the performance
> > >> difference), and then my next priority will be removing the ValueDescr
> > >> concept and the ValueDescr::SCALAR output path for the
> > >> scalar/elementwise kernels which will help us delete a lot of code
> > >>
> > >> I'll attach related Jiras to this umbrella issue:
> > >>
> > >> https://issues.apache.org/jira/browse/ARROW-16755
> > >>
> > >> On Fri, Jun 10, 2022 at 12:56 PM Wes McKinney <we...@gmail.com> wrote:
> > >>>
> > >>> PR is up: https://github.com/apache/arrow/pull/13364
> > >>>
> > >>> Look forward to getting this in since there's a bunch of follow on
> > >>> work that I'd like to get started on ASAP!
> > >>>
> > >>> On Thu, Jun 9, 2022 at 7:34 AM Wes McKinney <we...@gmail.com> wrote:
> > >>>>
> > >>>> I'm making good progress getting my branch PR-ready -- working through
> > >>>> the compute-scalar-test suite and fixing the little things I broke. I
> > >>>> hope I'll have it done by the end of the week.
> > >>>>
> > >>>> On Mon, Jun 6, 2022 at 3:21 PM Wes McKinney <we...@gmail.com> wrote:
> > >>>>>
> > >>>>> I created https://issues.apache.org/jira/browse/ARROW-16755 as an
> > >>>>> umbrella issue to track improvements to reduce overhead in the
> > >>>>> expression and kernel execution machinery. Please help by attaching
> > >>>>> related issues and creating new issues for specific individual efforts
> > >>>>> here. I'll work as quickly as I can to have my initial patch
> > >>>>> ARROW-16756 ready which will unblock the next few projects here
> > >>>>>
> > >>>>> On Mon, Jun 6, 2022 at 10:35 AM Wes McKinney <we...@gmail.com> wrote:
> > >>>>>>
> > >>>>>>   This is definitely only the first stage of cleanup and streamlining —
> > >>>>>> I anticipate multiple rounds of refactoring (maybe not as invasive and
> > >>>>>> painful as this one), and this patch I'm not sure will do a lot to
> > >>>>>> alleviate bottom line expression evaluation overhead but it creates
> > >>>>>> the environment (i.e. where a whole chain of scalar functions that all
> > >>>>>> write into preallocated memory can execute without having to touch
> > >>>>>> shared_ptrs or deal with other objects with excess microperformance
> > >>>>>> overhead) where such optimization can happen more easily.
> > >>>>>>
> > >>>>>>
> > >>>>>> On Mon, Jun 6, 2022 at 4:08 AM Antoine Pitrou <an...@python.org> wrote:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Le 06/06/2022 à 09:34, Sasha Krassovsky a écrit :
> > >>>>>>>> Wow that's a lot of progress!
> > >>>>>>>> Definitely agree on the scalar outputs point.
> > >>>>>>>>
> > >>>>>>>> One point about the ArraySpan - why does it need to know its data type?
> > >>>>>>>> Once a kernel has been resolved by the registry, the kernel will only know
> > >>>>>>>> how to execute on the specific type it was resolved for, right?
> > >>>>>>>
> > >>>>>>> Because of parametric types for example (e.g. timestamps with a unit and
> > >>>>>>> timezone, or decimals with a precision and scale).
> > >>>>>>>
> > >>>>>>> Regards
> > >>>>>>>
> > >>>>>>> Antoine.

Re: [C++] Kernel function registry evolution

Posted by Weston Pace <we...@gmail.com>.

This is only for the situation where ALL inputs and outputs are
scalar.  Scalars, at the kernel level, do not have length.  So in this
case there is nothing to repeat.  It does build a buffer, but just
with a single value, so it is all O(1).

On Wed, Jun 29, 2022 at 9:49 AM Antoine Pitrou <an...@python.org> wrote:
>
>
> Does boxing a scalar into an array actually build a buffer with the
> repeated value, or is it more efficient than that?
>
>
> Le 29/06/2022 à 17:57, Wes McKinney a écrit :
> > I'm working on my next PR which addresses the "scalar output modality"
> > and removes usages of ValueDescr and output shapes in general from the
> > kernel execution machinery. The ability to provide all scalar input
> > and scalar output of course is preserved — it's just that the "boxing"
> > and "unboxing" of the inputs and outputs is happening at a higher
> > level (i.e. there is only one place that boxes scalars into arrays and
> > unboxes the output into a scalar) rather than being implemented on a
> > piecemeal basis in the kernel implementations.
> >
> > Please keep an eye out for any non-trivial PRs that affect
> > arrow/compute/kernels since it could cause some rebasing headache. I
> > am working as fast as I can to complete this next PR.
> >
> > After this, the next thing is porting the aggregate kernels to use
> > ExecSpan. Then I think the mass refactoring will cool off and we can
> > address follow-on improvements like rewriting expression evaluation to
> > utilize the span data structures to yield performance gains.
> >
> > On Mon, Jun 13, 2022 at 12:37 PM Wes McKinney <we...@gmail.com> wrote:
> >>
> >> I merged the PR a little while ago — thanks for David, Sasha for
> >> helping review. If you have more comments or things you would like to
> >> fix from that PR, please comment there and I can help address them in
> >> follow up PRs.
> >>
> >> I'm going to work next on migrating the rest of the kernels to use
> >> ExecSpan (and thus ExecBatchIterator can be removed in favor of
> >> ExecSpanIterator — I'll add benchmarks so we can see the performance
> >> difference), and then my next priority will be removing the ValueDescr
> >> concept and the ValueDescr::SCALAR output path for the
> >> scalar/elementwise kernels which will help us delete a lot of code
> >>
> >> I'll attach related Jiras to this umbrella issue:
> >>
> >> https://issues.apache.org/jira/browse/ARROW-16755
> >>
> >> On Fri, Jun 10, 2022 at 12:56 PM Wes McKinney <we...@gmail.com> wrote:
> >>>
> >>> PR is up: https://github.com/apache/arrow/pull/13364
> >>>
> >>> Look forward to getting this in since there's a bunch of follow on
> >>> work that I'd like to get started on ASAP!
> >>>
> >>> On Thu, Jun 9, 2022 at 7:34 AM Wes McKinney <we...@gmail.com> wrote:
> >>>>
> >>>> I'm making good progress getting my branch PR-ready -- working through
> >>>> the compute-scalar-test suite and fixing the little things I broke. I
> >>>> hope I'll have it done by the end of the week.
> >>>>
> >>>> On Mon, Jun 6, 2022 at 3:21 PM Wes McKinney <we...@gmail.com> wrote:
> >>>>>
> >>>>> I created https://issues.apache.org/jira/browse/ARROW-16755 as an
> >>>>> umbrella issue to track improvements to reduce overhead in the
> >>>>> expression and kernel execution machinery. Please help by attaching
> >>>>> related issues and creating new issues for specific individual efforts
> >>>>> here. I'll work as quickly as I can to have my initial patch
> >>>>> ARROW-16756 ready which will unblock the next few projects here
> >>>>>
> >>>>> On Mon, Jun 6, 2022 at 10:35 AM Wes McKinney <we...@gmail.com> wrote:
> >>>>>>
> >>>>>>   This is definitely only the first stage of cleanup and streamlining —
> >>>>>> I anticipate multiple rounds of refactoring (maybe not as invasive and
> >>>>>> painful as this one), and this patch I'm not sure will do a lot to
> >>>>>> alleviate bottom line expression evaluation overhead but it creates
> >>>>>> the environment (i.e. where a whole chain of scalar functions that all
> >>>>>> write into preallocated memory can execute without having to touch
> >>>>>> shared_ptrs or deal with other objects with excess microperformance
> >>>>>> overhead) where such optimization can happen more easily.
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Jun 6, 2022 at 4:08 AM Antoine Pitrou <an...@python.org> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> Le 06/06/2022 à 09:34, Sasha Krassovsky a écrit :
> >>>>>>>> Wow that's a lot of progress!
> >>>>>>>> Definitely agree on the scalar outputs point.
> >>>>>>>>
> >>>>>>>> One point about the ArraySpan - why does it need to know its data type?
> >>>>>>>> Once a kernel has been resolved by the registry, the kernel will only know
> >>>>>>>> how to execute on the specific type it was resolved for, right?
> >>>>>>>
> >>>>>>> Because of parametric types for example (e.g. timestamps with a unit and
> >>>>>>> timezone, or decimals with a precision and scale).
> >>>>>>>
> >>>>>>> Regards
> >>>>>>>
> >>>>>>> Antoine.

Re: [C++] Kernel function registry evolution

Posted by Antoine Pitrou <an...@python.org>.

Does boxing a scalar into an array actually build a buffer with the 
repeated value, or is it more efficient than that?


Le 29/06/2022 à 17:57, Wes McKinney a écrit :
> I'm working on my next PR which addresses the "scalar output modality"
> and removes usages of ValueDescr and output shapes in general from the
> kernel execution machinery. The ability to provide all scalar input
> and scalar output of course is preserved — it's just that the "boxing"
> and "unboxing" of the inputs and outputs is happening at a higher
> level (i.e. there is only one place that boxes scalars into arrays and
> unboxes the output into a scalar) rather than being implemented on a
> piecemeal basis in the kernel implementations.
> 
> Please keep an eye out for any non-trivial PRs that affect
> arrow/compute/kernels since it could cause some rebasing headache. I
> am working as fast as I can to complete this next PR.
> 
> After this, the next thing is porting the aggregate kernels to use
> ExecSpan. Then I think the mass refactoring will cool off and we can
> address follow-on improvements like rewriting expression evaluation to
> utilize the span data structures to yield performance gains.
> 
> On Mon, Jun 13, 2022 at 12:37 PM Wes McKinney <we...@gmail.com> wrote:
>>
>> I merged the PR a little while ago — thanks for David, Sasha for
>> helping review. If you have more comments or things you would like to
>> fix from that PR, please comment there and I can help address them in
>> follow up PRs.
>>
>> I'm going to work next on migrating the rest of the kernels to use
>> ExecSpan (and thus ExecBatchIterator can be removed in favor of
>> ExecSpanIterator — I'll add benchmarks so we can see the performance
>> difference), and then my next priority will be removing the ValueDescr
>> concept and the ValueDescr::SCALAR output path for the
>> scalar/elementwise kernels which will help us delete a lot of code
>>
>> I'll attach related Jiras to this umbrella issue:
>>
>> https://issues.apache.org/jira/browse/ARROW-16755
>>
>> On Fri, Jun 10, 2022 at 12:56 PM Wes McKinney <we...@gmail.com> wrote:
>>>
>>> PR is up: https://github.com/apache/arrow/pull/13364
>>>
>>> Look forward to getting this in since there's a bunch of follow on
>>> work that I'd like to get started on ASAP!
>>>
>>> On Thu, Jun 9, 2022 at 7:34 AM Wes McKinney <we...@gmail.com> wrote:
>>>>
>>>> I'm making good progress getting my branch PR-ready -- working through
>>>> the compute-scalar-test suite and fixing the little things I broke. I
>>>> hope I'll have it done by the end of the week.
>>>>
>>>> On Mon, Jun 6, 2022 at 3:21 PM Wes McKinney <we...@gmail.com> wrote:
>>>>>
>>>>> I created https://issues.apache.org/jira/browse/ARROW-16755 as an
>>>>> umbrella issue to track improvements to reduce overhead in the
>>>>> expression and kernel execution machinery. Please help by attaching
>>>>> related issues and creating new issues for specific individual efforts
>>>>> here. I'll work as quickly as I can to have my initial patch
>>>>> ARROW-16756 ready which will unblock the next few projects here
>>>>>
>>>>> On Mon, Jun 6, 2022 at 10:35 AM Wes McKinney <we...@gmail.com> wrote:
>>>>>>
>>>>>>   This is definitely only the first stage of cleanup and streamlining —
>>>>>> I anticipate multiple rounds of refactoring (maybe not as invasive and
>>>>>> painful as this one), and this patch I'm not sure will do a lot to
>>>>>> alleviate bottom line expression evaluation overhead but it creates
>>>>>> the environment (i.e. where a whole chain of scalar functions that all
>>>>>> write into preallocated memory can execute without having to touch
>>>>>> shared_ptrs or deal with other objects with excess microperformance
>>>>>> overhead) where such optimization can happen more easily.
>>>>>>
>>>>>>
>>>>>> On Mon, Jun 6, 2022 at 4:08 AM Antoine Pitrou <an...@python.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Le 06/06/2022 à 09:34, Sasha Krassovsky a écrit :
>>>>>>>> Wow that's a lot of progress!
>>>>>>>> Definitely agree on the scalar outputs point.
>>>>>>>>
>>>>>>>> One point about the ArraySpan - why does it need to know its data type?
>>>>>>>> Once a kernel has been resolved by the registry, the kernel will only know
>>>>>>>> how to execute on the specific type it was resolved for, right?
>>>>>>>
>>>>>>> Because of parametric types for example (e.g. timestamps with a unit and
>>>>>>> timezone, or decimals with a precision and scale).
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Antoine.

Re: [C++] Kernel function registry evolution

Posted by Wes McKinney <we...@gmail.com>.

I'm working on my next PR which addresses the "scalar output modality"
and removes usages of ValueDescr and output shapes in general from the
kernel execution machinery. The ability to provide all scalar input
and scalar output of course is preserved — it's just that the "boxing"
and "unboxing" of the inputs and outputs is happening at a higher
level (i.e. there is only one place that boxes scalars into arrays and
unboxes the output into a scalar) rather than being implemented on a
piecemeal basis in the kernel implementations.

Please keep an eye out for any non-trivial PRs that affect
arrow/compute/kernels since it could cause some rebasing headache. I
am working as fast as I can to complete this next PR.

After this, the next thing is porting the aggregate kernels to use
ExecSpan. Then I think the mass refactoring will cool off and we can
address follow-on improvements like rewriting expression evaluation to
utilize the span data structures to yield performance gains.

On Mon, Jun 13, 2022 at 12:37 PM Wes McKinney <we...@gmail.com> wrote:
>
> I merged the PR a little while ago — thanks for David, Sasha for
> helping review. If you have more comments or things you would like to
> fix from that PR, please comment there and I can help address them in
> follow up PRs.
>
> I'm going to work next on migrating the rest of the kernels to use
> ExecSpan (and thus ExecBatchIterator can be removed in favor of
> ExecSpanIterator — I'll add benchmarks so we can see the performance
> difference), and then my next priority will be removing the ValueDescr
> concept and the ValueDescr::SCALAR output path for the
> scalar/elementwise kernels which will help us delete a lot of code
>
> I'll attach related Jiras to this umbrella issue:
>
> https://issues.apache.org/jira/browse/ARROW-16755
>
> On Fri, Jun 10, 2022 at 12:56 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > PR is up: https://github.com/apache/arrow/pull/13364
> >
> > Look forward to getting this in since there's a bunch of follow on
> > work that I'd like to get started on ASAP!
> >
> > On Thu, Jun 9, 2022 at 7:34 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > I'm making good progress getting my branch PR-ready -- working through
> > > the compute-scalar-test suite and fixing the little things I broke. I
> > > hope I'll have it done by the end of the week.
> > >
> > > On Mon, Jun 6, 2022 at 3:21 PM Wes McKinney <we...@gmail.com> wrote:
> > > >
> > > > I created https://issues.apache.org/jira/browse/ARROW-16755 as an
> > > > umbrella issue to track improvements to reduce overhead in the
> > > > expression and kernel execution machinery. Please help by attaching
> > > > related issues and creating new issues for specific individual efforts
> > > > here. I'll work as quickly as I can to have my initial patch
> > > > ARROW-16756 ready which will unblock the next few projects here
> > > >
> > > > On Mon, Jun 6, 2022 at 10:35 AM Wes McKinney <we...@gmail.com> wrote:
> > > > >
> > > > >  This is definitely only the first stage of cleanup and streamlining —
> > > > > I anticipate multiple rounds of refactoring (maybe not as invasive and
> > > > > painful as this one), and this patch I'm not sure will do a lot to
> > > > > alleviate bottom line expression evaluation overhead but it creates
> > > > > the environment (i.e. where a whole chain of scalar functions that all
> > > > > write into preallocated memory can execute without having to touch
> > > > > shared_ptrs or deal with other objects with excess microperformance
> > > > > overhead) where such optimization can happen more easily.
> > > > >
> > > > >
> > > > > On Mon, Jun 6, 2022 at 4:08 AM Antoine Pitrou <an...@python.org> wrote:
> > > > > >
> > > > > >
> > > > > > Le 06/06/2022 à 09:34, Sasha Krassovsky a écrit :
> > > > > > > Wow that's a lot of progress!
> > > > > > > Definitely agree on the scalar outputs point.
> > > > > > >
> > > > > > > One point about the ArraySpan - why does it need to know its data type?
> > > > > > > Once a kernel has been resolved by the registry, the kernel will only know
> > > > > > > how to execute on the specific type it was resolved for, right?
> > > > > >
> > > > > > Because of parametric types for example (e.g. timestamps with a unit and
> > > > > > timezone, or decimals with a precision and scale).
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Antoine.

Re: [C++] Kernel function registry evolution

Posted by Wes McKinney <we...@gmail.com>.

I merged the PR a little while ago — thanks for David, Sasha for
helping review. If you have more comments or things you would like to
fix from that PR, please comment there and I can help address them in
follow up PRs.

I'm going to work next on migrating the rest of the kernels to use
ExecSpan (and thus ExecBatchIterator can be removed in favor of
ExecSpanIterator — I'll add benchmarks so we can see the performance
difference), and then my next priority will be removing the ValueDescr
concept and the ValueDescr::SCALAR output path for the
scalar/elementwise kernels which will help us delete a lot of code

I'll attach related Jiras to this umbrella issue:

https://issues.apache.org/jira/browse/ARROW-16755

On Fri, Jun 10, 2022 at 12:56 PM Wes McKinney <we...@gmail.com> wrote:
>
> PR is up: https://github.com/apache/arrow/pull/13364
>
> Look forward to getting this in since there's a bunch of follow on
> work that I'd like to get started on ASAP!
>
> On Thu, Jun 9, 2022 at 7:34 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > I'm making good progress getting my branch PR-ready -- working through
> > the compute-scalar-test suite and fixing the little things I broke. I
> > hope I'll have it done by the end of the week.
> >
> > On Mon, Jun 6, 2022 at 3:21 PM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > I created https://issues.apache.org/jira/browse/ARROW-16755 as an
> > > umbrella issue to track improvements to reduce overhead in the
> > > expression and kernel execution machinery. Please help by attaching
> > > related issues and creating new issues for specific individual efforts
> > > here. I'll work as quickly as I can to have my initial patch
> > > ARROW-16756 ready which will unblock the next few projects here
> > >
> > > On Mon, Jun 6, 2022 at 10:35 AM Wes McKinney <we...@gmail.com> wrote:
> > > >
> > > >  This is definitely only the first stage of cleanup and streamlining —
> > > > I anticipate multiple rounds of refactoring (maybe not as invasive and
> > > > painful as this one), and this patch I'm not sure will do a lot to
> > > > alleviate bottom line expression evaluation overhead but it creates
> > > > the environment (i.e. where a whole chain of scalar functions that all
> > > > write into preallocated memory can execute without having to touch
> > > > shared_ptrs or deal with other objects with excess microperformance
> > > > overhead) where such optimization can happen more easily.
> > > >
> > > >
> > > > On Mon, Jun 6, 2022 at 4:08 AM Antoine Pitrou <an...@python.org> wrote:
> > > > >
> > > > >
> > > > > Le 06/06/2022 à 09:34, Sasha Krassovsky a écrit :
> > > > > > Wow that's a lot of progress!
> > > > > > Definitely agree on the scalar outputs point.
> > > > > >
> > > > > > One point about the ArraySpan - why does it need to know its data type?
> > > > > > Once a kernel has been resolved by the registry, the kernel will only know
> > > > > > how to execute on the specific type it was resolved for, right?
> > > > >
> > > > > Because of parametric types for example (e.g. timestamps with a unit and
> > > > > timezone, or decimals with a precision and scale).
> > > > >
> > > > > Regards
> > > > >
> > > > > Antoine.

Re: [C++] Kernel function registry evolution

Posted by Wes McKinney <we...@gmail.com>.

PR is up: https://github.com/apache/arrow/pull/13364

Look forward to getting this in since there's a bunch of follow on
work that I'd like to get started on ASAP!

On Thu, Jun 9, 2022 at 7:34 AM Wes McKinney <we...@gmail.com> wrote:
>
> I'm making good progress getting my branch PR-ready -- working through
> the compute-scalar-test suite and fixing the little things I broke. I
> hope I'll have it done by the end of the week.
>
> On Mon, Jun 6, 2022 at 3:21 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > I created https://issues.apache.org/jira/browse/ARROW-16755 as an
> > umbrella issue to track improvements to reduce overhead in the
> > expression and kernel execution machinery. Please help by attaching
> > related issues and creating new issues for specific individual efforts
> > here. I'll work as quickly as I can to have my initial patch
> > ARROW-16756 ready which will unblock the next few projects here
> >
> > On Mon, Jun 6, 2022 at 10:35 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > >  This is definitely only the first stage of cleanup and streamlining —
> > > I anticipate multiple rounds of refactoring (maybe not as invasive and
> > > painful as this one), and this patch I'm not sure will do a lot to
> > > alleviate bottom line expression evaluation overhead but it creates
> > > the environment (i.e. where a whole chain of scalar functions that all
> > > write into preallocated memory can execute without having to touch
> > > shared_ptrs or deal with other objects with excess microperformance
> > > overhead) where such optimization can happen more easily.
> > >
> > >
> > > On Mon, Jun 6, 2022 at 4:08 AM Antoine Pitrou <an...@python.org> wrote:
> > > >
> > > >
> > > > Le 06/06/2022 à 09:34, Sasha Krassovsky a écrit :
> > > > > Wow that's a lot of progress!
> > > > > Definitely agree on the scalar outputs point.
> > > > >
> > > > > One point about the ArraySpan - why does it need to know its data type?
> > > > > Once a kernel has been resolved by the registry, the kernel will only know
> > > > > how to execute on the specific type it was resolved for, right?
> > > >
> > > > Because of parametric types for example (e.g. timestamps with a unit and
> > > > timezone, or decimals with a precision and scale).
> > > >
> > > > Regards
> > > >
> > > > Antoine.

Re: [C++] Kernel function registry evolution

Posted by Wes McKinney <we...@gmail.com>.

I'm making good progress getting my branch PR-ready -- working through
the compute-scalar-test suite and fixing the little things I broke. I
hope I'll have it done by the end of the week.

On Mon, Jun 6, 2022 at 3:21 PM Wes McKinney <we...@gmail.com> wrote:
>
> I created https://issues.apache.org/jira/browse/ARROW-16755 as an
> umbrella issue to track improvements to reduce overhead in the
> expression and kernel execution machinery. Please help by attaching
> related issues and creating new issues for specific individual efforts
> here. I'll work as quickly as I can to have my initial patch
> ARROW-16756 ready which will unblock the next few projects here
>
> On Mon, Jun 6, 2022 at 10:35 AM Wes McKinney <we...@gmail.com> wrote:
> >
> >  This is definitely only the first stage of cleanup and streamlining —
> > I anticipate multiple rounds of refactoring (maybe not as invasive and
> > painful as this one), and this patch I'm not sure will do a lot to
> > alleviate bottom line expression evaluation overhead but it creates
> > the environment (i.e. where a whole chain of scalar functions that all
> > write into preallocated memory can execute without having to touch
> > shared_ptrs or deal with other objects with excess microperformance
> > overhead) where such optimization can happen more easily.
> >
> >
> > On Mon, Jun 6, 2022 at 4:08 AM Antoine Pitrou <an...@python.org> wrote:
> > >
> > >
> > > Le 06/06/2022 à 09:34, Sasha Krassovsky a écrit :
> > > > Wow that's a lot of progress!
> > > > Definitely agree on the scalar outputs point.
> > > >
> > > > One point about the ArraySpan - why does it need to know its data type?
> > > > Once a kernel has been resolved by the registry, the kernel will only know
> > > > how to execute on the specific type it was resolved for, right?
> > >
> > > Because of parametric types for example (e.g. timestamps with a unit and
> > > timezone, or decimals with a precision and scale).
> > >
> > > Regards
> > >
> > > Antoine.

Re: [C++] Kernel function registry evolution

Posted by Wes McKinney <we...@gmail.com>.

I created https://issues.apache.org/jira/browse/ARROW-16755 as an
umbrella issue to track improvements to reduce overhead in the
expression and kernel execution machinery. Please help by attaching
related issues and creating new issues for specific individual efforts
here. I'll work as quickly as I can to have my initial patch
ARROW-16756 ready which will unblock the next few projects here

On Mon, Jun 6, 2022 at 10:35 AM Wes McKinney <we...@gmail.com> wrote:
>
>  This is definitely only the first stage of cleanup and streamlining —
> I anticipate multiple rounds of refactoring (maybe not as invasive and
> painful as this one), and this patch I'm not sure will do a lot to
> alleviate bottom line expression evaluation overhead but it creates
> the environment (i.e. where a whole chain of scalar functions that all
> write into preallocated memory can execute without having to touch
> shared_ptrs or deal with other objects with excess microperformance
> overhead) where such optimization can happen more easily.
>
>
> On Mon, Jun 6, 2022 at 4:08 AM Antoine Pitrou <an...@python.org> wrote:
> >
> >
> > Le 06/06/2022 à 09:34, Sasha Krassovsky a écrit :
> > > Wow that's a lot of progress!
> > > Definitely agree on the scalar outputs point.
> > >
> > > One point about the ArraySpan - why does it need to know its data type?
> > > Once a kernel has been resolved by the registry, the kernel will only know
> > > how to execute on the specific type it was resolved for, right?
> >
> > Because of parametric types for example (e.g. timestamps with a unit and
> > timezone, or decimals with a precision and scale).
> >
> > Regards
> >
> > Antoine.

Re: [C++] Kernel function registry evolution

Posted by Wes McKinney <we...@gmail.com>.

 This is definitely only the first stage of cleanup and streamlining —
I anticipate multiple rounds of refactoring (maybe not as invasive and
painful as this one), and this patch I'm not sure will do a lot to
alleviate bottom line expression evaluation overhead but it creates
the environment (i.e. where a whole chain of scalar functions that all
write into preallocated memory can execute without having to touch
shared_ptrs or deal with other objects with excess microperformance
overhead) where such optimization can happen more easily.

On Mon, Jun 6, 2022 at 4:08 AM Antoine Pitrou <an...@python.org> wrote:
>
>
> Le 06/06/2022 à 09:34, Sasha Krassovsky a écrit :
> > Wow that's a lot of progress!
> > Definitely agree on the scalar outputs point.
> >
> > One point about the ArraySpan - why does it need to know its data type?
> > Once a kernel has been resolved by the registry, the kernel will only know
> > how to execute on the specific type it was resolved for, right?
>
> Because of parametric types for example (e.g. timestamps with a unit and
> timezone, or decimals with a precision and scale).
>
> Regards
>
> Antoine.

Re: [C++] Kernel function registry evolution

Posted by Antoine Pitrou <an...@python.org>.

Le 06/06/2022 à 09:34, Sasha Krassovsky a écrit :
> Wow that's a lot of progress!
> Definitely agree on the scalar outputs point.
> 
> One point about the ArraySpan - why does it need to know its data type?
> Once a kernel has been resolved by the registry, the kernel will only know
> how to execute on the specific type it was resolved for, right?

Because of parametric types for example (e.g. timestamps with a unit and 
timezone, or decimals with a precision and scale).

Regards

Antoine.

Re: [C++] Kernel function registry evolution

Posted by Sasha Krassovsky <kr...@gmail.com>.

Wow that's a lot of progress!
Definitely agree on the scalar outputs point.

One point about the ArraySpan - why does it need to know its data type?
Once a kernel has been resolved by the registry, the kernel will only know
how to execute on the specific type it was resolved for, right? If so we
can just pass BufferRef to kernels and they'll interpret the bytes
properly. Otherwise I'm excited to see how performance changes as a result
of this change - fewer shared_ptr copies is definitely a good thing. As I
was profiling I did notice a few other non-shared_ptr related overheads, I
can dig deeper into those as well.

Sasha

On Sun, Jun 5, 2022 at 7:15 PM Wes McKinney <we...@gmail.com> wrote:

> I've made good progress on this the last couple days and I'm at the
> point of writing the new ScalarExecutor that uses the new
> ArraySpan/ExecSpan concepts,  having done the initial "porting" of the
> scalar kernels to use the new data structures. I'm tired so I'm
> stopping for today, but I'm going to try the next couple days to get
> the codebase compiling again and the unit tests passing, which will
> hopefully go smoothly since we have lots of tests but we will see how
> it goes.
>
> The refactoring was extremely gnarly (especially scalar_if_else.cc)
> and I have a growing list of notes about how we can improve the array
> kernels codebase. In particular, we should definitely eliminate
> "scalar" kernel outputs. This only gets triggered when all inputs are
> scalars, but in that case we should instead promote all inputs to
> arrays of length 1 (some kernels already do this!) and use the array
> code paths. This would allow us to delete a ton of tedious code and
> simplify our maintenance burden — there is no performance critical
> need for all-scalar function evaluation, so the performance hit of
> up-promoting scalars to arrays-of-length-1 is totally worth it for the
> ability to delete a ton of code.
>
> There are some critical unanswered questions around how we will handle
> the dichotomy between kernels that allocate memory and those that
> don't — I am hopeful that in subsequent PRs we can establish a scheme
> wherein we can use the ArraySpan data structure even for non-primitive
> outputs, but some kind of allocated-buffer bookkeeping concept in the
> KernelContext so that the executor layer can know when it needs to
> destroy a buffer that is no longer needed, rather than relying on the
> ArrayData destructor to release memory.
>
> My branch is here, but I would say it's a couple days away from being
> review-ready:
>
>
> https://github.com/apache/arrow/compare/master...wesm:lightweight-exec-batch
>
> I'll post a PR when I have something closer to a green build. We
> probably won't want to let this PR linger since it will cause
> conflicts with any PRs touching scalar kernels.
>
>
> On Fri, Jun 3, 2022 at 1:59 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > Thanks Sasha — this is helpful. I'm going to take a college try at
> > just the scalar kernels and see what I can accomplish over the next
> > few days — will attempt to get a PR up for review with the C++ tests
> > passing. I'm expecting assorted workarounds for the various kernels
> > that do zero-copy optimizations (setting output buffers with input
> > buffers — such optimizations should likely be carried out elsewhere),
> > etc., but I will keep notes about challenges that I run into to assist
> > with further refactoring and improvements.
> >
> > On Fri, Jun 3, 2022 at 1:20 PM Sasha Krassovsky
> > <kr...@gmail.com> wrote:
> > >
> > > Hi all,
> > > I’ve been thinking about some sort of refactoring of this registry for
> a while now, and I’ve written down some thoughts, please leave your
> comments.
> > >
> > >
> https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing
> <
> https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing
> >
> > >
> > > I agree with Weston that starting with a small set of kernels and
> drilling down the implementation until we have a design with the
> characteristics we want seems like a good idea. One point to consider is
> the tradeoff between the size of our set of guinea pig kernels and how
> representative they are of the rest. We will probably need to cover nested
> types as Weston said, aggregations, etc.
> > >
> > > Sasha
> > >
> > > > On Jun 3, 2022, at 2:35 AM, Weston Pace <we...@gmail.com>
> wrote:
> > > >
> > > > That approach looks great and very much in line with some of the
> stuff
> > > > we have in light_array.h so I think it's very compatible.  If you
> have
> > > > the time to push this refactoring through then go for it.  Don't let
> > > > anything I'm saying deter any ongoing efforts.
> > > >
> > > > I'm just advocating that we be open to any kind of incremental
> > > > approach.  I looked a little closer at this today and you and Antoine
> > > > are correct that "two function registries" is maybe not the right
> > > > extension point.  Instead it might be that we have two classes of
> > > > scalar functions.  For example, the "at most three buffers" ArraySpan
> > > > that you've linked only works for types without children.  We do have
> > > > some scalar functions that run on nested types (e.g. count, index,
> > > > etc.)  My point is that rather than force ourselves to deal with all
> > > > of these corner cases we should be open to an incremental solution.
> > > > Maybe there is a ScalarPrimitive and a ScalarNested kernel type or
> > > > Scalar and ScalarSpan.
> > > >
> > > > Also, if we know we are also going to want to tweak the output
> > > > interface (I don't know for sure if we will) then maybe it makes
> sense
> > > > to pick a small set of kernels and incrementally improve that small
> > > > set until we think we've made all the changes we are going to want
> for
> > > > the near future, and then promote those changes to the wider set of
> > > > kernels.
> > > >
> > > > On Thu, Jun 2, 2022 at 8:07 AM Wes McKinney <we...@gmail.com>
> wrote:
> > > >>
> > > >> On this topic, I actually have started prototyping a new
> ScalarKernel
> > > >> exec interface that uses a non-owning, shared_ptr-free "ArraySpan"
> > > >> data structure based on some prior conversations
> > > >>
> > > >>
> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/exec.h#L163
> > > >>
> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/kernel.h#L552
> > > >>
> > > >> My idea was to refactor just the existing scalar kernels (which by
> my
> > > >> estimation seems tractable since so many go through the common
> > > >> templated kernel generators) and leave the other kinds of kernels
> > > >> untouched so that they can be refactored in due course. I hadn't
> > > >> planned to try to tackle changing the "output" interface from
> > > >> shared_ptr<ArrayData> because there are some kernels that create new
> > > >> ArrayData and some that don't.
> > > >>
> > > >> If this seems like a reasonable path, I can try to have a first
> draft
> > > >> PR ready to go maybe by Monday (I was going to work on this over the
> > > >> weekend when I can have some uninterrupted time to do the
> > > >> refactoring). I'm not sure that a new registry is going to be needed
> > > >>
> > > >> On Thu, Jun 2, 2022 at 2:50 AM Antoine Pitrou <an...@python.org>
> wrote:
> > > >>>
> > > >>>
> > > >>> Le 02/06/2022 à 00:02, Weston Pace a écrit :
> > > >>>>
> > > >>>> I'd like to propose we add a second kernel function registry.
> There
> > > >>>> doesn't need to be any user facing API change.  We could probably
> use
> > > >>>> an approach like [2] to proxy to the old function registry when
> the
> > > >>>> newer registry doesn't contain the asked-for function.  This would
> > > >>>> allow us to focus on creating an efficient function registry
> without
> > > >>>> having to worry about refactoring the existing kernels all at
> once.
> > > >>>
> > > >>> Why do we need a separate registry for this?
> > >
>

Re: [C++] Kernel function registry evolution

Posted by Wes McKinney <we...@gmail.com>.

I've made good progress on this the last couple days and I'm at the
point of writing the new ScalarExecutor that uses the new
ArraySpan/ExecSpan concepts,  having done the initial "porting" of the
scalar kernels to use the new data structures. I'm tired so I'm
stopping for today, but I'm going to try the next couple days to get
the codebase compiling again and the unit tests passing, which will
hopefully go smoothly since we have lots of tests but we will see how
it goes.

The refactoring was extremely gnarly (especially scalar_if_else.cc)
and I have a growing list of notes about how we can improve the array
kernels codebase. In particular, we should definitely eliminate
"scalar" kernel outputs. This only gets triggered when all inputs are
scalars, but in that case we should instead promote all inputs to
arrays of length 1 (some kernels already do this!) and use the array
code paths. This would allow us to delete a ton of tedious code and
simplify our maintenance burden — there is no performance critical
need for all-scalar function evaluation, so the performance hit of
up-promoting scalars to arrays-of-length-1 is totally worth it for the
ability to delete a ton of code.

There are some critical unanswered questions around how we will handle
the dichotomy between kernels that allocate memory and those that
don't — I am hopeful that in subsequent PRs we can establish a scheme
wherein we can use the ArraySpan data structure even for non-primitive
outputs, but some kind of allocated-buffer bookkeeping concept in the
KernelContext so that the executor layer can know when it needs to
destroy a buffer that is no longer needed, rather than relying on the
ArrayData destructor to release memory.

My branch is here, but I would say it's a couple days away from being
review-ready:

https://github.com/apache/arrow/compare/master...wesm:lightweight-exec-batch

I'll post a PR when I have something closer to a green build. We
probably won't want to let this PR linger since it will cause
conflicts with any PRs touching scalar kernels.


On Fri, Jun 3, 2022 at 1:59 PM Wes McKinney <we...@gmail.com> wrote:
>
> Thanks Sasha — this is helpful. I'm going to take a college try at
> just the scalar kernels and see what I can accomplish over the next
> few days — will attempt to get a PR up for review with the C++ tests
> passing. I'm expecting assorted workarounds for the various kernels
> that do zero-copy optimizations (setting output buffers with input
> buffers — such optimizations should likely be carried out elsewhere),
> etc., but I will keep notes about challenges that I run into to assist
> with further refactoring and improvements.
>
> On Fri, Jun 3, 2022 at 1:20 PM Sasha Krassovsky
> <kr...@gmail.com> wrote:
> >
> > Hi all,
> > I’ve been thinking about some sort of refactoring of this registry for a while now, and I’ve written down some thoughts, please leave your comments.
> >
> > https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing <https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing>
> >
> > I agree with Weston that starting with a small set of kernels and drilling down the implementation until we have a design with the characteristics we want seems like a good idea. One point to consider is the tradeoff between the size of our set of guinea pig kernels and how representative they are of the rest. We will probably need to cover nested types as Weston said, aggregations, etc.
> >
> > Sasha
> >
> > > On Jun 3, 2022, at 2:35 AM, Weston Pace <we...@gmail.com> wrote:
> > >
> > > That approach looks great and very much in line with some of the stuff
> > > we have in light_array.h so I think it's very compatible.  If you have
> > > the time to push this refactoring through then go for it.  Don't let
> > > anything I'm saying deter any ongoing efforts.
> > >
> > > I'm just advocating that we be open to any kind of incremental
> > > approach.  I looked a little closer at this today and you and Antoine
> > > are correct that "two function registries" is maybe not the right
> > > extension point.  Instead it might be that we have two classes of
> > > scalar functions.  For example, the "at most three buffers" ArraySpan
> > > that you've linked only works for types without children.  We do have
> > > some scalar functions that run on nested types (e.g. count, index,
> > > etc.)  My point is that rather than force ourselves to deal with all
> > > of these corner cases we should be open to an incremental solution.
> > > Maybe there is a ScalarPrimitive and a ScalarNested kernel type or
> > > Scalar and ScalarSpan.
> > >
> > > Also, if we know we are also going to want to tweak the output
> > > interface (I don't know for sure if we will) then maybe it makes sense
> > > to pick a small set of kernels and incrementally improve that small
> > > set until we think we've made all the changes we are going to want for
> > > the near future, and then promote those changes to the wider set of
> > > kernels.
> > >
> > > On Thu, Jun 2, 2022 at 8:07 AM Wes McKinney <we...@gmail.com> wrote:
> > >>
> > >> On this topic, I actually have started prototyping a new ScalarKernel
> > >> exec interface that uses a non-owning, shared_ptr-free "ArraySpan"
> > >> data structure based on some prior conversations
> > >>
> > >> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/exec.h#L163
> > >> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/kernel.h#L552
> > >>
> > >> My idea was to refactor just the existing scalar kernels (which by my
> > >> estimation seems tractable since so many go through the common
> > >> templated kernel generators) and leave the other kinds of kernels
> > >> untouched so that they can be refactored in due course. I hadn't
> > >> planned to try to tackle changing the "output" interface from
> > >> shared_ptr<ArrayData> because there are some kernels that create new
> > >> ArrayData and some that don't.
> > >>
> > >> If this seems like a reasonable path, I can try to have a first draft
> > >> PR ready to go maybe by Monday (I was going to work on this over the
> > >> weekend when I can have some uninterrupted time to do the
> > >> refactoring). I'm not sure that a new registry is going to be needed
> > >>
> > >> On Thu, Jun 2, 2022 at 2:50 AM Antoine Pitrou <an...@python.org> wrote:
> > >>>
> > >>>
> > >>> Le 02/06/2022 à 00:02, Weston Pace a écrit :
> > >>>>
> > >>>> I'd like to propose we add a second kernel function registry.  There
> > >>>> doesn't need to be any user facing API change.  We could probably use
> > >>>> an approach like [2] to proxy to the old function registry when the
> > >>>> newer registry doesn't contain the asked-for function.  This would
> > >>>> allow us to focus on creating an efficient function registry without
> > >>>> having to worry about refactoring the existing kernels all at once.
> > >>>
> > >>> Why do we need a separate registry for this?
> >

Re: [C++] Kernel function registry evolution

Posted by Wes McKinney <we...@gmail.com>.

Thanks Sasha — this is helpful. I'm going to take a college try at
just the scalar kernels and see what I can accomplish over the next
few days — will attempt to get a PR up for review with the C++ tests
passing. I'm expecting assorted workarounds for the various kernels
that do zero-copy optimizations (setting output buffers with input
buffers — such optimizations should likely be carried out elsewhere),
etc., but I will keep notes about challenges that I run into to assist
with further refactoring and improvements.

On Fri, Jun 3, 2022 at 1:20 PM Sasha Krassovsky
<kr...@gmail.com> wrote:
>
> Hi all,
> I’ve been thinking about some sort of refactoring of this registry for a while now, and I’ve written down some thoughts, please leave your comments.
>
> https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing <https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing>
>
> I agree with Weston that starting with a small set of kernels and drilling down the implementation until we have a design with the characteristics we want seems like a good idea. One point to consider is the tradeoff between the size of our set of guinea pig kernels and how representative they are of the rest. We will probably need to cover nested types as Weston said, aggregations, etc.
>
> Sasha
>
> > On Jun 3, 2022, at 2:35 AM, Weston Pace <we...@gmail.com> wrote:
> >
> > That approach looks great and very much in line with some of the stuff
> > we have in light_array.h so I think it's very compatible.  If you have
> > the time to push this refactoring through then go for it.  Don't let
> > anything I'm saying deter any ongoing efforts.
> >
> > I'm just advocating that we be open to any kind of incremental
> > approach.  I looked a little closer at this today and you and Antoine
> > are correct that "two function registries" is maybe not the right
> > extension point.  Instead it might be that we have two classes of
> > scalar functions.  For example, the "at most three buffers" ArraySpan
> > that you've linked only works for types without children.  We do have
> > some scalar functions that run on nested types (e.g. count, index,
> > etc.)  My point is that rather than force ourselves to deal with all
> > of these corner cases we should be open to an incremental solution.
> > Maybe there is a ScalarPrimitive and a ScalarNested kernel type or
> > Scalar and ScalarSpan.
> >
> > Also, if we know we are also going to want to tweak the output
> > interface (I don't know for sure if we will) then maybe it makes sense
> > to pick a small set of kernels and incrementally improve that small
> > set until we think we've made all the changes we are going to want for
> > the near future, and then promote those changes to the wider set of
> > kernels.
> >
> > On Thu, Jun 2, 2022 at 8:07 AM Wes McKinney <we...@gmail.com> wrote:
> >>
> >> On this topic, I actually have started prototyping a new ScalarKernel
> >> exec interface that uses a non-owning, shared_ptr-free "ArraySpan"
> >> data structure based on some prior conversations
> >>
> >> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/exec.h#L163
> >> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/kernel.h#L552
> >>
> >> My idea was to refactor just the existing scalar kernels (which by my
> >> estimation seems tractable since so many go through the common
> >> templated kernel generators) and leave the other kinds of kernels
> >> untouched so that they can be refactored in due course. I hadn't
> >> planned to try to tackle changing the "output" interface from
> >> shared_ptr<ArrayData> because there are some kernels that create new
> >> ArrayData and some that don't.
> >>
> >> If this seems like a reasonable path, I can try to have a first draft
> >> PR ready to go maybe by Monday (I was going to work on this over the
> >> weekend when I can have some uninterrupted time to do the
> >> refactoring). I'm not sure that a new registry is going to be needed
> >>
> >> On Thu, Jun 2, 2022 at 2:50 AM Antoine Pitrou <an...@python.org> wrote:
> >>>
> >>>
> >>> Le 02/06/2022 à 00:02, Weston Pace a écrit :
> >>>>
> >>>> I'd like to propose we add a second kernel function registry.  There
> >>>> doesn't need to be any user facing API change.  We could probably use
> >>>> an approach like [2] to proxy to the old function registry when the
> >>>> newer registry doesn't contain the asked-for function.  This would
> >>>> allow us to focus on creating an efficient function registry without
> >>>> having to worry about refactoring the existing kernels all at once.
> >>>
> >>> Why do we need a separate registry for this?
>

Re: [C++] Kernel function registry evolution

Posted by Sasha Krassovsky <kr...@gmail.com>.

Hi all, 
I’ve been thinking about some sort of refactoring of this registry for a while now, and I’ve written down some thoughts, please leave your comments.

https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing <https://docs.google.com/document/d/1LAN9I_Y9cZaG2a84j1wLY8jSlK3gDXYMle-VtyFCAE8/edit?usp=sharing>

I agree with Weston that starting with a small set of kernels and drilling down the implementation until we have a design with the characteristics we want seems like a good idea. One point to consider is the tradeoff between the size of our set of guinea pig kernels and how representative they are of the rest. We will probably need to cover nested types as Weston said, aggregations, etc. 

Sasha

> On Jun 3, 2022, at 2:35 AM, Weston Pace <we...@gmail.com> wrote:
> 
> That approach looks great and very much in line with some of the stuff
> we have in light_array.h so I think it's very compatible.  If you have
> the time to push this refactoring through then go for it.  Don't let
> anything I'm saying deter any ongoing efforts.
> 
> I'm just advocating that we be open to any kind of incremental
> approach.  I looked a little closer at this today and you and Antoine
> are correct that "two function registries" is maybe not the right
> extension point.  Instead it might be that we have two classes of
> scalar functions.  For example, the "at most three buffers" ArraySpan
> that you've linked only works for types without children.  We do have
> some scalar functions that run on nested types (e.g. count, index,
> etc.)  My point is that rather than force ourselves to deal with all
> of these corner cases we should be open to an incremental solution.
> Maybe there is a ScalarPrimitive and a ScalarNested kernel type or
> Scalar and ScalarSpan.
> 
> Also, if we know we are also going to want to tweak the output
> interface (I don't know for sure if we will) then maybe it makes sense
> to pick a small set of kernels and incrementally improve that small
> set until we think we've made all the changes we are going to want for
> the near future, and then promote those changes to the wider set of
> kernels.
> 
> On Thu, Jun 2, 2022 at 8:07 AM Wes McKinney <we...@gmail.com> wrote:
>> 
>> On this topic, I actually have started prototyping a new ScalarKernel
>> exec interface that uses a non-owning, shared_ptr-free "ArraySpan"
>> data structure based on some prior conversations
>> 
>> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/exec.h#L163
>> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/kernel.h#L552
>> 
>> My idea was to refactor just the existing scalar kernels (which by my
>> estimation seems tractable since so many go through the common
>> templated kernel generators) and leave the other kinds of kernels
>> untouched so that they can be refactored in due course. I hadn't
>> planned to try to tackle changing the "output" interface from
>> shared_ptr<ArrayData> because there are some kernels that create new
>> ArrayData and some that don't.
>> 
>> If this seems like a reasonable path, I can try to have a first draft
>> PR ready to go maybe by Monday (I was going to work on this over the
>> weekend when I can have some uninterrupted time to do the
>> refactoring). I'm not sure that a new registry is going to be needed
>> 
>> On Thu, Jun 2, 2022 at 2:50 AM Antoine Pitrou <an...@python.org> wrote:
>>> 
>>> 
>>> Le 02/06/2022 à 00:02, Weston Pace a écrit :
>>>> 
>>>> I'd like to propose we add a second kernel function registry.  There
>>>> doesn't need to be any user facing API change.  We could probably use
>>>> an approach like [2] to proxy to the old function registry when the
>>>> newer registry doesn't contain the asked-for function.  This would
>>>> allow us to focus on creating an efficient function registry without
>>>> having to worry about refactoring the existing kernels all at once.
>>> 
>>> Why do we need a separate registry for this?

Re: [C++] Kernel function registry evolution

Posted by Weston Pace <we...@gmail.com>.

That approach looks great and very much in line with some of the stuff
we have in light_array.h so I think it's very compatible.  If you have
the time to push this refactoring through then go for it.  Don't let
anything I'm saying deter any ongoing efforts.

I'm just advocating that we be open to any kind of incremental
approach.  I looked a little closer at this today and you and Antoine
are correct that "two function registries" is maybe not the right
extension point.  Instead it might be that we have two classes of
scalar functions.  For example, the "at most three buffers" ArraySpan
that you've linked only works for types without children.  We do have
some scalar functions that run on nested types (e.g. count, index,
etc.)  My point is that rather than force ourselves to deal with all
of these corner cases we should be open to an incremental solution.
Maybe there is a ScalarPrimitive and a ScalarNested kernel type or
Scalar and ScalarSpan.

Also, if we know we are also going to want to tweak the output
interface (I don't know for sure if we will) then maybe it makes sense
to pick a small set of kernels and incrementally improve that small
set until we think we've made all the changes we are going to want for
the near future, and then promote those changes to the wider set of
kernels.

On Thu, Jun 2, 2022 at 8:07 AM Wes McKinney <we...@gmail.com> wrote:
>
> On this topic, I actually have started prototyping a new ScalarKernel
> exec interface that uses a non-owning, shared_ptr-free "ArraySpan"
> data structure based on some prior conversations
>
> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/exec.h#L163
> https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/kernel.h#L552
>
> My idea was to refactor just the existing scalar kernels (which by my
> estimation seems tractable since so many go through the common
> templated kernel generators) and leave the other kinds of kernels
> untouched so that they can be refactored in due course. I hadn't
> planned to try to tackle changing the "output" interface from
> shared_ptr<ArrayData> because there are some kernels that create new
> ArrayData and some that don't.
>
> If this seems like a reasonable path, I can try to have a first draft
> PR ready to go maybe by Monday (I was going to work on this over the
> weekend when I can have some uninterrupted time to do the
> refactoring). I'm not sure that a new registry is going to be needed
>
> On Thu, Jun 2, 2022 at 2:50 AM Antoine Pitrou <an...@python.org> wrote:
> >
> >
> > Le 02/06/2022 à 00:02, Weston Pace a écrit :
> > >
> > > I'd like to propose we add a second kernel function registry.  There
> > > doesn't need to be any user facing API change.  We could probably use
> > > an approach like [2] to proxy to the old function registry when the
> > > newer registry doesn't contain the asked-for function.  This would
> > > allow us to focus on creating an efficient function registry without
> > > having to worry about refactoring the existing kernels all at once.
> >
> > Why do we need a separate registry for this?

Re: [C++] Kernel function registry evolution

Posted by Wes McKinney <we...@gmail.com>.

On this topic, I actually have started prototyping a new ScalarKernel
exec interface that uses a non-owning, shared_ptr-free "ArraySpan"
data structure based on some prior conversations

https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/exec.h#L163
https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/kernel.h#L552

My idea was to refactor just the existing scalar kernels (which by my
estimation seems tractable since so many go through the common
templated kernel generators) and leave the other kinds of kernels
untouched so that they can be refactored in due course. I hadn't
planned to try to tackle changing the "output" interface from
shared_ptr<ArrayData> because there are some kernels that create new
ArrayData and some that don't.

If this seems like a reasonable path, I can try to have a first draft
PR ready to go maybe by Monday (I was going to work on this over the
weekend when I can have some uninterrupted time to do the
refactoring). I'm not sure that a new registry is going to be needed

On Thu, Jun 2, 2022 at 2:50 AM Antoine Pitrou <an...@python.org> wrote:
>
>
> Le 02/06/2022 à 00:02, Weston Pace a écrit :
> >
> > I'd like to propose we add a second kernel function registry.  There
> > doesn't need to be any user facing API change.  We could probably use
> > an approach like [2] to proxy to the old function registry when the
> > newer registry doesn't contain the asked-for function.  This would
> > allow us to focus on creating an efficient function registry without
> > having to worry about refactoring the existing kernels all at once.
>
> Why do we need a separate registry for this?

Re: [C++] Kernel function registry evolution

Posted by Antoine Pitrou <an...@python.org>.

Le 02/06/2022 à 00:02, Weston Pace a écrit :
> 
> I'd like to propose we add a second kernel function registry.  There
> doesn't need to be any user facing API change.  We could probably use
> an approach like [2] to proxy to the old function registry when the
> newer registry doesn't contain the asked-for function.  This would
> allow us to focus on creating an efficient function registry without
> having to worry about refactoring the existing kernels all at once.

Why do we need a separate registry for this?