Posted to dev@arrow.apache.org by Ian Cook <ia...@ursacomputing.com> on 2022/04/12 20:17:53 UTC

Arrow sync call April 13 at 12:00 US/Eastern, 16:00 UTC

Hi all,

Our biweekly sync call is tomorrow at 12:00 noon Eastern time.

The Zoom meeting URL for this and other biweekly Arrow sync calls is:
https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09

Alternatively, enter this information into the Zoom website or app to
join the call:
Meeting ID: 876 4903 3008
Passcode: 958092

Thanks,
Ian

Re: Arrow sync call April 27 at 12:00 US/Eastern, 16:00 UTC

Posted by Benson Muite <be...@emailplus.org>.
Attendees:
Ian Joiner
Matthew Topol
Benson Muite

Discussion points:
1) New book on Arrow - covers C++, Python and Go, out in June
2) Building ORC bindings in R would be useful, extensions to parallel R?
3) Comparing ORC and Parquet for IO
4) IO optimization vs SIMD optimization - Parquet seems well optimized, 
so SIMD would be more helpful, but SIMD requires some care from Go.
5) Substrait would be great to use from Go if it is developed more fully.
6) A developer guide to developing Arrow on the cloud may be useful

Re: Arrow sync call April 27 at 12:00 US/Eastern, 16:00 UTC

Posted by Ian Cook <ia...@ursacomputing.com>.
Thanks Benson!

The Zoom meeting URL for this and other biweekly Arrow sync calls is:
https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09

Alternatively, enter this information into the Zoom website or app to
join the call:
Meeting ID: 876 4903 3008
Passcode: 958092

The Zoom meeting will start when the first attendee joins, even if
there is no one from Voltron Data present.

Ian
On Wed, Apr 27, 2022 at 9:45 AM David Li <li...@apache.org> wrote:
>
> Thanks Benson. If you are able to take notes this week that would be much appreciated.
>
> And thanks Joris for the clarification.

Re: Arrow sync call April 27 at 12:00 US/Eastern, 16:00 UTC

Posted by David Li <li...@apache.org>.
Thanks Benson. If you are able to take notes this week that would be much appreciated.

And thanks Joris for the clarification.

On Wed, Apr 27, 2022, at 09:34, Joris Van den Bossche wrote:
> As a small clarification: the Zoom meeting link itself should still work
> for anyone to join; it's only that no one from Voltron Data will be there
> to lead the meeting / take notes (so I also won't be present today).
>
> Joris

Re: Arrow sync call April 27 at 12:00 US/Eastern, 16:00 UTC

Posted by Joris Van den Bossche <jo...@gmail.com>.
As a small clarification: the Zoom meeting link itself should still work
for anyone to join; it's only that no one from Voltron Data will be there
to lead the meeting / take notes (so I also won't be present today).

Joris

On Wed, 27 Apr 2022 at 13:05, Benson Muite <be...@emailplus.org>
wrote:

> Hi,
>
> Can host if required, though the timing is not ideal for me. It may be
> helpful to vary the timing in future.
>
> Benson

Re: Arrow sync call April 27 at 12:00 US/Eastern, 16:00 UTC

Posted by Benson Muite <be...@emailplus.org>.
Hi,

Can host if required, though the timing is not ideal for me. It may be 
helpful to vary the timing in future.

Benson

On 4/25/22 2:49 PM, David Li wrote:
> Following up here:
> 
>> N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will not be able to host the fortnightly sync call. Is anyone available to run the meeting that day?
> 
> Is anyone available to run the sync call this Wednesday?


Re: Arrow sync call April 13 at 12:00 US/Eastern, 16:00 UTC

Posted by Benson Muite <be...@emailplus.org>.
On 4/25/22 2:49 PM, David Li wrote:
> Following up here:
> 
>> N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will not be able to host the fortnightly sync call. Is anyone available to run the meeting that day?
> 
> Is anyone available to run the sync call this Wednesday?


Re: Arrow sync call April 13 at 12:00 US/Eastern, 16:00 UTC

Posted by David Li <li...@apache.org>.
Following up here:

> N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will not be able to host the fortnightly sync call. Is anyone available to run the meeting that day?

Is anyone available to run the sync call this Wednesday?

On Wed, Apr 13, 2022, at 12:59, David Li wrote:
> Attendees:
>
> - David Li
> - Eduardo Ponce
> - Gavin Ray
> - Ian Cook
> - James Duong
> - Matthew Topol
> - Nic
> - Niranda
> - Raul Cumplido
> - Rok
> - Weston Pace
> - Will Jones
>
> N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will 
> not be able to host the fortnightly sync call. Is anyone available to 
> run the meeting that day?
>
> Agenda:
>
> 8.0.0 Release: targeting 4/21, please try to get PRs wrapped up in the 
> next ~1-2 weeks. See the ML post [1] for details, including a wiki page 
> listing outstanding issues. In particular, there are some Go PRs that 
> could use attention from an interested Go developer [2], as well as 
> some temporal kernel PRs that could use a review [3].
>
> Arrow C++ Compute Engine: Weston gave a status update. APIs and
> documentation have been improved for users, though most will likely
> use the engine through an API like Substrait. Basic Substrait support
> has been added, with improvements forthcoming. More tooling to measure
> performance is being worked on. General kernel execution overhead is
> being addressed with an eye towards running smaller batches through the
> engine. An asof join implementation is being worked on, and Go is
> working towards Substrait bindings to be able to bind to the C++ engine.
>
> Kernel vectorization/SIMD: Eduardo has been looking at making some of 
> the primitive kernels (e.g. arithmetic) more easily autovectorized by 
> the compiler, testing a variety of approaches. See related discussion 
> [4]. We do not have benchmarks to evaluate compiler performance in this 
> regard generally, but we have manually inspected some compiler output 
> and found that not all compilers manage to do this with the current 
> kernel implementations. We also don't have a holistic way to evaluate 
> this going forward, nor do we have a sense for current benchmark 
> coverage, though possibly we could generate benchmarks. However, it was 
> pointed out that general engine performance is likely more important, 
> and that current profiling indicates kernels are not yet a bottleneck, 
> though there may be low-hanging fruit here.
>
> Flight/Flight SQL: we discussed the barriers to Flight SQL support in
> Go; Flight SQL heavily uses union types, which are not yet implemented
> there. A further proposal [5] has been submitted to extend the type
> metadata; those interested, please take a look. The GetXdbcTypeInfo
> proposal was merged, and the inline data proposal is still outstanding
> (but is probably ready for a vote).
>
> IPC/Format: it was asked if there's an IPC structure for serializing a 
> single array to reduce overhead. Current APIs likely suffice but 
> Niranda may submit a separate discussion to explain further. 
>
> [1]: https://lists.apache.org/thread/zk8hhynvy0bqvqpxk0868n5g0nmzbzbn
> [2]: https://github.com/apache/arrow/pull/12158
> [3]: https://github.com/apache/arrow/pull/12657
> [4]: https://lists.apache.org/thread/8o7k4dt23chx3gn13rwkms38syyms489
> [5]: https://lists.apache.org/thread/thvn89wg29gyctwycx2zjr4vvm2g80o6
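
For readers unfamiliar with the kernel vectorization item in the notes
above, here is a minimal sketch of the kind of rewrite that can make an
arithmetic loop easier for a compiler to autovectorize. It is not Arrow's
kernel code: the function names, the one-byte-per-slot validity vector,
and the choice to write 0 into null slots are all assumptions made for
illustration. Whether a given compiler vectorizes either variant has to
be checked by inspecting its output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; this is not Arrow's kernel code.

// Variant 1: a data-dependent branch per element. Some compilers may
// decline to vectorize loops of this shape.
std::vector<int32_t> AddWithBranch(const std::vector<int32_t>& left,
                                   const std::vector<int32_t>& right,
                                   const std::vector<uint8_t>& valid) {
  assert(left.size() == right.size() && left.size() == valid.size());
  std::vector<int32_t> out(left.size());
  for (std::size_t i = 0; i < left.size(); ++i) {
    if (valid[i] != 0) {
      out[i] = left[i] + right[i];
    } else {
      out[i] = 0;  // placeholder value for a null slot (an assumption)
    }
  }
  return out;
}

// Variant 2: compute every slot unconditionally over raw contiguous
// buffers, then zero the null slots with a multiply instead of a branch.
std::vector<int32_t> AddBranchFree(const std::vector<int32_t>& left,
                                   const std::vector<int32_t>& right,
                                   const std::vector<uint8_t>& valid) {
  assert(left.size() == right.size() && left.size() == valid.size());
  std::vector<int32_t> out(left.size());
  const int32_t* l = left.data();
  const int32_t* r = right.data();
  const uint8_t* v = valid.data();
  int32_t* o = out.data();
  const std::size_t n = out.size();
  for (std::size_t i = 0; i < n; ++i) {
    // v[i] is assumed to be exactly 0 or 1.
    o[i] = (l[i] + r[i]) * static_cast<int32_t>(v[i]);
  }
  return out;
}
```

To see what a compiler does with each loop, one option is to build with
optimization and a vectorization report, for example -O3 -fopt-info-vec
with GCC or -O3 -Rpass=loop-vectorize with Clang, and compare the two
functions.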

Re: Arrow sync call April 13 at 12:00 US/Eastern, 16:00 UTC

Posted by Will Jones <wi...@gmail.com>.
That sounds great, Weston! Agreed that syncing up with releases seems like
a good idea.

On Mon, Apr 18, 2022 at 11:51 AM Weston Pace <we...@gmail.com> wrote:

> I'm happy to provide a quarterly update on C++ engine work but in the
> future I'll draft it in PR form so others have a chance to pitch in.
> I was inspired by, and hope to mimic, the Rust community's very cool
> quarterly roadmap [1][2] as a place to have higher level discussions
> on what people are hoping to work on.  Since the C++ implementation
> has quarterly releases we can probably sync up with releases so I'll
> start a discussion about halfway to the 8.0.0 release.
>
> [1]
> https://docs.google.com/document/d/1t64vZwZnXm9MyFj2qz3xcAkSxK3Wu12giS3KrS4nDE0/edit
> [2] https://github.com/apache/arrow-datafusion/pull/2133
>
> On Mon, Apr 18, 2022 at 7:20 AM Will Jones <wi...@gmail.com>
> wrote:
> >
> > Thanks Weston for providing the update on the C++ compute engine. IMO, it
> > would be very welcome to have that update be a quarterly email to the dev
> > mailing list, and may provide an opportunity to highlight issues in Jira
> > that are good first issues or neglected but important.
> >
> > On Wed, Apr 13, 2022 at 10:00 AM David Li <li...@apache.org> wrote:
> >
> > > Attendees:
> > >
> > > - David Li
> > > - Eduardo Ponce
> > > - Gavin Ray
> > > - Ian Cook
> > > - James Duong
> > > - Matthew Topol
> > > - Nic
> > > - Niranda
> > > - Raul Cumplido
> > > - Rok
> > > - Weston Pace
> > > - Will Jones
> > >
> > > N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will
> > > not be able to host the fortnightly sync call. Is anyone available to
> run
> > > the meeting that day?
> > >
> > > Agenda:
> > >
> > > 8.0.0 Release: targeting 4/21, please try to get PRs wrapped up in the
> > > next ~1-2 weeks. See the ML post [1] for details, including a wiki page
> > > listing outstanding issues. In particular, there are some Go PRs that
> could
> > > use attention from an interested Go developer [2], as well as some
> temporal
> > > kernel PRs that could use a review [3].
> > >
> > > Arrow C++ Compute Engine: Weston gave a status update;
> APIs/documentation
> > > has been improved for users, though likely most will use it through an
> API
> > > like Substrait; basic Substrait support has been added with forthcoming
> > > improvements; more tooling to measure performance is being worked on;
> > > general kernel execution overhead is being addressed with an eye
> towards
> > > running smaller batches through the engine. An asof join
> implementation is
> > > being worked on, and Go is working towards Substrait bindings to be
> able to
> > > bind to the C++ engine.
> > >
> > > Kernel vectorization/SIMD: Eduardo has been looking at making some of
> > > the primitive kernels (e.g. arithmetic) more easily autovectorized by the
> > > compiler, testing a variety of approaches. See related discussion [4].
> > > We do not have benchmarks to evaluate compiler performance in this regard
> > > generally, but we have manually inspected some compiler output and
> > > found that not all compilers manage to do this with the current kernel
> > > implementations. We also don't have a holistic way to evaluate this
> > > going forward, nor do we have a sense for current benchmark coverage,
> > > though possibly we could generate benchmarks. However, it was pointed out
> > > that general engine performance is likely more important, and that current
> > > profiling indicates kernels are not yet a bottleneck, though there may
> > > be low-hanging fruit here.
> > >
> > > Flight/Flight SQL: we discussed the barriers to Flight SQL support in
> > > Go; Flight SQL heavily uses union types, which are not yet implemented
> > > in Go. A further proposal [5] has been submitted to extend the type
> > > metadata; those interested, please take a look. The GetXdbcTypeInfo
> > > proposal was merged, and the inline data proposal is still outstanding
> > > (but probably ready to have a vote).
> > >
> > > IPC/Format: it was asked if there's an IPC structure for serializing a
> > > single array to reduce overhead. Current APIs likely suffice but
> > > Niranda may submit a separate discussion to explain further.
> > >
> > > [1]: https://lists.apache.org/thread/zk8hhynvy0bqvqpxk0868n5g0nmzbzbn
> > > [2]: https://github.com/apache/arrow/pull/12158
> > > [3]: https://github.com/apache/arrow/pull/12657
> > > [4]: https://lists.apache.org/thread/8o7k4dt23chx3gn13rwkms38syyms489
> > > [5]: https://lists.apache.org/thread/thvn89wg29gyctwycx2zjr4vvm2g80o6
> > >
> > > On Tue, Apr 12, 2022, at 16:17, Ian Cook wrote:
> > > > Hi all,
> > > >
> > > > Our biweekly sync call is tomorrow at 12:00 noon Eastern time.
> > > >
> > > > The Zoom meeting URL for this and other biweekly Arrow sync calls is:
> > > > https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
> > > >
> > > > Alternatively, enter this information into the Zoom website or app to
> > > > join the call:
> > > > Meeting ID: 876 4903 3008
> > > > Passcode: 958092
> > > >
> > > > Thanks,
> > > > Ian
> > >
>

Re: Arrow sync call April 13 at 12:00 US/Eastern, 16:00 UTC

Posted by Weston Pace <we...@gmail.com>.
I'm happy to provide a quarterly update on C++ engine work but in the
future I'll draft it in PR form so others have a chance to pitch in.
I was inspired by, and hope to mimic, the Rust community's very cool
quarterly roadmap [1][2] as a place to have higher level discussions
on what people are hoping to work on.  Since the C++ implementation
has quarterly releases we can probably sync up with releases so I'll
start a discussion about halfway to the 8.0.0 release.

[1] https://docs.google.com/document/d/1t64vZwZnXm9MyFj2qz3xcAkSxK3Wu12giS3KrS4nDE0/edit
[2] https://github.com/apache/arrow-datafusion/pull/2133

On Mon, Apr 18, 2022 at 7:20 AM Will Jones <wi...@gmail.com> wrote:
>
> Thanks Weston for providing the update on the C++ compute engine. IMO, it
> would be very welcome to have that update be a quarterly email to the dev
> mailing list, and it may provide an opportunity to highlight issues in Jira
> that are good first issues or neglected but important.
>
> On Wed, Apr 13, 2022 at 10:00 AM David Li <li...@apache.org> wrote:
>
> > Attendees:
> >
> > - David Li
> > - Eduardo Ponce
> > - Gavin Ray
> > - Ian Cook
> > - James Duong
> > - Matthew Topol
> > - Nic
> > - Niranda
> > - Raul Cumplido
> > - Rok
> > - Weston Pace
> > - Will Jones
> >
> > N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will
> > not be able to host the fortnightly sync call. Is anyone available to run
> > the meeting that day?
> >
> > Agenda:
> >
> > 8.0.0 Release: targeting 4/21, please try to get PRs wrapped up in the
> > next ~1-2 weeks. See the ML post [1] for details, including a wiki page
> > listing outstanding issues. In particular, there are some Go PRs that could
> > use attention from an interested Go developer [2], as well as some temporal
> > kernel PRs that could use a review [3].
> >
> > Arrow C++ Compute Engine: Weston gave a status update; APIs/documentation
> > have been improved for users, though likely most will use it through an API
> > like Substrait; basic Substrait support has been added with forthcoming
> > improvements; more tooling to measure performance is being worked on;
> > general kernel execution overhead is being addressed with an eye towards
> > running smaller batches through the engine. An asof join implementation is
> > being worked on, and Go is working towards Substrait bindings to be able to
> > bind to the C++ engine.
> >
> > Kernel vectorization/SIMD: Eduardo has been looking at making some of the
> > primitive kernels (e.g. arithmetic) more easily autovectorized by the
> > compiler, testing a variety of approaches. See related discussion [4]. We
> > do not have benchmarks to evaluate compiler performance in this regard
> > generally, but we have manually inspected some compiler output and found
> > that not all compilers manage to do this with the current kernel
> > implementations. We also don't have a holistic way to evaluate this going
> > forward, nor do we have a sense for current benchmark coverage, though
> > possibly we could generate benchmarks. However, it was pointed out that
> > general engine performance is likely more important, and that current
> > profiling indicates kernels are not yet a bottleneck, though there may be
> > low-hanging fruit here.
> >
> > Flight/Flight SQL: we discussed the barriers to Flight SQL support in Go;
> > Flight SQL heavily uses union types, which are not yet implemented in Go.
> > A further proposal [5] has been submitted to extend the type metadata;
> > those interested, please take a look. The GetXdbcTypeInfo proposal was
> > merged, and the inline data proposal is still outstanding (but probably
> > ready to have a vote).
> >
> > IPC/Format: it was asked if there's an IPC structure for serializing a
> > single array to reduce overhead. Current APIs likely suffice but Niranda
> > may submit a separate discussion to explain further.
> >
> > [1]: https://lists.apache.org/thread/zk8hhynvy0bqvqpxk0868n5g0nmzbzbn
> > [2]: https://github.com/apache/arrow/pull/12158
> > [3]: https://github.com/apache/arrow/pull/12657
> > [4]: https://lists.apache.org/thread/8o7k4dt23chx3gn13rwkms38syyms489
> > [5]: https://lists.apache.org/thread/thvn89wg29gyctwycx2zjr4vvm2g80o6
> >
> > On Tue, Apr 12, 2022, at 16:17, Ian Cook wrote:
> > > Hi all,
> > >
> > > Our biweekly sync call is tomorrow at 12:00 noon Eastern time.
> > >
> > > The Zoom meeting URL for this and other biweekly Arrow sync calls is:
> > > https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
> > >
> > > Alternatively, enter this information into the Zoom website or app to
> > > join the call:
> > > Meeting ID: 876 4903 3008
> > > Passcode: 958092
> > >
> > > Thanks,
> > > Ian
> >

Re: Arrow sync call April 13 at 12:00 US/Eastern, 16:00 UTC

Posted by Will Jones <wi...@gmail.com>.
Thanks Weston for providing the update on the C++ compute engine. IMO, it
would be very welcome to have that update be a quarterly email to the dev
mailing list, and it may provide an opportunity to highlight issues in Jira
that are good first issues or neglected but important.

On Wed, Apr 13, 2022 at 10:00 AM David Li <li...@apache.org> wrote:

> Attendees:
>
> - David Li
> - Eduardo Ponce
> - Gavin Ray
> - Ian Cook
> - James Duong
> - Matthew Topol
> - Nic
> - Niranda
> - Raul Cumplido
> - Rok
> - Weston Pace
> - Will Jones
>
> N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will
> not be able to host the fortnightly sync call. Is anyone available to run
> the meeting that day?
>
> Agenda:
>
> 8.0.0 Release: targeting 4/21, please try to get PRs wrapped up in the
> next ~1-2 weeks. See the ML post [1] for details, including a wiki page
> listing outstanding issues. In particular, there are some Go PRs that could
> use attention from an interested Go developer [2], as well as some temporal
> kernel PRs that could use a review [3].
>
> Arrow C++ Compute Engine: Weston gave a status update; APIs/documentation
> have been improved for users, though likely most will use it through an API
> like Substrait; basic Substrait support has been added with forthcoming
> improvements; more tooling to measure performance is being worked on;
> general kernel execution overhead is being addressed with an eye towards
> running smaller batches through the engine. An asof join implementation is
> being worked on, and Go is working towards Substrait bindings to be able to
> bind to the C++ engine.
>
> Kernel vectorization/SIMD: Eduardo has been looking at making some of the
> primitive kernels (e.g. arithmetic) more easily autovectorized by the
> compiler, testing a variety of approaches. See related discussion [4]. We
> do not have benchmarks to evaluate compiler performance in this regard
> generally, but we have manually inspected some compiler output and found
> that not all compilers manage to do this with the current kernel
> implementations. We also don't have a holistic way to evaluate this going
> forward, nor do we have a sense for current benchmark coverage, though
> possibly we could generate benchmarks. However, it was pointed out that
> general engine performance is likely more important, and that current
> profiling indicates kernels are not yet a bottleneck, though there may be
> low-hanging fruit here.
>
> Flight/Flight SQL: we discussed the barriers to Flight SQL support in Go;
> Flight SQL heavily uses union types, which are not yet implemented in Go. A
> further proposal [5] has been submitted to extend the type metadata; those
> interested, please take a look. The GetXdbcTypeInfo proposal was merged,
> and the inline data proposal is still outstanding (but probably ready to
> have a vote).
>
> IPC/Format: it was asked if there's an IPC structure for serializing a
> single array to reduce overhead. Current APIs likely suffice but Niranda
> may submit a separate discussion to explain further.
>
> [1]: https://lists.apache.org/thread/zk8hhynvy0bqvqpxk0868n5g0nmzbzbn
> [2]: https://github.com/apache/arrow/pull/12158
> [3]: https://github.com/apache/arrow/pull/12657
> [4]: https://lists.apache.org/thread/8o7k4dt23chx3gn13rwkms38syyms489
> [5]: https://lists.apache.org/thread/thvn89wg29gyctwycx2zjr4vvm2g80o6
>
> On Tue, Apr 12, 2022, at 16:17, Ian Cook wrote:
> > Hi all,
> >
> > Our biweekly sync call is tomorrow at 12:00 noon Eastern time.
> >
> > The Zoom meeting URL for this and other biweekly Arrow sync calls is:
> > https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
> >
> > Alternatively, enter this information into the Zoom website or app to
> > join the call:
> > Meeting ID: 876 4903 3008
> > Passcode: 958092
> >
> > Thanks,
> > Ian
>

Re: Arrow sync call April 13 at 12:00 US/Eastern, 16:00 UTC

Posted by David Li <li...@apache.org>.
Attendees:

- David Li
- Eduardo Ponce
- Gavin Ray
- Ian Cook
- James Duong
- Matthew Topol
- Nic
- Niranda
- Raul Cumplido
- Rok
- Weston Pace
- Will Jones

N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will not be able to host the fortnightly sync call. Is anyone available to run the meeting that day?

Agenda:

8.0.0 Release: targeting 4/21, please try to get PRs wrapped up in the next ~1-2 weeks. See the ML post [1] for details, including a wiki page listing outstanding issues. In particular, there are some Go PRs that could use attention from an interested Go developer [2], as well as some temporal kernel PRs that could use a review [3].

Arrow C++ Compute Engine: Weston gave a status update; APIs/documentation have been improved for users, though likely most will use it through an API like Substrait; basic Substrait support has been added with forthcoming improvements; more tooling to measure performance is being worked on; general kernel execution overhead is being addressed with an eye towards running smaller batches through the engine. An asof join implementation is being worked on, and Go is working towards Substrait bindings to be able to bind to the C++ engine.

Kernel vectorization/SIMD: Eduardo has been looking at making some of the primitive kernels (e.g. arithmetic) more easily autovectorized by the compiler, testing a variety of approaches. See related discussion [4]. We do not have benchmarks to evaluate compiler performance in this regard generally, but we have manually inspected some compiler output and found that not all compilers manage to do this with the current kernel implementations. We also don't have a holistic way to evaluate this going forward, nor do we have a sense for current benchmark coverage, though possibly we could generate benchmarks. However, it was pointed out that general engine performance is likely more important, and that current profiling indicates kernels are not yet a bottleneck, though there may be low-hanging fruit here.

Flight/Flight SQL: we discussed the barriers to Flight SQL support in Go; Flight SQL heavily uses union types, which are not yet implemented in Go. A further proposal [5] has been submitted to extend the type metadata; those interested, please take a look. The GetXdbcTypeInfo proposal was merged, and the inline data proposal is still outstanding (but probably ready to have a vote).

IPC/Format: it was asked if there's an IPC structure for serializing a single array to reduce overhead. Current APIs likely suffice but Niranda may submit a separate discussion to explain further. 

[1]: https://lists.apache.org/thread/zk8hhynvy0bqvqpxk0868n5g0nmzbzbn
[2]: https://github.com/apache/arrow/pull/12158
[3]: https://github.com/apache/arrow/pull/12657
[4]: https://lists.apache.org/thread/8o7k4dt23chx3gn13rwkms38syyms489
[5]: https://lists.apache.org/thread/thvn89wg29gyctwycx2zjr4vvm2g80o6

On Tue, Apr 12, 2022, at 16:17, Ian Cook wrote:
> Hi all,
>
> Our biweekly sync call is tomorrow at 12:00 noon Eastern time.
>
> The Zoom meeting URL for this and other biweekly Arrow sync calls is:
> https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
>
> Alternatively, enter this information into the Zoom website or app to
> join the call:
> Meeting ID: 876 4903 3008
> Passcode: 958092
>
> Thanks,
> Ian