Posted to dev@arrow.apache.org by Ian Cook <ia...@ursacomputing.com> on 2022/06/08 14:44:35 UTC

Arrow sync call June 8 at 12:00 US/Eastern, 16:00 UTC

Hi all,

Our biweekly sync call is today at 12:00 noon Eastern time.

The Zoom meeting URL for this and other biweekly Arrow sync calls is:
https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09

Alternatively, enter this information into the Zoom website or app to
join the call:
Meeting ID: 876 4903 3008
Passcode: 958092

Thanks,
Ian

Re: Arrow sync call June 8 at 12:00 US/Eastern, 16:00 UTC

Posted by Gavin Ray <ra...@gmail.com>.
This is awesome. Thanks so much for the comprehensive reply!

RE: point #9, I'm also holding my breath for data update operations
(INSERT/UPDATE/DELETE) to be added to Substrait.
I have an open issue about it. It needs design work (which I don't think I'm
qualified to do):

Add Insert/Update/Delete basic functionality to specification (Issue #128,
substrait-io/substrait)
https://github.com/substrait-io/substrait/issues/128

On Wed, Jun 8, 2022 at 11:09 PM Ian Cook <ia...@ursacomputing.com> wrote:

> Hi Gavin,
>
> There was no detailed discussion in the meeting about this, just some
> general comments, but I'll share a few areas of collaboration that I'm
> aware of:
> - There is work ongoing to enable the Arrow C++ compute engine (aka
> "Acero") to consume Substrait plans, change them into ExecPlans, and
> execute them. Work started on this late last year [1] and has
> continued since then [2].
> - There are plans to adopt Substrait in DataFusion [3] and Ballista [4]
>
> There are also several other Substrait-related projects not directly in
> Arrow repos that engineers at Voltron Data are working on:
> - Creating a Substrait compiler for Ibis [5], to allow Python users to
> write code in a convenient analytics DSL and have it execute on
> engines that can consume Substrait
> - Creating a Substrait compiler for dplyr [6], to allow R users to
> write dplyr code that can execute on engines that can consume
> Substrait
> - Creating a Substrait plan validator [7]
> - Planning for "ADBC" to support Substrait [8]
> - Defining more functions in the Substrait specification [9] <-- This
> is an area where we could use more help
>
> Thanks,
> Ian
>
> [1] https://github.com/apache/arrow/pull/11707
> [2] https://github.com/apache/arrow/pulls?q=is%3Apr+substrait+label%3Alang-c%2B%2B
> [3] https://github.com/apache/arrow-datafusion/issues/2646
> [4] https://github.com/apache/arrow-ballista/issues/32
> [5] https://github.com/ibis-project/ibis-substrait/
> [6] https://github.com/voltrondata/substrait-r
> [7] http://github.com/substrait-io/substrait-validator
> [8] https://docs.google.com/document/d/1t7NrC76SyxL_OffATmjzZs2xcj1owdUsIF2WKL_Zw1U/
> [9] https://github.com/substrait-io/substrait/tree/main/extensions

Re: Arrow sync call June 8 at 12:00 US/Eastern, 16:00 UTC

Posted by Ian Cook <ia...@ursacomputing.com>.
Hi Gavin,

There was no detailed discussion in the meeting about this, just some
general comments, but I'll share a few areas of collaboration that I'm
aware of:
- There is work ongoing to enable the Arrow C++ compute engine (aka
"Acero") to consume Substrait plans, change them into ExecPlans, and
execute them. Work started on this late last year [1] and has
continued since then [2].
- There are plans to adopt Substrait in DataFusion [3] and Ballista [4]

There are also several other Substrait-related projects not directly in
Arrow repos that engineers at Voltron Data are working on:
- Creating a Substrait compiler for Ibis [5], to allow Python users to
write code in a convenient analytics DSL and have it execute on
engines that can consume Substrait
- Creating a Substrait compiler for dplyr [6], to allow R users to
write dplyr code that can execute on engines that can consume
Substrait
- Creating a Substrait plan validator [7]
- Planning for "ADBC" to support Substrait [8]
- Defining more functions in the Substrait specification [9] <-- This
is an area where we could use more help

Thanks,
Ian

[1] https://github.com/apache/arrow/pull/11707
[2] https://github.com/apache/arrow/pulls?q=is%3Apr+substrait+label%3Alang-c%2B%2B
[3] https://github.com/apache/arrow-datafusion/issues/2646
[4] https://github.com/apache/arrow-ballista/issues/32
[5] https://github.com/ibis-project/ibis-substrait/
[6] https://github.com/voltrondata/substrait-r
[7] http://github.com/substrait-io/substrait-validator
[8] https://docs.google.com/document/d/1t7NrC76SyxL_OffATmjzZs2xcj1owdUsIF2WKL_Zw1U/
[9] https://github.com/substrait-io/substrait/tree/main/extensions



On Wed, Jun 8, 2022 at 5:41 PM Gavin Ray <ra...@gmail.com> wrote:
>
> Thanks Ian -- can I ask whether there was any discussion of note that
> happened around Arrow + Substrait stuff?

Re: Arrow sync call June 8 at 12:00 US/Eastern, 16:00 UTC

Posted by Gavin Ray <ra...@gmail.com>.
Thanks Ian -- can I ask whether there was any discussion of note that
happened around Arrow + Substrait stuff?


On Wed, Jun 8, 2022 at 5:31 PM Ian Cook <ia...@ursacomputing.com> wrote:

> Attendees:
>
> Ian Cook
> Raúl Cumplido
> Alenka Frim
> Ian Joiner
> Will Jones
> Jorge Leitão
> David Li
> Rok Mihevc
> Ashish Paliwal
> Matthew Topol
> Jacob Wujciak
>
>
> Discussion:
>
> Recent changes to the merge script for apache/arrow PRs
> - Now uses a personal access token (PAT) to authenticate to the ASF Jira
> - Now requires the GitHub PAT to have workflow scope
> - See discussion about this on Zulip [1]
>
> Stabilizing the C Stream interface
> - It has been 20 months since its introduction, with no changes
> - See the ML discussion [2] about this
> - Will Jones has put up two PRs [3][4] and started a vote [5] about
> this on the mailing list
>
> Changes to release management guide
> - Most of the content from the release management guide has been moved
> [6] from Confluence [7] to the Arrow repo [8] where it is built as
> part of the Arrow docs site [9]
>
> Proposed changes to release process
> - Raúl has proposed [10] a change to the release process to simplify
> creation of release candidates and has opened a PR [11] to update the
> release management guide to reflect this change
>
> Substrait project
> - There is more collaboration happening between the Arrow and Substrait
> projects
> - There is a Substrait Community page [12] with details about how to
> get involved in Substrait
>
> Proposal to Dockerize the integration tests
> - Jorge opened a PR [13] proposing this, which Raúl and Jacob are reviewing
>
> [1] https://ursalabs.zulipchat.com/#narrow/stream/180245-dev/topic/Merge.20script.20with.20API.20keys/near/285049925
> [2] https://lists.apache.org/thread/0y604o9s3wkyty328wv8d21ol7s40q55
> [3] https://github.com/apache/arrow/pull/13345
> [4] https://github.com/apache/arrow-rs/pull/1821
> [5] https://lists.apache.org/thread/5bvk6m3y3wl0m4jdsnyhdylt1w5j288k
> [6] https://github.com/apache/arrow/pull/13272
> [7] https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide
> [8] https://github.com/apache/arrow/blob/master/docs/source/developers/release.rst
> [9] https://arrow.apache.org/docs/dev/developers/release.html
> [10] https://lists.apache.org/thread/g6mqpyq2hc11xbgrq2pf653njzy53plt
> [11] https://github.com/apache/arrow/pull/13308
> [12] https://substrait.io/community/
> [13] https://github.com/apache/arrow/pull/12407

Re: Arrow sync call June 8 at 12:00 US/Eastern, 16:00 UTC

Posted by Ian Cook <ia...@ursacomputing.com>.
Attendees:

Ian Cook
Raúl Cumplido
Alenka Frim
Ian Joiner
Will Jones
Jorge Leitão
David Li
Rok Mihevc
Ashish Paliwal
Matthew Topol
Jacob Wujciak


Discussion:

Recent changes to the merge script for apache/arrow PRs
- Now uses a personal access token (PAT) to authenticate to the ASF Jira
- Now requires the GitHub PAT to have workflow scope
- See discussion about this on Zulip [1]

Stabilizing the C Stream interface
- It has been 20 months since its introduction, with no changes
- See the ML discussion [2] about this
- Will Jones has put up two PRs [3][4] and started a vote [5] about
this on the mailing list

Changes to release management guide
- Most of the content from the release management guide has been moved
[6] from Confluence [7] to the Arrow repo [8] where it is built as
part of the Arrow docs site [9]

Proposed changes to release process
- Raúl has proposed [10] a change to the release process to simplify
creation of release candidates and has opened a PR [11] to update the
release management guide to reflect this change

Substrait project
- There is more collaboration happening between the Arrow and Substrait projects
- There is a Substrait Community page [12] with details about how to
get involved in Substrait

Proposal to Dockerize the integration tests
- Jorge opened a PR [13] proposing this, which Raúl and Jacob are reviewing

[1] https://ursalabs.zulipchat.com/#narrow/stream/180245-dev/topic/Merge.20script.20with.20API.20keys/near/285049925
[2] https://lists.apache.org/thread/0y604o9s3wkyty328wv8d21ol7s40q55
[3] https://github.com/apache/arrow/pull/13345
[4] https://github.com/apache/arrow-rs/pull/1821
[5] https://lists.apache.org/thread/5bvk6m3y3wl0m4jdsnyhdylt1w5j288k
[6] https://github.com/apache/arrow/pull/13272
[7] https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide
[8] https://github.com/apache/arrow/blob/master/docs/source/developers/release.rst
[9] https://arrow.apache.org/docs/dev/developers/release.html
[10] https://lists.apache.org/thread/g6mqpyq2hc11xbgrq2pf653njzy53plt
[11] https://github.com/apache/arrow/pull/13308
[12] https://substrait.io/community/
[13] https://github.com/apache/arrow/pull/12407

On Wed, Jun 8, 2022 at 10:44 AM Ian Cook <ia...@ursacomputing.com> wrote:
>
> Hi all,
>
> Our biweekly sync call is today at 12:00 noon Eastern time.
>
> The Zoom meeting URL for this and other biweekly Arrow sync calls is:
> https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
>
> Alternatively, enter this information into the Zoom website or app to
> join the call:
> Meeting ID: 876 4903 3008
> Passcode: 958092
>
> Thanks,
> Ian