Posted to dev@arrow.apache.org by Andy Grove <an...@gmail.com> on 2022/12/07 00:50:09 UTC

[Rust] Time to move datafusion-substrait development into DataFusion project?

Hi,

The DataFusion community has built an integration between DataFusion and
Substrait under the datafusion-contrib GitHub organization [1].

The project is now receiving regular contributions from NVIDIA (who are
using it internally for a research project), and GreptimeDB has also
expressed an interest in contributing [2].

I think that we should consider moving development of this crate into
DataFusion and wanted to see what others think of this idea.

One reason for moving it into the official project is so that we can access
this functionality from the DataFusion Python bindings without adding a
dependency on a project that is outside ASF governance.

I have created a GitHub issue [3] where we can discuss this more, or we can
discuss it here on the mailing list.

I look forward to hearing some opinions on this.

Thanks,

Andy.

[1] https://github.com/datafusion-contrib/datafusion-substrait
[2] https://github.com/datafusion-contrib/datafusion-substrait/pull/34
[3] https://github.com/apache/arrow-datafusion/issues/4536

Re: [Rust] Time to move datafusion-substrait development into DataFusion project?

Posted by Andy Grove <an...@gmail.com>.
Thanks for the feedback here and on the issue.

I have started the IP clearance process and am currently waiting for one
contributor to submit an ICLA. Once I have that I will file the paperwork
with the ASF.



On Wed, Dec 7, 2022 at 1:54 PM Martin Grigorov <mg...@apache.org> wrote:

> Hi Andy,
>
> Usually I'd say that development outside of ASF should be faster because
> you can publish a new release even after each commit.
> In ASF you need to do a VOTE and wait for 3 binding +1s and 72 hours.
> A user of datafusion-substrait could use a git dependency to get the latest
> version even without a published crate.
> But I see that datafusion-substrait currently depends on datafusion 13.0
> and I guess this is the main reason for moving it to arrow-datafusion.
> Another solution would be for datafusion-substrait to depend on
> arrow-datafusion master via a git dependency.
>
> +1 to move it as a subproject to arrow-datafusion now!
> This will avoid collecting [more] (I)CLAs later and I see that there are
> plans to replace datafusion-proto with it at some point.
>
> Martin

Re: [Rust] Time to move datafusion-substrait development into DataFusion project?

Posted by Martin Grigorov <mg...@apache.org>.
Hi Andy,

Usually I'd say that development outside of ASF should be faster because
you can publish a new release even after each commit.
In ASF you need to do a VOTE and wait for 3 binding +1s and 72 hours.
A user of datafusion-substrait could use a git dependency to get the latest
version even without a published crate.
But I see that datafusion-substrait currently depends on datafusion 13.0
and I guess this is the main reason for moving it to arrow-datafusion.
Another solution would be for datafusion-substrait to depend on
arrow-datafusion master via a git dependency.
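
As a rough sketch of the two git-dependency options above (the repository
URLs are the ones linked in this thread; the package names are assumed to
match what each repository publishes, and each fragment belongs in a
different Cargo.toml):

```toml
# Option 1 (downstream user's Cargo.toml): track datafusion-substrait
# from git instead of waiting for a published crate.
[dependencies]
datafusion-substrait = { git = "https://github.com/datafusion-contrib/datafusion-substrait" }
```

```toml
# Option 2 (datafusion-substrait's own Cargo.toml): depend on
# arrow-datafusion master rather than the released 13.0 crate.
[dependencies]
datafusion = { git = "https://github.com/apache/arrow-datafusion", branch = "master" }
```

Replacing `branch` with a pinned `rev` would trade freshness for
reproducible builds.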

+1 to move it as a subproject to arrow-datafusion now!
This will avoid collecting [more] (I)CLAs later and I see that there are
plans to replace datafusion-proto with it at some point.

Martin
