Posted to dev@arrow.apache.org by Andrew Lamb <al...@influxdata.com> on 2022/06/28 18:37:54 UTC

[RUST][DISCUSS] Donate object_store_rs to Arrow

Hello Rust / Arrow community,

There is a proposal [1] in DataFusion to switch the abstraction used to
read from remote object storage. For various reasons, this code is
currently in its own crate/library [2] with unclear governance. We would
like to contribute this code to the Apache Arrow project and I am hoping to
gather community feedback on this idea.

The full details can be found in [3]. Here is the rationale:

1. A common, high-quality object store abstraction for communicating with
various remote object stores is useful for a range of projects and use cases.
2. Such a library is directly aligned with the Arrow mission of providing
building blocks for modern high-performance analytics systems.
3. The clear governance of Apache Arrow offers the best chance to build a
unified and strong community around this crate, hopefully both increasing
its adoption and attracting community contributions for its long-term
evolution and maintenance.

Please let us know your thoughts,
Andrew


[1] https://github.com/apache/arrow-datafusion/issues/2489
[2] https://github.com/influxdata/object_store_rs/issues/41
[3] https://github.com/influxdata/object_store_rs/issues/41

Re: [RUST][DISCUSS] Donate object_store_rs to Arrow

Posted by Andrew Lamb <al...@influxdata.com>.
Thank you all for the comments and discussion. Given that there appears to be
consensus in the community to accept this donation, I have written up a
plan [1] and will begin to execute it over the coming weeks.

Any and all feedback is more than welcome.

Thanks again,
Andrew

[1] https://github.com/apache/arrow-rs/issues/2030

On Sun, Jul 3, 2022 at 11:21 PM Andy Grove <an...@gmail.com> wrote:

> Apologies for being late to the discussion, but I was busy at the Data & AI
> Summit last week. I am supportive of this initiative even though I have not
> been following it very closely.
>
> Thank you Andrew and Raphael for driving this innovation.
>
> Andy.

Re: [RUST][DISCUSS] Donate object_store_rs to Arrow

Posted by Andy Grove <an...@gmail.com>.
Apologies for being late to the discussion, but I was busy at the Data & AI
Summit last week. I am supportive of this initiative even though I have not
been following it very closely.

Thank you Andrew and Raphael for driving this innovation.

Andy.

On Wed, Jun 29, 2022 at 11:40 AM Raphael Taylor-Davies <tu...@apache.org>
wrote:

> The proposal [1] to switch DataFusion to object_store_rs looks likely to
> be merged in the next few days, so if you have any concerns or reservations,
> please speak up!
>
> Regarding release management, my personal preference would be for the crate
> to be managed and released as part of the arrow-rs release process. This will
> keep the maintenance burden low by building on a preexisting and mature
> release process, whilst permitting closer integration between arrow-rs and
> object storage going forward. However, should we decide on something
> different, both Andrew and I are happy to continue to support the
> crate's evolution regardless, and the hope is that, by donating this
> project to Arrow, others may come to share this burden in time.
>
> [1]
> https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1170271295

Re: [RUST][DISCUSS] Donate object_store_rs to Arrow

Posted by Raphael Taylor-Davies <tu...@apache.org>.
The proposal [1] to switch DataFusion to object_store_rs looks likely to be merged in the next few days, so if you have any concerns or reservations, please speak up!

Regarding release management, my personal preference would be for the crate to be managed and released as part of the arrow-rs release process. This will keep the maintenance burden low by building on a preexisting and mature release process, whilst permitting closer integration between arrow-rs and object storage going forward. However, should we decide on something different, both Andrew and I are happy to continue to support the crate's evolution regardless, and the hope is that, by donating this project to Arrow, others may come to share this burden in time.

[1] https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1170271295

On 2022/06/28 22:52:50 Will Jones wrote:
> Thanks for bringing up this discussion, Andrew!
> 
> I started contributing to the crate in the hopes of using it in DataFusion
> and delta-rs. I've thus far found it to be a high-quality codebase. I'd
> like to see the crate adopted throughout the Rust Arrow ecosystem, and if
> going under Apache governance helps that end, I support that.
> 
> My one question is: Do we have maintainers (not sure if it has to be a
> committer or a PMC member) in the Apache project who are ready to take on
> release tasks for the crate?

Re: [RUST][DISCUSS] Donate object_store_rs to Arrow

Posted by Will Jones <wi...@gmail.com>.
Thanks for bringing up this discussion, Andrew!

I started contributing to the crate in the hopes of using it in DataFusion
and delta-rs. I've thus far found it to be a high-quality codebase. I'd
like to see the crate adopted throughout the Rust Arrow ecosystem, and if
going under Apache governance helps that end, I support that.

My one question is: Do we have maintainers (not sure if it has to be a
committer or a PMC member) in the Apache project who are ready to take on
release tasks for the crate?



On Tue, Jun 28, 2022 at 11:44 AM Andrew Lamb <al...@influxdata.com> wrote:

> Hello Rust / Arrow community,
>
> There is a proposal [1] in DataFusion to switch the abstraction used to
> read from remote object storage. For various reasons, this code is
> currently in its own crate/library [2] with unclear governance. We would
> like to contribute this code to the Apache Arrow project and I am hoping to
> gather community feedback on this idea.
>
> The full details can be found in [3]. Here is the rationale:
>
> 1. A common, high-quality object store abstraction for communicating with
> various remote object stores is useful for a range of projects and
> use cases.
> 2. Such a library is directly aligned with the Arrow mission of providing
> building blocks for modern high-performance analytics systems.
> 3. The clear governance of Apache Arrow offers the best chance to build a
> unified and strong community around this crate, hopefully both increasing
> its adoption and attracting community contributions for its long-term
> evolution and maintenance.
>
> Please let us know your thoughts,
> Andrew
>
>
> [1] https://github.com/apache/arrow-datafusion/issues/2489
> [2] https://github.com/influxdata/object_store_rs/issues/41
> [3] https://github.com/influxdata/object_store_rs/issues/41
>