Posted to dev@arrow.apache.org by Jorge Cardoso Leitão <jo...@gmail.com> on 2021/08/02 17:58:40 UTC

Re: [Discuss] [Rust] Arrow2/parquet2 going forward

Hi,

Sorry for the delay.

If there is a path towards an official release under a <1.0.0 versioning
scheme aligned with the rest of the Rust ecosystem and in line with the
stability of the API, then IMO we should move all development to within
Apache experimental asap (I can handle this and the likely IP clearance
round). If we require a >=1.X.Y release for it and/or a schedule, then I
prefer to keep expectations aligned and postpone any movement.
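
For readers less familiar with why a <1.0.0 scheme matters in the Rust
ecosystem: Cargo's default (caret) requirements treat a minor bump in a 0.X
series as breaking, whereas from 1.0 onwards only a major bump is breaking.
The following is a minimal sketch of that rule, not Cargo's actual
implementation; `Version` and `caret_accepts` are hypothetical names, and
pre-release tags are ignored.

```rust
/// A bare-bones semantic version: major.minor.patch, no pre-release tags.
/// Field order matters: the derived ordering compares major, then minor,
/// then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

impl Version {
    /// Parse "X.Y.Z"; returns None on malformed input.
    fn parse(s: &str) -> Option<Version> {
        let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
        let version = Version {
            major: parts.next()??,
            minor: parts.next()??,
            patch: parts.next()??,
        };
        // Reject trailing components such as "1.2.3.4".
        if parts.next().is_some() {
            None
        } else {
            Some(version)
        }
    }
}

/// Would a dependency declared as `^req` accept `candidate`?
/// The left-most non-zero component of `req` must stay unchanged.
fn caret_accepts(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false;
    }
    if req.major > 0 {
        candidate.major == req.major
    } else if req.minor > 0 {
        candidate.major == 0 && candidate.minor == req.minor
    } else {
        // ^0.0.z accepts exactly 0.0.z
        candidate == req
    }
}

fn main() {
    let v = |s: &str| Version::parse(s).expect("valid version");
    // From 1.0 onwards: minor releases are compatible, major releases are not.
    assert!(caret_accepts(v("5.0.0"), v("5.1.0")));
    assert!(!caret_accepts(v("5.0.0"), v("6.0.0")));
    // Under 0.X: patch releases are compatible, minor releases are not.
    assert!(caret_accepts(v("0.2.0"), v("0.2.5")));
    assert!(!caret_accepts(v("0.2.0"), v("0.3.0")));
    println!("caret rules behave as described");
}
```

Under this rule a 0.X crate can still ship breaking changes in a controlled
way (by bumping X), without implying the stability promise that a 1.0 carries.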

Under the move scenario, I was thinking of something along these lines:

* gradually stop maintaining "arrow" in crates, offering a maintenance
window over which we release patches (*)
* work towards achieving feature parity on arrow2/parquet2 on the
experimental repos.
* keep releasing arrow2/parquet2 under a 0.X model during the step above
(**)
* migrate to arrow-rs and archive experimentals (***)
* break arrow2 into smaller crates so that we can version the APIs at
different cadences
* once a crate reaches some stability (this is always opinionated, but it
is fine), we bump it to 1.0 and announce a maintenance plan à la tokio
<https://tokio.rs/blog/2020-12-tokio-1-0>.

(*) e.g. "we will continue to patch the arrow crate up to at least 6 months
starting after the first release of arrow2 that supports
a) nested parquet read and write
b) union array (including IPC integration tests)
c) map array (including IPC integration tests)"

(**) officially or un-officially (I would suggest officially so that we can
acknowledge everyone's work on it, but no strong feelings)

(***) something like:
1. place arrow2 on top of a clear arrow repo so that the full contribution
history up to that point is preserved
2. make arrow-rs the home of arrow2 (i.e. we start releasing arrow2 from
arrow-rs) and archive the experimental repos; create arrow-rs-parquet or
something for parquet2.
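
To make the "smaller crates" step concrete, here is a minimal sketch of an
umbrella crate that re-exports from independently versioned pieces. It is an
illustration under assumptions, not the arrow2 layout: modules stand in for
separate crates so that the example compiles as a single file, and
`Int32Array`, `sum` and the module names are hypothetical.

```rust
// Stand-ins for independently versioned crates. In a real workspace each
// would be its own package with its own Cargo.toml and version number.
pub mod arrow_core {
    pub mod array {
        /// Hypothetical minimal array type, for illustration only.
        #[derive(Debug, PartialEq)]
        pub struct Int32Array(pub Vec<Option<i32>>);

        impl Int32Array {
            /// Number of null (None) slots.
            pub fn null_count(&self) -> usize {
                self.0.iter().filter(|v| v.is_none()).count()
            }
        }
    }
}

pub mod arrow_compute {
    use super::arrow_core::array::Int32Array;

    /// Hypothetical kernel: sum of the non-null values.
    pub fn sum(array: &Int32Array) -> i64 {
        array.0.iter().flatten().map(|v| i64::from(*v)).sum()
    }
}

/// Facade playing the role of the umbrella crate: it owns no code and
/// only re-exports, so its version can move on its own schedule.
pub mod arrow {
    pub use super::arrow_compute as compute;
    pub use super::arrow_core::array;
}

fn main() {
    let a = arrow::array::Int32Array(vec![Some(1), None, Some(3)]);
    println!(
        "nulls = {}, sum = {}",
        a.null_count(),
        arrow::compute::sum(&a)
    );
}
```

Users of the facade keep a single import path, while each underlying piece
can stay at 0.X until it is stable enough to promise more.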

In summary, the core pain point for me is the current versioning of arrow,
which I feel is incompatible with my goals for arrow2 and the ecosystem I
envision it supporting :)

Best,
Jorge

On Fri, Jul 30, 2021 at 8:44 PM Wes McKinney <we...@gmail.com> wrote:

> I think it would also be fine to push “beta” arrow2 crates out of a repo
> under apache/ so long as they are not marked on crates.io as being
> Apache-official releases. There’s a possible slippery slope there, but as
> long as we are on a path to formalizing the releases I think it is okay.
>
> On Fri, Jul 30, 2021 at 1:07 PM Andrew Lamb <al...@influxdata.com> wrote:
>
> > Jorge -- do you feel like we have a resolution on what to do with arrow2
> > in the near term?
> >
> > The current state of affairs seems to me that arrow2 is released from
> > https://github.com/jorgecarleitao/arrow2 to crates.io (which is fine).
> > Are you happy with keeping development in the jorgecarleitao repo where
> > you will retain maximal control and flexibility until it is ready to
> > start integrating?
> >
> > Or would you prefer to put it into one of the apache repos and subject
> > its development and release to the normal Arrow governance model
> > (tarball, vote, etc)?
> >
> > Since you are the primary author/architect I think you should have a
> > substantial say at this stage.
> >
> > Andrew
> >
> >
> > On Tue, Jul 27, 2021 at 7:16 PM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > > I would be happy with this approach. Thank you for the suggestion
> > >
> > > This hybrid approach of both arrow and arrow2 in the same repo seems
> > > better to me than separate repos.
> > >
> > > What I really care about is ensuring we don't have two crates/APIs
> > > indefinitely -- as long as we are continually making progress towards
> > > unification that is what is important to me.
> > >
> > > Andrew
> > >
> > > On Tue, Jul 27, 2021 at 1:40 PM Andy Grove <an...@gmail.com> wrote:
> > >
> > >> Apologies for being late to this discussion.
> > >>
> > >> There is a hybrid option to consider here where we add the arrow2 code
> > >> into the arrow crate as a separate module, so we release one crate
> > >> containing the "old" API (which we can mark as deprecated) as well as
> > >> the new API. Java did a similar thing a long time ago with "java.io"
> > >> versus "java.nio" (new IO).
> > >>
> > >> I agree that the versioning wouldn't be ideal, but this seems like it
> > >> might be a pragmatic compromise?
> > >>
> > >> Thanks,
> > >>
> > >> Andy.
> > >>
> > >>
> > >> On Tue, Jul 20, 2021 at 5:41 AM Andrew Lamb <al...@influxdata.com> wrote:
> > >>
> > >> > What I meant is that when you decide arrow2 is suitable for release
> > >> > to existing arrow users, I stand ready to help you incorporate it
> > >> > into arrow.
> > >> >
> > >> > All the feedback I have heard so far from the rest of the community
> > >> > is that we are ready. One might even say we are anxious to do so :)
> > >> >
> > >> > Andrew
> > >> >
> > >>
> > >
> >
>

Re: [Discuss] [Rust] Arrow2/parquet2 going forward

Posted by Benjamin Blodgett <be...@gmail.com>.
great idea!

On Tue, Aug 3, 2021 at 8:49 AM Andy Grove <an...@gmail.com> wrote:

> I also like the idea of moving arrow2/parquet2 into the official repos.
> This is effectively what we did with Ballista, which is still experimental.
> Ballista was simpler because it depends on DataFusion rather than the other
> way around, but I like the idea of using feature flags to enable DataFusion
> on arrow2/parquet2.
>
> I don't see any reason why we wouldn't be able to also release
> arrow2/parquet2 with suitable 0.x.x versioning as well (as we plan on doing
> with Ballista) and releasing would be much easier if they are in the
> official repos.
>
>
> On Tue, Aug 3, 2021 at 7:13 AM paddy horan <pa...@hotmail.com> wrote:
>
> > Hi Jorge,
> >
> > What do you think about moving Arrow2 into the main Arrow repo where it
> > is only enabled via an "experimental" feature flag?  This would allow
> > development of Arrow2 to proceed in the main repo but also this would be
> > a clear signal that Arrow2 is <1.0.  When we feel ready (i.e. Arrow2 is
> > 1.0) we can release it in the next main release with Arrow2 being the
> > default and move the existing implementation behind a "legacy" feature
> > flag.
> >
> > Here is why I think this might work well:
> >  - People contributing to the Arrow project will naturally contribute to
> > Arrow2.  At the moment, some people will still contribute to Arrow
> > instead of Arrow2 just by virtue of it being the "official"
> > implementation. However, if both are in one repo people will want to
> > contribute to the "future", i.e. Arrow2.
> >  - the experimental flag will be a clear signal to the existing Arrow
> > community that Arrow2 is the future but that it is <1.0
> >  - existing users will be well supported in this transition
> >  - In general, I think the longer that development proceeds in separate
> > repos the harder it will be to eventually merge the two in a way that
> > supports existing users.
> >
> > Do you think this would work?
> >
> > Paddy

Re: [Discuss] [Rust] Arrow2/parquet2 going forward

Posted by Andy Grove <an...@gmail.com>.
I also like the idea of moving arrow2/parquet2 into the official repos.
This is effectively what we did with Ballista, which is still experimental.
Ballista was simpler because it depends on DataFusion rather than the other
way around, but I like the idea of using feature flags to enable DataFusion
on arrow2/parquet2.

I don't see any reason why we wouldn't be able to also release
arrow2/parquet2 with suitable 0.x.x versioning as well (as we plan on doing
with Ballista) and releasing would be much easier if they are in the
official repos.
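
As a rough illustration of the feature-flag idea, the sketch below selects
between two implementations at compile time. It is a sketch under
assumptions, not DataFusion's code: the feature name `arrow2`, the `backend`
module and its contents are hypothetical, and the Cargo.toml fragment is
shown only as a comment.

```rust
// Assumed Cargo.toml fragment (hypothetical):
//
//   [features]
//   default = []
//   arrow2 = []   # in practice this would enable an optional dependency
//
// Without the feature the existing implementation is compiled in; with
// `--features arrow2` the new one replaces it behind the same interface.

#[cfg(not(feature = "arrow2"))]
pub mod backend {
    pub const NAME: &str = "arrow";

    /// Sum computed by the existing implementation.
    pub fn sum(values: &[i32]) -> i64 {
        values.iter().map(|v| i64::from(*v)).sum()
    }
}

#[cfg(feature = "arrow2")]
pub mod backend {
    pub const NAME: &str = "arrow2";

    /// Same signature, backed by the experimental implementation.
    pub fn sum(values: &[i32]) -> i64 {
        values.iter().fold(0i64, |acc, v| acc + i64::from(*v))
    }
}

fn main() {
    // Callers are written once against `backend`; the flag picks the module.
    println!("backend = {}, sum = {}", backend::NAME, backend::sum(&[1, 2, 3]));
}
```

Because both variants expose the same names, downstream code does not change
when the flag is toggled, which is what makes a gradual migration possible.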


On Tue, Aug 3, 2021 at 7:13 AM paddy horan <pa...@hotmail.com> wrote:

> Hi Jorge,
>
> What do you think about moving Arrow2 into the main Arrow repo where it is
> only enabled via an "experimental" feature flag?  This would allow
> development of Arrow2 to proceed in the main repo but also this would be a
> clear signal that Arrow2 is <1.0.  When we feel ready (i.e. Arrow2 is 1.0)
> we can release it in the next main release with Arrow2 being the default
> and move the existing implementation behind a "legacy" feature flag.
>
> Here is why I think this might work well:
>  - People contributing to the Arrow project will naturally contribute to
> Arrow2.  At the moment, some people will still contribute to Arrow instead
> of Arrow2 just by virtue of it being the "official" implementation.
> However, if both are in one repo people will want to contribute to the
> "future", i.e. Arrow2.
>  - the experimental flag will be a clear signal to the existing Arrow
> community that Arrow2 is the future but that it is <1.0
>  - existing users will be well supported in this transition
>  - In general, I think the longer that development proceeds in separate
> repos the harder it will be to eventually merge the two in a way that
> supports existing users.
>
> Do you think this would work?
>
> Paddy

Re: [Discuss] [Rust] Arrow2/parquet2 going forward

Posted by Andrew Lamb <al...@influxdata.com>.
I agree with you both.

Users would love to have a project with multi-year maintenance with a
completely stable backwards compatible API (aka what tokio has promised)
that does everything they need.

However, building such software is (very) costly both initially and then
much more so for the ongoing maintenance. Until there is a need
(demonstrated by the willingness to pay the cost) from users of Rust/Arrow
for such maintenance I don't see how to make it happen.

Evidence of the lack of demand for longer 'supported' releases in my mind:
No one I know of has asked for, let alone volunteered to help create, an
arrow-rs maintenance release (e.g. 4.4.1) with just bug fixes. We have
all the process set up to make it happen, but no one cares yet.

I agree with Adam that there is middle ground here and I don't see any
insurmountable incompatibilities in release versions or processes.

Andrew

On Fri, Aug 6, 2021 at 5:31 AM Adam Lippai <ad...@rigo.sk> wrote:

> Hi,
>
> Thanks for the detailed answer.
>
> In contrast to my previous email, my opinionated part:
>
> Generally I like the idea of smaller crates, it helps with a lot of stuff
> (different targets, build time), but those benefits can be achieved by
> feature gates too.
> The upside would be out-of-sync crate releases.
>
> Maintenance is important, historically speaking I've seen it solved for
> open source by private companies offering it as a paid service.
> You are right that currently only 3 months of support is provided for free,
> but personally I don't see that as an issue.
> There are professional libraries and software with close to 100% market
> share in their field which support the last or last two versions only
> (Chrome, OS-es, compilers).
> I find it hard to imagine we'd want to do it *better*; that sounds like an
> illusion, but I'd like to be wrong on this one :)
> Professionally speaking, when picking projects, having Apache (or other)
> governance and community is more important for the businesses I worked
> with, than the release schedule or API stability / versioning.
>
>
> Based on the above and that there are about a dozen active Rust arrow
> contributors, any promise for reliable maintenance over years would be a
> lie in my eyes.
> DataFusion, Polars, odbc2parquet and others had issues with the changes
> being too slow, not too fast.
>
> I'm a big advocate of middle grounds and I still believe that your efforts
> and ideal setup are compatible with arrow-rs. Nobody would stop you from
> creating a 5.23.0 release next to the 6.1.0 if you wanted to backport
> anything, and nobody would stop you from cutting an out-of-schedule 6.2 or
> even 7.0 release if it's to ensure security. The frequent Apache release
> process - which we were afraid of - was smooth so far, with surprisingly
> nice support from members of different languages / implementations.
>
> Also I believe that any plan you'd have for turning arrow2 into arrow-rs 6.0
> would be more than welcome on a public vote, along with the technical
> changes you propose (e.g. cutting a separate arrow-io crate).
>
>
> At least 6 key members showed their excitement for your changes in this
> thread and even more on Slack/GitHub ;)
>
> Best regards,
> Adam Lippai
>
> On Fri, Aug 6, 2021 at 10:07 AM Jorge Cardoso Leitão <
> jorgecarleitao@gmail.com> wrote:
>
> > Hi,
> >
> > Thanks for your input.
> >
> > Every time there is a new major release, all new development shifts
> > towards that new API and users of previous APIs are left behind. It is
> > not just a matter of SemVer and size of version numbers, there is a whole
> > development shift to be on top of the new API.
> >
> > I disagree that software that has a major release every 3 months and no
> > maintenance window over previous versions is stable. I alluded to the
> > Tokio example because Tokio 1.0 recently became the runtime of rust-based
> > AWS lambda functions [1]; this commitment is only possible by enforcing
> > API stability and maintenance beyond a 3 month period (at least 3 years
> > in their case).
> >
> > Also, imo the current major version number is not meaningless: divided by
> > the software age, it constitutes the historical release pattern and is
> > usually a good predictor of the pattern used in future releases.
> >
> > The evidence is that we haven't been able to support any version for any
> > period of time; recently, Andrew has been doing amazing work at
> > supporting the latest version for a period of 3 months. I.e. an
> > application that depends on `arrow = ^5.0` has a support window of 3
> > months. Given that we have not backported any security fixes to previous
> > versions, it is reasonable to assume that security patches are also
> > applied within a 3 month period only.
> >
> > As a contributor to arrow2, I would rather not have arrow2 under Apache
> > Arrow than have to release it under its current versioning and scheduling
> > (this is similar to some of Julia's concerns). As a contributor to
> > Apache Arrow, I currently cannot guarantee a maintenance window over
> > arrow-rs for any period of time because it is unsafe by design and I do
> > not have the motivation to fix it. As both, I am confident that the core
> > of arrow2 will soon reach a point where we can live with and develop on
> > top of it for at least a year. This is not true of the whole API surface,
> > though: there are APIs that we will need to change more often until
> > stability can be promised.
> >
> > So, I am requesting that we tie the discussion of arrow2 to how it will
> > be released.
> >
> > Could a middle ground be somewhere along the lines of splitting the crate
> > into smaller crates that are versioned independently? I.e. continue to
> > release `arrow` under the same versioning and cadence, and create 3 new
> > crates, arrow-core, arrow-compute, and arrow-io (see also [2]) that would
> > have their own versioning at 0.X until stability is achieved, based on
> > arrow2's code base. The migration of the `arrow` crate to arrow2's API
> > would be to re-export from the smaller crates (e.g. `pub use
> > arrow_core::array`).
> >
> > [1] https://crates.io/crates/lambda_runtime/0.3.1/dependencies
> > [2] https://github.com/jorgecarleitao/arrow2/issues/257
> >
> > Best,
> > Jorge
> >
> >
> > On Thu, Aug 5, 2021 at 11:53 PM Adam Lippai <ad...@rigo.sk> wrote:
> >
> > > Not taking sides, just two technical notes below.
> > >
> > > Semver.org (https://semver.org/#how-do-i-know-when-to-release-100)
> > > clearly defines the versions >1.0.0.
> > > * If it's used in production, it's 1.0.0.
> > > * If it provides an API others depend on then it's 1.0.0.
> > > * If you intend to keep backward compatibility, it's 1.0.0.
> > > TL;DR: 1.0.0 represents the version from which point on we guarantee
> > > that non-production releases are marked (alpha, beta, rc) and that
> > > breaking (API), backwards-incompatible changes result in a major
> > > version bump. This we already do, 4x per year.
> > >
> > > The second fact is that arrow2 uses the arrow name, but it doesn't have
> > > Apache governance. It's not released from github.com/apache, there are
> > > no formal releases, there are no votes. This is not correct or fair
> > > usage of the brand (on the same level as DataFuse, or db-benchmark
> > > calling a custom R implementation arrow) even if it's "unofficial". My
> > > understanding is that arrow2 can be an unofficial implementation with a
> > > different name or an arrow-rs experiment with the intention to merge
> > > the code, but not both.
> > >
> > > I think both issues could be solved and I really value and like the
> > > arrow2 work so far. That's the right way. I hope we'll see it in prod
> > > either way as soon as it's ready.
> > >
> > > Best regards,
> > > Adam Lippai
> > >
> > > On Wed, Aug 4, 2021, 08:25 QP Hou <ho...@gmail.com> wrote:
> > >
> > > > Just my two cents.
> > > >
> > > > I think we all have the same goal here, which is to accelerate the
> > > > transitioning of arrow to arrow2 as the official arrow rust
> > > > implementation.
> > > >
> > > > In my opinion, the biggest gain we can get from merging two projects
> > > > into one repo is to have some kind of a policy to enforce that every
> > > > new feature/test added to the current arrow implementation also
> > > > needs to be added to the arrow2 implementation. This way, we can make
> > > > sure the gap between arrow and arrow2 is closing on every iteration.
> > > > Without this, I tend to agree with Jorge that merging two repos would
> > > > add more overhead to his work and slow him down.
> > > >
> > > > For those who want to contribute to arrow2 to accelerate the
> > > > transition, I don't think they would have a problem sending PRs to the
> > > > arrow2 repo. For those who are not interested in contributing to
> > > > arrow2, merging the arrow2 code base into the current arrow-rs repo
> > > > won't incentivize them to contribute. Merging arrow2 into current
> > > > arrow-rs repo could help with discovery. But I think this can be
> > > > achieved by adding a big note in the current arrow-rs README to
> > > > encourage contributions to the arrow2 repo as well.
> > > >
> > > > At the end of the day, Jorge is currently the sole active contributor
> > > > to the arrow2 implementation, so I think he would have the most say
> > > > on what's the most productive way to push arrow2 forward. The only
> > > > concern I have with regards to merging arrow2 into arrow-rs right now
> > > > is that Jorge spends all the effort to do the merge, and then it turns
> > > > out that he is still the only active contributor to arrow2 within
> > > > arrow-rs, but with more overhead that he has to deal with.
> > > >
> > > > As for maintaining semantic versioning for arrow2, Andy had a good
> > > > point that we could still release arrow2 with its own versioning even
> > > > if we merge it into the arrow-rs repo. So I don't think we should
> > > > worry/focus too much about versioning in our discussion. Velocity to
> > > > close the gap between arrow-rs and arrow2 is the most important
> > > > thing.
> > > >
> > > > > Lastly, I do agree with Andrew that it would be good to only maintain
> > > > > a single arrow crate in crates.io in the long run. As he mentioned,
> > > > > when the current arrow2 code base becomes stable, we could still
> > > > > release it under the arrow namespace in crates.io with a major version
> > > > > bump. The absolute value in the major version doesn't really matter as
> > > > > long as we stick to the convention that breaking change will result in
> > > > > a major version bump.
> > > >
> > > > Thanks,
> > > > QP
> > > >
> > > >
> > > >
> > > > On Tue, Aug 3, 2021 at 5:31 PM paddy horan <pa...@hotmail.com>
> > > wrote:
> > > > >
> > > > > Hi Jorge,
> > > > >
> > > > > > I see value in consolidating development in a single repo and releasing
> > > > > > under the existing arrow crate.  Regarding versioning, I think once we
> > > > > > follow semantic versioning we are fine.  I don't think it's worth migrating
> > > > > > to a different repo and crate to comply with the de-facto standard you
> > > > > > mention.
> > > > >
> > > > > Just one person's opinion though,
> > > > > Paddy
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Jorge Cardoso Leitão <jo...@gmail.com>
> > > > > Sent: Tuesday, August 3, 2021 5:23 PM
> > > > > To: dev@arrow.apache.org
> > > > > Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward
> > > > >
> > > > > Hi Paddy,
> > > > >
> > > > > > > What do you think about moving Arrow2 into the main Arrow repo where
> > > > > > > it is only enabled via an "experimental" feature flag?
> > > > >
> > > > > AFAIK this is already possible:
> > > > > > * add `arrow2 = { version = "0.2.0", optional = true }` to Cargo.toml
> > > > > * add `#[cfg(feature = "arrow2")]\npub mod arrow2;\n` to lib.rs
> > > > >
> > > > > > We do this kind of thing to expose APIs from non-arrow crates such as
> > > > > > parts of the parquet-format-rs crate, and it is generally the way to go
> > > > > > when a crate wants to expose a third-party API.
> > > > > >
> > > > > > I would not recommend doing this, though: by exposing arrow2 from arrow,
> > > > > > we double the compilation time and binary size of all dependencies that
> > > > > > activate the flag. Furthermore, there are users of arrow2 that do not need
> > > > > > the arrow crate, which this model would not support.
> > > > > >
> > > > > > AFAIK where development happens is unrelated to this aspect; Rust
> > > > > > enables this by design.
> > > > >
> > > > > > but also this would be a clear signal that Arrow2 is <1.0.
> > > > > > the experimental flag will be a clear signal to the existing
> Arrow
> > > > > community that Arrow2 is the future but that it is <1.0
> > > > >
> > > > > > arrow2 is already <1.0 <https://crates.io/crates/arrow2>.
> > > > > > My argument is that the arrow/arrow-flight/parquet are not versioned
> > > > > > according to the Rust community standards: It is a de facto practice in
> > > > > > Rust to delay major releases until the API is stable. Tokio's blog post
> > > > > > about their 1.0 <https://tokio.rs/blog/2020-12-tokio-1-0>
> > > > > > (i.e. "[...] we commit to holding back on a Tokio 2.0 release for at least
> > > > > > 3 years."). 10 most downloaded crates:
> > > > >
> > > > > *
> > > >
> > >
> >
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcrates.io%2Fcrates%2Frand&amp;data=04%7C01%7C%7Ca37de2cddc6e447a777b08d956c4dbce%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637636225764521997%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=sBxp1XYBLl6OIV57nM%2FGsZO0AmbgyBeRaoPANEvdZGE%3D&amp;reserved=0
> > > > (0.8.4)
> > > > > *
> > > >
> > >
> >
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcrates.io%2Fcrates%2Fsyn&amp;data=04%7C01%7C%7Ca37de2cddc6e447a777b08d956c4dbce%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637636225764521997%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=oeQliVwSgrvgART7r49XeiM%2F72TYa7hX8M3QyVDrqsk%3D&amp;reserved=0
> > > > (1.0.74)
> > > > > *
> > > >
> > >
> >
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcrates.io%2Fcrates%2Flibc&amp;data=04%7C01%7C%7Ca37de2cddc6e447a777b08d956c4dbce%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637636225764521997%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=OULOu9vhaWEgnavRqedebM7ceZRsVnaF7YjYuq1MJ3Y%3D&amp;reserved=0
> > > > (0.2.98)
> > > > > *
> > > >
> > >
> >
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcrates.io%2Fcrates%2Frand_core&amp;data=04%7C01%7C%7Ca37de2cddc6e447a777b08d956c4dbce%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637636225764521997%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=mx6X86bNRis6UykbWR%2FWTGEgAjq8h6JylmOSAQlfsh0%3D&amp;reserved=0
> > > > (0.6.3)
> > > > > * quote (1.0.9)
> > > > > * unicode-xid (0.2.2)
> > > > > * proc-macro2 (1.0.28)
> > > > > * cfg-if (1.0.0)
> > > > > *
> > > >
> > >
> >
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcrates.io%2Fcrates%2Fserde&amp;data=04%7C01%7C%7Ca37de2cddc6e447a777b08d956c4dbce%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637636225764521997%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=p%2FNgTB0839C1%2F1Zn4GeEnRtvr0hiFhOuBJ5tF76aW5E%3D&amp;reserved=0
> > > > (1.0.126)
> > > > > * bitflags (1.2.1)
> > > > >
> > > > > > These are small crates with a small scope, but even larger projects
> > > > > > share the same pattern:
> > > > > >
> > > > > > * crossbeam <https://crates.io/crates/crossbeam> (0.8.1)
> > > > > > * rocket <https://crates.io/crates/rocket> (0.5)
> > > > > > * polars <https://crates.io/crates/polars> (0.14.8)
> > > > > > * tower <https://crates.io/crates/tower> (0.4.8)
> > > > > > * Tokio <https://crates.io/crates/tokio> (1.9.0)
> > > > > > * hyper <https://crates.io/crates/hyper> (0.14.11)
> > > > >
> > > > > > Crates that arrow depends on
> > > > > > <https://github.com/apache/arrow-rs/blob/master/arrow/Cargo.toml>,
> > > > > > that DataFusion depends on
> > > > > > <https://github.com/apache/arrow-datafusion/blob/master/datafusion/Cargo.toml>,
> > > > > > all share the same pattern of being either 0.X, 1.X when their API is
> > > > > > stable, and 2.X when they needed a large change in the API. This contrasts
> > > > > > with Apache Arrow's releases where we are now at 5.0 (and we have yet to
> > > > > > arrive at a safe design).
> > > > >
> > > > > > existing users will be well supported in this transition
> > > > >
> > > > > How so? imo people either PR to the arrow/arrow2 code base or they
> > > won't.
> > > > > This is largely independent of where the development of either
> arrow2
> > > or
> > > > arrow happens; people google the crate, click on the repository link
> > and
> > > > file an issue or field a PR.
> > > > >
> > > > > > In general, I think the longer that development proceeds in
> > separate
> > > > > repos the harder it will be to eventually merge the two in a way
> that
> > > > supports existing users.
> > > > >
> > > > > > How so? I may be mistaken, but API design is unrelated to on which repo
> > > > > > the development happens: it is primarily driven by who is designing it and
> > > > > > from where or who they are inspired by. Both arrow and parquet's crate
> > > > > > design are inspired by the C++ implementation and have gradually been
> > > > > > migrated to "idiomatic" Rust, as "idiomatic" is becoming more well defined
> > > > > > in Rust.
> > > > > > Arrow2 is inspired by the current crate and the pains of using it in
> > > > > > DataFusion. Datafuse, a fork of datafusion, recently migrated to arrow2
> > > > > > <https://github.com/datafuselabs/datafuse/pull/1239>:
> > > > > > +1,947 −3,484, which shows that the crate is capturing important patterns
> > > > > > from the arrow crate and exposing ones that are useful / result in less
> > > > > > code for the same or higher performance.
> > > > >
> > > > > > On the opposite side, merging the development of crates under the same
> > > > > > repo leads to: more triaging of PRs; more work for releases and
> > > > > > changelogging; tagging based on crates; multiple READMEs in subpaths of the
> > > > > > repo, curation of the CI to accommodate this, a workspace with many crates
> > > > > > each with its own set of dependencies, increasing compilation and
> > > > > > development; mixed commit logs, difficulties in reverts and cherry-picks;
> > > > > > more difficult to find stuff in the repo. See e.g. how tokio-rs does it:
> > > > > > https://github.com/tokio-rs, even for small crates like bytes
> > > > > > <https://github.com/tokio-rs/bytes>.
> > > > >
> > > > > Best,
> > > > > Jorge
> > > > >
> > > > > On Tue, Aug 3, 2021 at 3:13 PM paddy horan <paddyhoran@hotmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Hi Jorge,
> > > > > >
> > > > > > What do you think about moving Arrow2 into the main Arrow repo
> > where
> > > > > > it is only enabled via an "experimental" feature flag?  This
> would
> > > > > > allow development of Arrow2 to proceed in the main repo but also
> > this
> > > > > > would be a clear signal that Arrow2 is <1.0.  When we feel ready
> > > (i.e.
> > > > > > Arrow2 is 1.0) we can release it in the next main release with
> > Arrow2
> > > > > > being the default and move the existing implementation behind a
> > > > "legacy" feature flag.
> > > > > >
> > > > > > Here is why I think this might work well:
> > > > > >  - People contributing to the Arrow project will naturally
> > contribute
> > > > > > to Arrow2.  At the moment, some people will still contribute to
> > Arrow
> > > > > > instead of Arrow2 just by virtue of it being the "official"
> > > > implementation.
> > > > > > However, if both are in one repo people will want to contribute
> to
> > > the
> > > > > > "future", i.e. Arrow2.
> > > > > >  - the experimental flag will be a clear signal to the existing
> > Arrow
> > > > > > community that Arrow2 is the future but that it is <1.0
> > > > > >  - existing users will be well supported in this transition
> > > > > >  - In general, I think the longer that development proceeds in
> > > > > > separate repos the harder it will be to eventually merge the two
> > in a
> > > > > > way that supports existing users.
> > > > > >
> > > > > > > Do you think this would work?
> > > > > >
> > > > > > Paddy
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jorge Cardoso Leitão <jo...@gmail.com>
> > > > > > Sent: Monday, August 2, 2021 1:59 PM
> > > > > > To: dev@arrow.apache.org
> > > > > > Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Sorry for the delay.
> > > > > >
> > > > > > If there is a path towards an official release under a <1.0.0
> > > > > > versioning schema aligned with the rest of the Rust ecosystem and
> > in
> > > > > > line with the stability of the API, then IMO we should move all
> > > > > > development to within Apache experimental asap (I can handle this
> > and
> > > > > > the likely IP clearance round). If we require a release >=1.X.Y
> to
> > it
> > > > > > and/or a schedule, then I prefer to keep expectations aligned and
> > > > postpone any movement.
> > > > > >
> > > > > > > Under the move situation, I was thinking of something as follows:
> > > > > >
> > > > > > * gradually stop maintaining "arrow" in crates, offering a
> > > maintenance
> > > > > > window over which we release patches (*)
> > > > > > * work towards achieving feature parity on arrow2/parquet2 on the
> > > > > > experimental repos.
> > > > > > * keep releasing arrow2/parquet2 under a 0.X model during the
> step
> > > > > > above
> > > > > > (**)
> > > > > > * migrate to arrow-rs and archive experimentals (***)
> > > > > > * break arrow2 in smaller crates so that we can version the APIs
> > at a
> > > > > > different cadence
> > > > > > > * once a crate reaches some stability (this is always opinionated, but
> > > > > > > it is fine), we bump it to 1.0 and announce a maintenance plan ala
> > > > > > > tokio <https://tokio.rs/blog/2020-12-tokio-1-0>.
> > > > > >
> > > > > > (*) e.g. "we will continue to patch the arrow crate up to at
> least
> > 6
> > > > > > months starting after the first release of arrow2 that supports
> > > > > > a) nested parquet read and write
> > > > > > b) union array (including IPC integration tests)
> > > > > > c) map array (including IPC integration tests)"
> > > > > >
> > > > > > (**) officially or un-officially (I would suggest officially so
> > that
> > > > > > we can acknowledge everyone's work on it, but no strong feelings)
> > > > > >
> > > > > > (***) something like:
> > > > > > > 1. place arrow2 on top of a clear arrow repo so that the full
> > > > > > > contribution history up to that point is preserved
> > > > > > > 2. make arrow-rs the home of arrow2 (i.e. we start releasing arrow2 from
> > > > > > > arrow-rs) and archive the experimental repos; create arrow-rs-parquet
> > > > > > > or something for parquet2.
> > > > > >
> > > > > > In summary, the core pain point for me is the current versioning
> of
> > > > > > arrow, which I feel is incompatible with my goals for arrow2 and
> > the
> > > > > > ecosystem I envision it supporting :)
> > > > > >
> > > > > > Best,
> > > > > > Jorge
> > > > > >
> > > > > > On Fri, Jul 30, 2021 at 8:44 PM Wes McKinney <
> wesmckinn@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > I think it would also be fine to push "beta" arrow2 crates out
> > of a
> > > > > > > repo under apache/ so long as they are not marked on crates.io
> > as
> > > > > > > being Apache-official releases. There's a possible slippery
> slope
> > > > > > > there, but as long as we are on a path to formalizing the
> > releases
> > > I
> > > > > > think it is okay.
> > > > > > >
> > > > > > > On Fri, Jul 30, 2021 at 1:07 PM Andrew Lamb <
> > alamb@influxdata.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > > Jorge -- do you feel like we have a resolution on what to do with
> > > > > > > > > arrow2 in the near term?
> > > > > > > > >
> > > > > > > > > The current state of affairs seems to me that arrow2 is released from
> > > > > > > > > https://github.com/jorgecarleitao/arrow2 to crates.io (which is fine).
> > > > > > > > > Are you happy with keeping development in the jorgecarleitao repo
> > > > > > > > > where you will retain maximal control and flexibility until it is
> > > > > > > > > ready to start integrating?
> > > > > > > >
> > > > > > > > Or would you prefer to put it into one of the apache repos
> and
> > > > > > > > subject
> > > > > > > its
> > > > > > > > development and release to the normal Arrow governance model
> > > > > > > > (tarball, vote, etc)?
> > > > > > > >
> > > > > > > > Since you are the primary author/architect I think you should
> > > have
> > > > > > > > a substantial say at this stage.
> > > > > > > >
> > > > > > > > Andrew
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Jul 27, 2021 at 7:16 PM Andrew Lamb <
> > > alamb@influxdata.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I would be happy with this approach. Thank you for the
> > > > > > > > > suggestion
> > > > > > > > >
> > > > > > > > > This hybrid approach of both arrow and arrow2 in the same
> > repo
> > > > > > > > > seems better to me than separate repos.
> > > > > > > > >
> > > > > > > > > What I really care about is ensuring we don't have two
> > > > > > > > > crates/APIs indefinitely -- as long as we are continually
> > > making
> > > > > > > > > progress towards unification that is what is important to
> me.
> > > > > > > > >
> > > > > > > > > Andrew
> > > > > > > > >
> > > > > > > > > On Tue, Jul 27, 2021 at 1:40 PM Andy Grove
> > > > > > > > > <an...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Apologies for being late to this discussion.
> > > > > > > > >>
> > > > > > > > >> There is a hybrid option to consider here where we add the
> > > > > > > > >> arrow2 code into the arrow crate as a separate module, so
> we
> > > > > > > > >> release one crate
> > > > > > > containing
> > > > > > > > >> the "old" API (which we can mark as deprecated) as well as
> > the
> > > > > > > > >> new
> > > > > > > API.
> > > > > > > > >> Java did a similar thing a long time ago with "java.io"
> > > versus
> > > > > > > > "java.nio"
> > > > > > > > >> (new IO).
> > > > > > > > >>
> > > > > > > > >> I agree that the versioning wouldn't be ideal, but this
> > seems
> > > > > > > > >> like it might be a pragmatic compromise?
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >>
> > > > > > > > >> Andy.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Tue, Jul 20, 2021 at 5:41 AM Andrew Lamb
> > > > > > > > >> <al...@influxdata.com>
> > > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >> > What I meant is that when you decide arrow2 is suitable
> > for
> > > > > > > > >> > release
> > > > > > > to
> > > > > > > > >> > existing arrow users, I stand ready to help you
> > incorporate
> > > > > > > > >> > it into
> > > > > > > > >> arrow.
> > > > > > > > >> >
> > > > > > > > >> > All the feedback I have heard so far from the rest of
> the
> > > > > > > > >> > community
> > > > > > > is
> > > > > > > > >> that
> > > > > > > > >> > we are ready. One might even say we are anxious to do so
> > :)
> > > > > > > > >> >
> > > > > > > > >> > Andrew
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
>

Re: [Discuss] [Rust] Arrow2/parquet2 going foward

Posted by Adam Lippai <ad...@rigo.sk>.
Hi,

Thanks for the detailed answer.

In contrast to my previous email, my opinionated part:

Generally I like the idea of smaller crates: it helps with a lot of stuff
(different targets, build time), but those benefits can be achieved with
feature gates too.
The remaining upside would be out-of-sync crate releases.
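
To illustrate the feature-gate alternative, here is a minimal sketch. The
feature names `io_csv` and `compute` are hypothetical (they would be declared
under `[features]` in Cargo.toml) and are not the features of any existing
arrow crate:

```rust
// Hypothetical feature names; a real crate would declare them under
// [features] in Cargo.toml, e.g. `io_csv = []` and `compute = []`.

/// Compiled only when the (assumed) `io_csv` feature is enabled, so builds
/// that do not need CSV support pay neither compile time nor binary size.
#[cfg(feature = "io_csv")]
pub mod io_csv {
    pub fn describe() -> &'static str {
        "CSV support compiled in"
    }
}

/// Lists which of the assumed optional features this build was compiled with.
pub fn enabled_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(feature = "io_csv") {
        enabled.push("io_csv");
    }
    if cfg!(feature = "compute") {
        enabled.push("compute");
    }
    enabled
}
```

All gated modules still share one version number, which is the part that
separate crates would change.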

Maintenance is important, historically speaking I've seen it solved for
open source by private companies offering it as a paid service.
You are right that currently only 3 months of support is provided for free,
but personally I don't see that as an issue.
There are professional libraries and software with close to 100% market
share in their field which support the last or last two versions only
(Chrome, OS-es, compilers).
I find it hard to imagine we'd want to do it *better*; that sounds like an
illusion, but I'd like to be wrong on this one :)
Professionally speaking, when picking projects, having Apache (or other)
governance and community is more important for the businesses I worked
with, than the release schedule or API stability / versioning.


Based on the above and that there are about a dozen active Rust arrow
contributors, any promise for reliable maintenance over years would be a
lie in my eyes.
DataFusion, Polars, odbc2parquet and others had issues with the changes
being too slow, not too fast.

I'm a big advocate of middle grounds and I still believe that your efforts
and ideal setup are compatible with arrow-rs: nobody would stop you from
creating a 5.23.0 release next to the 6.1.0 if you wanted to backport
anything, and nobody would stop you from cutting an out-of-schedule 6.2 or
even 7.0 release if it's to ensure security. The frequent Apache release
process - which we were afraid of - has been smooth so far, with surprisingly
nice support from members of different languages / implementations.
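
To make the "5.23.0 next to 6.1.0" scenario concrete, here is a simplified
sketch of Cargo's default (caret) requirement matching, which decides who
would pick up such a backport. It only handles fully specified
major.minor.patch versions and ignores pre-release tags; the `Version` type
and the `v` and `caret_matches` helpers are illustrative names, not Cargo's
API:

```rust
/// Fully specified semantic version (pre-release tags are ignored here).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

/// Small constructor to keep the examples short.
pub const fn v(major: u64, minor: u64, patch: u64) -> Version {
    Version { major, minor, patch }
}

/// Returns true if `candidate` satisfies the caret requirement `^req`:
/// it must be at least `req` and must not change the left-most non-zero
/// component of `req`.
pub fn caret_matches(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false;
    }
    if req.major > 0 {
        // ^5.0.0 accepts any 5.x.y, never 6.x.y
        candidate.major == req.major
    } else if req.minor > 0 {
        // ^0.2.0 accepts 0.2.x only: in 0.X, the minor acts as the major
        candidate.major == 0 && candidate.minor == req.minor
    } else {
        // ^0.0.3 accepts exactly 0.0.3
        candidate == req
    }
}
```

Under these rules a user on `arrow = "^5.0"` would receive a backported
5.23.0 but never 6.1.0, while a 0.X crate treats every minor bump as
breaking.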

Also I believe that any plan you have for turning arrow2 into arrow-rs 6.0
would be more than welcome on a public vote, along with the technical
changes you propose (e.g. cutting a separate arrow-io crate).
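
For readers skimming the thread, the re-export idea from Jorge's quoted
message (`pub use arrow_core::array`) can be sketched in a single file, with
modules standing in for separately versioned crates. The `Int32Array` type
and `via_facade` function are toy assumptions for illustration, not the
arrow2 API:

```rust
// Stand-in for a separately versioned `arrow-core` crate (hypothetical contents).
pub mod arrow_core {
    pub mod array {
        /// Toy array type used only for illustration; not the arrow2 API.
        #[derive(Debug, PartialEq)]
        pub struct Int32Array(pub Vec<Option<i32>>);

        impl Int32Array {
            /// Number of null (None) slots.
            pub fn null_count(&self) -> usize {
                self.0.iter().filter(|slot| slot.is_none()).count()
            }
        }
    }
}

// Stand-in for the `arrow` facade crate: it owns no code, it only re-exports.
pub mod arrow {
    pub use super::arrow_core::array;
}

/// Code written against the facade path gets the very same type as code
/// written against the smaller crate directly.
pub fn via_facade() -> arrow::array::Int32Array {
    arrow::array::Int32Array(vec![Some(1), None, Some(3)])
}
```

Because the facade only re-exports, the smaller crates could stay at 0.X
while `arrow` keeps its current versioning and cadence.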


At least 6 key members showed their excitement for your changes in this
thread and even more on Slack/GitHub ;)

Best regards,
Adam Lippai

On Fri, Aug 6, 2021 at 10:07 AM Jorge Cardoso Leitão <
jorgecarleitao@gmail.com> wrote:

> Hi,
>
> Thanks for your input.
>
> Every time there is a new major release, all new development shifts towards
> that new API and users of previous APIs are left behind. It is not just a
> matter of SemVer and size of version numbers, there is a whole development
> shift to be on top of the new API.
>
> I disagree that software with a major release every 3 months and no
> maintenance window over previous versions is stable. I alluded to the Tokio
> example because Tokio 1.0 recently became the runtime of rust-based AWS
> lambda functions [1]; this commitment is only possible by enforcing API
> stability and maintenance beyond a 3 month period (at least 3 years in
> their case).
>
> Also, imo the current major version number is not meaningless: divided by
> the software age, it constitutes the historical release pattern and is
> usually a good predictor of the pattern used in future releases.
>
> The evidence is that we haven't been able to support any version for any
> period of time; recently, Andrew has been doing amazing work at supporting
> the latest version for a period of 3 months. I.e. an application that
> depends on `arrow = ^5.0` has a support window of 3 months. Given that we
> have not backported any security fixes to previous versions, it is
> reasonable to assume that security patches are also applied within a 3
> month period only.
>
> As a contributor to arrow2, I would rather not have arrow2 under Apache Arrow
> than have to release it under its current versioning and scheduling (this
> is similar to some of Julia's concerns). As a contributor to Apache
> Arrow, I currently cannot guarantee a maintenance window over arrow-rs for
> any period of time because it is unsafe by design and I do not have the
> motivation to fix it. As both, I am confident that the core arrow2 will
> soon reach a point where we can live with and develop on top of it for at
> least a year. This is not true of the whole API surface, though: there are
> APIs that we will need to change more often until stability can be
> promised.
>
> So, I am requesting that we tie the discussion of arrow2 to how it will be
> released.
>
> Could a middle ground be somewhere along the lines of splitting the crate
> into smaller crates that are versioned independently? I.e. continue to
> release `arrow` under the same versioning and cadence, and create 3 new
> crates, arrow-core, arrow-compute, and arrow-io (see also [2]) that would
> have their own versioning at 0.X until stability is achieved, based on
> arrow2's code base. The migration of the `arrow` crate to arrow2's API
> would be to re-export from the smaller crates (e.g. `pub use
> arrow_core::array`).
>
> [1] https://crates.io/crates/lambda_runtime/0.3.1/dependencies
> [2] https://github.com/jorgecarleitao/arrow2/issues/257
>
> Best,
> Jorge
>
>
> On Thu, Aug 5, 2021 at 11:53 PM Adam Lippai <ad...@rigo.sk> wrote:
>
> > Not taking sides, just two technical notes below.
> >
> > semver.org clearly defines (
> > https://semver.org/#how-do-i-know-when-to-release-100) when a version
> > should be >=1.0.0:
> > * If it's used in production, it's 1.0.0.
> > * If it provides an API others depend on then it's 1.0.0.
> > * If you intend to keep backward compatibility, it's 1.0.0.
> > TL;DR: 1.0.0 represents the version from which point we guarantee that
> > non-production releases are marked (alpha, beta, rc) and that breaking (API)
> > or backwards-incompatible changes result in a major version bump. This
> > we already do, 4x per year.
> >
> > The second fact is that arrow2 uses the arrow name, but it doesn't have
> > apache governance. It's not released from GitHub.com/apache, there are no
> > formal releases, there are no votes. This is not correct or fair usage of
> > the brand (on the same level as DataFuse, or db-benchmark calling a custom
> > R implementation arrow) even if it's "unofficial". My understanding is that
> > arrow2 can be an unofficial implementation with a different name or an
> > arrow-rs experiment with the intention to merge the code, but not both.
> >
> > I think both issues could be solved and I really value and like the arrow2
> > work so far. That's the right way. I hope we'll see it in prod either way
> > as soon as it's ready.
> >
> > Best regards,
> > Adam Lippai
> >
> > On Wed, Aug 4, 2021, 08:25 QP Hou <ho...@gmail.com> wrote:
> >
> > > Just my two cents.
> > >
> > > I think we all have the same goal here, which is to accelerate the
> > > transitioning of arrow to arrow2 as the official arrow rust
> > > implementation.
> > >
> > > In my opinion, the biggest gain we can get from merging two projects
> > > into one repo is to have some kind of a policy to enforce that every
> > > new feature/test added to the current arrow implementation also needs
> > > to be added to the arrow2 implementation. This way, we can make sure
> > > the gap between arrow and arrow2 is closing on every iteration.
> > > Without this, I tend to agree with Jorge that merging two repos would
> > > add more overhead to his work and slow him down.
> > >
> > > For those who want to contribute to arrow2 to accelerate the
> > > transition, I don't think they would have problem sending PRs to the
> > > arrow2 repo. For those who are not interested in contributing to
> > > arrow2, merging the arrow2 code base into the current arrow-rs repo
> > > won't incentivize them to contribute. Merging arrow2 into current
> > > arrow-rs repo could help with discovery. But I think this can be
> > > achieved by adding a big note in the current arrow-rs README to
> > > encourage contributions to the arrow2 repo as well.
> > >
> > > At the end of the day, Jorge is currently the sole active contributor
> > > to the arrow2 implementation, so I think he would have the most say on
> > > what's the most productive way to push arrow2 forward. The only
> > > concern I have with regards to merging arrow2 into arrow-rs right now
> > > is that Jorge spends all the effort to do the merge and it then turns out
> > > that he is still the only active contributor to arrow2 within
> > > arrow-rs, but with more overhead to deal with.
> > >
> > > As for maintaining semantic versioning for arrow2, Andy had a good
> > > point that we could still release arrow2 with its own versioning even
> > > if we merge it into the arrow-rs repo. So I don't think we should
> > > worry/focus too much about versioning in our discussion. Velocity to
> > > close the gap between arrow-rs and arrow2 is the most important thing.
> > >
> > > Lastly, I do agree with Andrew that it would be good to only maintain
> > > a single arrow crate in crates.io in the long run. As he mentioned,
> > > when the current arrow2 code base becomes stable, we could still
> > > release it under the arrow namespace in crates.io with a major version
> > > bump. The absolute value in the major version doesn't really matter as
> > > long as we stick to the convention that breaking change will result in
> > > a major version bump.
> > >
> > > Thanks,
> > > QP
> > >
> > >
> > >
> > > On Tue, Aug 3, 2021 at 5:31 PM paddy horan <pa...@hotmail.com>
> > wrote:
> > > >
> > > > Hi Jorge,
> > > >
> > > > I see value in consolidating development in a single repo and releasing
> > > > under the existing arrow crate.  Regarding versioning, I think once we
> > > > follow semantic versioning we are fine.  I don't think it's worth migrating
> > > > to a different repo and crate to comply with the de-facto standard you
> > > > mention.
> > > >
> > > > Just one person's opinion though,
> > > > Paddy
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Jorge Cardoso Leitão <jo...@gmail.com>
> > > > Sent: Tuesday, August 3, 2021 5:23 PM
> > > > To: dev@arrow.apache.org
> > > > Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward
> > > >
> > > > Hi Paddy,
> > > >
> > > > > What do you think about moving Arrow2 into the main Arrow repo
> where
> > > > > it
> > > > is only enabled via an "experimental" feature flag?
> > > >
> > > > AFAIK this is already possible:
> > > > * add `arrow2 = { version = "0.2.0", optional = true }` to Cargo.toml
> > > > * add `#[cfg(feature = "arrow2")]\npub mod arrow2;\n` to lib.rs
> > > >
> > > > We do this kind of thing to expose APIs from non-arrow crates such as
> > > parts of the parquet-format-rs crate, and is generally the way to go
> > when a
> > > crate wants to expose a third-party API.
> > > >
> > > > I would not recommend doing this, though: by exposing arrow2 from
> > arrow,
> > > we double the compilation time and binary size of all dependencies that
> > > activate the flag. Furthermore, there are users of arrow2 that do not
> > need
> > > the arrow crate, which this model would not support.
> > > >
> > > > AFAIK where development happens is unrelated to this aspect, Rust
> > > enables this by design.
> > > >
> > > > > but also this would be a clear signal that Arrow2 is <1.0.
> > > > > the experimental flag will be a clear signal to the existing Arrow
> > > > community that Arrow2 is the future but that it is <1.0
> > > >
> > > > arrow2 is already <1.0 <https://crates.io/crates/arrow2>.
> > > > My argument is that the arrow/arrow-flight/parquet are not versioned
> > > > according to the Rust community standards: It is a de facto practice in
> > > > Rust to delay major releases until the API is stable. Tokio's blog post
> > > > about their 1.0 <https://tokio.rs/blog/2020-12-tokio-1-0>
> > > > (i.e. "[...] we commit to holding back on a Tokio 2.0 release for at least
> > > > 3 years."). 10 most downloaded crates:
> > > >
> > > > * https://crates.io/crates/rand (0.8.4)
> > > > * https://crates.io/crates/syn (1.0.74)
> > > > * https://crates.io/crates/libc (0.2.98)
> > > > * https://crates.io/crates/rand_core (0.6.3)
> > > > * quote (1.0.9)
> > > > * unicode-xid (0.2.2)
> > > > * proc-macro2 (1.0.28)
> > > > * cfg-if (1.0.0)
> > > > * https://crates.io/crates/serde (1.0.126)
> > > > * bitflags (1.2.1)
> > > >
> > > > These are small crates with a small scope, but even larger projects
> > > > share the same pattern:
> > > >
> > > > * crossbeam <https://crates.io/crates/crossbeam> (0.8.1)
> > > > * rocket <https://crates.io/crates/rocket> (0.5)
> > > > * polars <https://crates.io/crates/polars> (0.14.8)
> > > > * tower <https://crates.io/crates/tower> (0.4.8)
> > > > * Tokio <https://crates.io/crates/tokio> (1.9.0)
> > > > * hyper <https://crates.io/crates/hyper> (0.14.11)
> > > >
> > > > Crates that arrow depends on
> > > > <https://github.com/apache/arrow-rs/blob/master/arrow/Cargo.toml>,
> > > > that DataFusion depends on
> > > > <https://github.com/apache/arrow-datafusion/blob/master/datafusion/Cargo.toml>,
> > > > all share the same pattern of being either 0.X, 1.X when their API is
> > > > stable, and 2.X when they needed a large change in the API. This contrasts
> > > > with Apache Arrow's releases where we are now at 5.0 (and we have yet to
> > > > arrive at a safe design).
> > > >
> > > > > existing users will be well supported in this transition
> > > >
> > > > How so? imo people either PR to the arrow/arrow2 code base or they
> > won't.
> > > > This is largely independent of where the development of either arrow2
> > or
> > > arrow happens; people google the crate, click on the repository link
> and
> > > file an issue or field a PR.
> > > >
> > > > > In general, I think the longer that development proceeds in
> separate
> > > > repos the harder it will be to eventually merge the two in a way that
> > > supports existing users.
> > > >
> > > > How so? I may be mistaken, but API design is unrelated to on which
> repo
> > > the development happens: it is primarily driven by who is designing it
> > and
> > > from where or who they are inspired by. Both arrow and parquet's crate
> > > design are inspired by the C++ implementation and have gradually been
> > > migrated to "idiomatic" Rust, as "idiomatic" is becoming more well
> > defined
> > > in Rust.
> > > > Arrow2 is inspired by the current crate and the pains of using it in
> > > > DataFusion. Datafuse, a fork of datafusion, recently migrated to arrow2
> > > > <https://github.com/datafuselabs/datafuse/pull/1239>:
> > > > +1,947 −3,484, which shows that the crate is capturing important patterns
> > > > from the arrow crate and exposing ones that are useful / result in less
> > > > code for the same or higher performance.
> > > >
> > > > On the opposite side, merging the development of crates under the same
> > > > repo leads to: more triaging of PRs; more work for releases and
> > > > changelogging; tagging based on crates; multiple READMEs in subpaths of the
> > > > repo, curation of the CI to accommodate this, a workspace with many crates
> > > > each with its own set of dependencies, increasing compilation and
> > > > development; mixed commit logs, difficulties in reverts and cherry-picks;
> > > > more difficult to find stuff in the repo. See e.g. how tokio-rs does it:
> > > > https://github.com/tokio-rs,
> > > > even for small crates like bytes <https://github.com/tokio-rs/bytes>.
> > > >
> > > > Best,
> > > > Jorge
> > > >
> > > > On Tue, Aug 3, 2021 at 3:13 PM paddy horan <pa...@hotmail.com>
> > > wrote:
> > > >
> > > > > Hi Jorge,
> > > > >
> > > > > What do you think about moving Arrow2 into the main Arrow repo
> where
> > > > > it is only enabled via an "experimental" feature flag?  This would
> > > > > allow development of Arrow2 to proceed in the main repo but also
> this
> > > > > would be a clear signal that Arrow2 is <1.0.  When we feel ready
> > (i.e.
> > > > > Arrow2 is 1.0) we can release it in the next main release with
> Arrow2
> > > > > being the default and move the existing implementation behind a
> > > "legacy" feature flag.
> > > > >
> > > > > Here is why I think this might work well:
> > > > >  - People contributing to the Arrow project will naturally
> contribute
> > > > > to Arrow2.  At the moment, some people will still contribute to
> Arrow
> > > > > instead of Arrow2 just by virtue of it being the "official"
> > > implementation.
> > > > > However, if both are in one repo people will want to contribute to
> > the
> > > > > "future", i.e. Arrow2.
> > > > >  - the experimental flag will be a clear signal to the existing
> Arrow
> > > > > community that Arrow2 is the future but that it is <1.0
> > > > >  - existing users will be well supported in this transition
> > > > >  - In general, I think the longer that development proceeds in
> > > > > separate repos the harder it will be to eventually merge the two
> in a
> > > > > way that supports existing users.
> > > > >
> > > > > Do you think would work?
> > > > >
> > > > > Paddy
> > > > >
> > > > > -----Original Message-----
> > > > > From: Jorge Cardoso Leitão <jo...@gmail.com>
> > > > > Sent: Monday, August 2, 2021 1:59 PM
> > > > > To: dev@arrow.apache.org
> > > > > Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward
> > > > >
> > > > > Hi,
> > > > >
> > > > > Sorry for the delay.
> > > > >
> > > > > If there is a path towards an official release under a <1.0.0
> > > > > versioning schema aligned with the rest of the Rust ecosystem and
> in
> > > > > line with the stability of the API, then IMO we should move all
> > > > > development to within Apache experimental asap (I can handle this
> and
> > > > > the likely IP clearance round). If we require a release >=1.X.Y to
> it
> > > > > and/or a schedule, then I prefer to keep expectations aligned and
> > > postpone any movement.
> > > > >
> > > > > Under the move situation, I was thinking in something as follows:
> > > > >
> > > > > * gradually stop maintaining "arrow" in crates, offering a
> > maintenance
> > > > > window over which we release patches (*)
> > > > > * work towards achieving feature parity on arrow2/parquet2 on the
> > > > > experimental repos.
> > > > > * keep releasing arrow2/parquet2 under a 0.X model during the step
> > > > > above
> > > > > (**)
> > > > > * migrate to arrow-rs and archive experimentals (***)
> > > > > * break arrow2 in smaller crates so that we can version the APIs
> at a
> > > > > different cadence
> > > > > * once a crate reaches some stability (this is always opinionated, but
> > > > > it is fine), we bump it to 1.0 and announce a maintenance plan ala
> > > > > tokio <https://tokio.rs/blog/2020-12-tokio-1-0>.
> > > > >
> > > > > (*) e.g. "we will continue to patch the arrow crate up to at least
> 6
> > > > > months starting after the first release of arrow2 that supports
> > > > > a) nested parquet read and write
> > > > > b) union array (including IPC integration tests)
> > > > > c) map array (including IPC integration tests)"
> > > > >
> > > > > (**) officially or un-officially (I would suggest officially so
> that
> > > > > we can acknowledge everyone's work on it, but no strong feelings)
> > > > >
> > > > > (***) something like:
> > > > > 1. place arrow2 on top of a clear arrow repo so that the full
> > > > > contribution history up to that point preserved
> > > > > 2. make arrow-rs the home of arrow2 (i.e. we start releasing arrow2 from
> > > > > arrow-rs) and archive the experimental repos; create arrow-rs-parquet
> > > > > or something for parquet2.
> > > > >
> > > > > In summary, the core pain point for me is the current versioning of
> > > > > arrow, which I feel is incompatible with my goals for arrow2 and
> the
> > > > > ecosystem I envision it supporting :)
> > > > >
> > > > > Best,
> > > > > Jorge
> > > > >
> > > > > On Fri, Jul 30, 2021 at 8:44 PM Wes McKinney <we...@gmail.com>
> > > wrote:
> > > > >
> > > > > > I think it would also be fine to push "beta" arrow2 crates out
> of a
> > > > > > repo under apache/ so long as they are not marked on crates.io
> as
> > > > > > being Apache-official releases. There's a possible slippery slope
> > > > > > there, but as long as we are on a path to formalizing the
> releases
> > I
> > > > > think it is okay.
> > > > > >
> > > > > > On Fri, Jul 30, 2021 at 1:07 PM Andrew Lamb <
> alamb@influxdata.com>
> > > > > wrote:
> > > > > >
> > > > > > > Jorge -- do you feel like we have a resolution on what to do
> with
> > > > > > > arrow2
> > > > > > in
> > > > > > > the near term?
> > > > > > >
> > > > > > > The current state of affairs seems to me that arrow2 is released from
> > > > > > > https://github.com/jorgecarleitao/arrow2 to crates.io (which is fine).
> > > > > > > Are you happy with keeping development in the jorgecarleitao repo
> > > > > > > where you will retain maximal control and flexibility until it is
> > > > > > > ready to start integrating?
> > > > > > >
> > > > > > > Or would you prefer to put it into one of the apache repos and
> > > > > > > subject
> > > > > > its
> > > > > > > development and release to the normal Arrow governance model
> > > > > > > (tarball, vote, etc)?
> > > > > > >
> > > > > > > Since you are the primary author/architect I think you should
> > have
> > > > > > > a substantial say at this stage.
> > > > > > >
> > > > > > > Andrew
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jul 27, 2021 at 7:16 PM Andrew Lamb <
> > alamb@influxdata.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > I would be happy with this approach. Thank you for the
> > > > > > > > suggestion
> > > > > > > >
> > > > > > > > This hybrid approach of both arrow and arrow2 in the same
> repo
> > > > > > > > seems better to me than separate repos.
> > > > > > > >
> > > > > > > > What I really care about is ensuring we don't have two
> > > > > > > > crates/APIs indefinitely -- as long as we are continually
> > making
> > > > > > > > progress towards unification that is what is important to me.
> > > > > > > >
> > > > > > > > Andrew
> > > > > > > >
> > > > > > > > On Tue, Jul 27, 2021 at 1:40 PM Andy Grove
> > > > > > > > <an...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Apologies for being late to this discussion.
> > > > > > > >>
> > > > > > > >> There is a hybrid option to consider here where we add the
> > > > > > > >> arrow2 code into the arrow crate as a separate module, so we
> > > > > > > >> release one crate
> > > > > > containing
> > > > > > > >> the "old" API (which we can mark as deprecated) as well as
> the
> > > > > > > >> new
> > > > > > API.
> > > > > > > >> Java did a similar thing a long time ago with "java.io"
> > versus
> > > > > > > "java.nio"
> > > > > > > >> (new IO).
> > > > > > > >>
> > > > > > > >> I agree that the versioning wouldn't be ideal, but this
> seems
> > > > > > > >> like it might be a pragmatic compromise?
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >>
> > > > > > > >> Andy.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Tue, Jul 20, 2021 at 5:41 AM Andrew Lamb
> > > > > > > >> <al...@influxdata.com>
> > > > > > > wrote:
> > > > > > > >>
> > > > > > > >> > What I meant is that when you decide arrow2 is suitable
> for
> > > > > > > >> > release
> > > > > > to
> > > > > > > >> > existing arrow users, I stand ready to help you
> incorporate
> > > > > > > >> > it into
> > > > > > > >> arrow.
> > > > > > > >> >
> > > > > > > >> > All the feedback I have heard so far from the rest of the
> > > > > > > >> > community
> > > > > > is
> > > > > > > >> that
> > > > > > > >> > we are ready. One might even say we are anxious to do so
> :)
> > > > > > > >> >
> > > > > > > >> > Andrew
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
>

Re: [Discuss] [Rust] Arrow2/parquet2 going foward

Posted by Jorge Cardoso Leitão <jo...@gmail.com>.
Hi,

Thanks for your input.

Every time there is a new major release, all new development shifts towards
that new API and users of previous APIs are left behind. It is not just a
matter of SemVer and the size of version numbers: the whole development
effort shifts to build on top of the new API.

I disagree that software with a major release every 3 months and no
maintenance window for previous versions is stable. I alluded to the Tokio
example because Tokio 1.0 recently became the runtime of Rust-based AWS
Lambda functions [1]; this commitment is only possible by enforcing API
stability and maintenance beyond a 3 month period (at least 3 years in
their case).

Also, imo the current major version number is not meaningless: divided by
the software age, it constitutes the historical release pattern and is
usually a good predictor of the pattern used in future releases.

The evidence is that we haven't been able to support any version for any
period of time; recently, Andrew has been doing amazing work at supporting
the latest version for a period of 3 months. I.e. an application that
depends on `arrow = ^5.0` has a support window of 3 months. Given that we
have not backported any security fixes to previous versions, it is
reasonable to assume that security patches are also applied within a 3
month period only.
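
To make the `arrow = ^5.0` example concrete, here is a minimal sketch of how a
caret requirement selects versions, following the rules documented for Cargo.
`Version` and `caret_matches` are hypothetical names invented for this
illustration (they are not part of Cargo or of any arrow crate), and
pre-release tags are ignored.

```rust
/// A (major, minor, patch) triple; pre-release and build tags are ignored.
pub type Version = (u64, u64, u64);

/// Returns true if `v` satisfies the caret requirement `^req`: `v` must be
/// at least `req` and must agree with it on the left-most non-zero component.
pub fn caret_matches(req: Version, v: Version) -> bool {
    if v < req {
        return false; // tuples compare lexicographically: major, minor, patch
    }
    match req {
        (0, 0, _) => v == req,                     // ^0.0.z: only 0.0.z itself
        (0, minor, _) => v.0 == 0 && v.1 == minor, // ^0.y.z: stay within 0.y
        (major, _, _) => v.0 == major,             // ^x.y.z: stay within x
    }
}
```

Under these rules `caret_matches((5, 0, 0), (5, 1, 0))` holds while
`(6, 0, 0)` is rejected, so a dependent pinned to `^5.0` never receives the
next major release. Likewise `^0.2.0` accepts 0.2.5 but not 0.3.0, which is
how a 0.X series can still signal breaking changes.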

As a contributor to arrow2, I would rather not have arrow2 under Apache Arrow
than have to release it under its current versioning and scheduling (this
is similar to some of Julia's concerns). As a contributor to Apache
Arrow, I currently cannot guarantee a maintenance window over arrow-rs for
any period of time because it is unsafe by design and I do not have the
motivation to fix it. As both, I am confident that the core of arrow2 will
soon reach a point where we can live with and develop on top of it for at
least a year. This is not true of the whole API surface, though: there are
APIs that we will need to change more often until stability can be promised.

So, I am requesting that we tie the discussion of arrow2 to how it will be
released.

Could a middle ground be somewhere along the lines of splitting the crate
into smaller crates that are versioned independently? I.e. continue to
release `arrow` under the same versioning and cadence, and create 3 new
crates, arrow-core, arrow-compute, and arrow-io (see also [2]), based on
arrow2's code base, that would have their own versioning at 0.X until
stability is achieved. The migration of the `arrow` crate to arrow2's API
would then consist of re-exporting from the smaller crates (e.g. `pub use
arrow_core::array`).
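
As a rough sketch of the re-export idea, the snippet below uses plain modules
as stand-ins for the separate crates so that it compiles on its own. The
module layout mirrors the crate names proposed above, but `Int32Array`,
`null_count` and `sum` are hypothetical placeholders for illustration, not
the arrow2 API.

```rust
// Stand-ins for the proposed, independently versioned crates. In a real
// workspace each would be its own crate with its own Cargo.toml and version.
pub mod arrow_core {
    pub mod array {
        /// Hypothetical minimal array type, for illustration only.
        #[derive(Debug, PartialEq)]
        pub struct Int32Array(pub Vec<Option<i32>>);

        impl Int32Array {
            /// Number of null (`None`) slots.
            pub fn null_count(&self) -> usize {
                self.0.iter().filter(|v| v.is_none()).count()
            }
        }
    }
}

pub mod arrow_compute {
    use super::arrow_core::array::Int32Array;

    /// Hypothetical kernel: sums the non-null values.
    pub fn sum(array: &Int32Array) -> i64 {
        array.0.iter().flatten().map(|v| i64::from(*v)).sum()
    }
}

// Stand-in for the existing `arrow` crate, reduced to a facade that only
// re-exports what the smaller crates define.
pub mod arrow {
    pub use super::arrow_compute as compute;
    pub use super::arrow_core::array;
}
```

A user of the facade would keep writing `arrow::array::Int32Array` and
`arrow::compute::sum(&a)`, while the types themselves live in, and are
versioned with, the smaller crates.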

[1] https://crates.io/crates/lambda_runtime/0.3.1/dependencies
[2] https://github.com/jorgecarleitao/arrow2/issues/257

Best,
Jorge


On Thu, Aug 5, 2021 at 11:53 PM Adam Lippai <ad...@rigo.sk> wrote:

> Not taking sides, just two technical notes below.
>
> Semver.org (https://semver.org/#how-do-i-know-when-to-release-100)
> clearly defines the versions >1.0.0.
> * If it's used in production, it's 1.0.0.
> * If it provides an API others depend on then it's 1.0.0.
> * If you intend to keep backward compatibility, it's 1.0.0.
> TL;DR: 1.0.0 represents the version from which point on we guarantee that
> non-production releases are marked (alpha, beta, rc) and that breaking (API)
> or backwards-incompatible changes result in a major version bump. This
> we already do, 4x per year.
>
> The second fact is that arrow2 uses the arrow name, but it doesn't have
> apache governance. It's not released from GitHub.com/apache, there are no
> formal releases, there are no votes. This is not correct or fair usage of
> the brand (on the same level as DataFuse, or db-benchmark calling a custom
> R implementation arrow) even if it's "unofficial". My understanding is that
> arrow2 can be an unofficial implementation with a different name or an
> arrow-rs experiment with the intention to merge the code, but not both.
>
> I think both issues could be solved and I really value and like the arrow2
> work so far. That's the right way. I hope we'll see it in prod either way
> as soon as it's ready.
>
> Best regards,
> Adam Lippai
>
> On Wed, Aug 4, 2021, 08:25 QP Hou <ho...@gmail.com> wrote:
>
> > Just my two cents.
> >
> > I think we all have the same goal here, which is to accelerate the
> > transitioning of arrow to arrow2 as the official arrow rust
> > implementation.
> >
> > In my opinion, the biggest gain we can get from merging two projects
> > into one repo is to have some kind of a policy to enforce that every
> > new feature/test added to the current arrow implementation also  needs
> > to be added to the arrow2 implementation. This way, we can make sure
> > the gap between arrow and arrow2 is closing on every iteration.
> > Without this, I tend to agree with Jorge that merging two repos would
> > add more overhead to his work and slow him down.
> >
> > For those who want to contribute to arrow2 to accelerate the
> > transition, I don't think they would have a problem sending PRs to the
> > arrow2 repo. For those who are not interested in contributing to
> > arrow2, merging the arrow2 code base into the current arrow-rs repo
> > won't incentivize them to contribute. Merging arrow2 into current
> > arrow-rs repo could help with discovery. But I think this can be
> > achieved by adding a big note in the current arrow-rs README to
> > encourage contributions to the arrow2 repo as well.
> >
> > At the end of the day, Jorge is currently the sole active contributor
> > to the arrow2 implementation, so I think he would have the most say on
> > what's the most productive way to push arrow2 forward. The only
> > concern I have with regards to merging arrow2 into arrow-rs right now
> > is that Jorge spends all the effort to do the merge and it then turns out
> > that he is still the only active contributor to arrow2 within
> > arrow-rs, but with more overhead to deal with.
> >
> > As for maintaining semantic versioning for arrow2, Andy had a good
> > point that we could still release arrow2 with its own versioning even
> > if we merge it into the arrow-rs repo. So I don't think we should
> > worry/focus too much about versioning in our discussion. Velocity to
> > close the gap between arrow-rs and arrow2 is the most important thing.
> >
> > Lastly, I do agree with Andrew that it would be good to only maintain
> > a single arrow crate in crates.io in the long run. As he mentioned,
> > when the current arrow2 code base becomes stable, we could still
> > release it under the arrow namespace in crates.io with a major version
> > bump. The absolute value in the major version doesn't really matter as
> > long as we stick to the convention that breaking change will result in
> > a major version bump.
> >
> > Thanks,
> > QP
> >
> >
> >
> > On Tue, Aug 3, 2021 at 5:31 PM paddy horan <pa...@hotmail.com>
> wrote:
> > >
> > > Hi Jorge,
> > >
> > > I see value in consolidating development in a single repo and releasing
> > > under the existing arrow crate.  Regarding versioning, I think once we
> > > follow semantic versioning we are fine.  I don't think it's worth migrating
> > > to a different repo and crate to comply with the de-facto standard you
> > > mention.
> > >
> > > Just one person's opinion though,
> > > Paddy
> > >
> > >
> > > -----Original Message-----
> > > From: Jorge Cardoso Leitão <jo...@gmail.com>
> > > Sent: Tuesday, August 3, 2021 5:23 PM
> > > To: dev@arrow.apache.org
> > > Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward
> > >
> > > Hi Paddy,
> > >
> > > > What do you think about moving Arrow2 into the main Arrow repo where
> > > > it
> > > is only enabled via an "experimental" feature flag?
> > >
> > > AFAIK this is already possible:
> > > * add `arrow2 = { version = "0.2.0", optional = true }` to Cargo.toml
> > > * add `#[cfg(feature = "arrow2")]\npub mod arrow2;\n` to lib.rs
> > >
> > > We do this kind of thing to expose APIs from non-arrow crates such as
> > > parts of the parquet-format-rs crate, and it is generally the way to go when a
> > > crate wants to expose a third-party API.
> > >
> > > I would not recommend doing this, though: by exposing arrow2 from arrow,
> > > we double the compilation time and binary size of all dependencies that
> > > activate the flag. Furthermore, there are users of arrow2 that do not need
> > > the arrow crate, which this model would not support.
> > >
> > > AFAIK where development happens is unrelated to this aspect; Rust
> > > enables this by design.
> > >
> > > > but also this would be a clear signal that Arrow2 is <1.0.
> > > > the experimental flag will be a clear signal to the existing Arrow
> > > community that Arrow2 is the future but that it is <1.0
> > >
> > > arrow2 is already <1.0 <https://crates.io/crates/arrow2>.
> > > My argument is that the arrow/arrow-flight/parquet are not versioned
> > > according to the Rust community standards: It is a de facto practice in
> > > Rust to delay major releases until the API is stable. Tokio's blog post
> > > about their 1.0 <https://tokio.rs/blog/2020-12-tokio-1-0>
> > > (i.e. "[...] we commit to holding back on a Tokio 2.0 release for at least
> > > 3 years."). 10 most downloaded crates:
> > >
> > > * https://crates.io/crates/rand (0.8.4)
> > > * https://crates.io/crates/syn (1.0.74)
> > > * https://crates.io/crates/libc (0.2.98)
> > > * https://crates.io/crates/rand_core (0.6.3)
> > > * quote (1.0.9)
> > > * unicode-xid (0.2.2)
> > > * proc-macro2 (1.0.28)
> > > * cfg-if (1.0.0)
> > > * https://crates.io/crates/serde (1.0.126)
> > > * bitflags (1.2.1)
> > >
> > > These are small crates with a small scope, but even larger projects
> > share the same pattern:
> > >
> > > * crossbeam <https://crates.io/crates/crossbeam> (0.8.1)
> > > * rocket <https://crates.io/crates/rocket> (0.5)
> > > * polars <https://crates.io/crates/polars> (0.14.8)
> > > * tower <https://crates.io/crates/tower> (0.4.8)
> > > * Tokio <https://crates.io/crates/tokio> (1.9.0)
> > > * hyper <https://crates.io/crates/hyper> (0.14.11)
> > >
> > > Crates that arrow depends on
> > > <https://github.com/apache/arrow-rs/blob/master/arrow/Cargo.toml>,
> > > and that DataFusion depends on
> > > <https://github.com/apache/arrow-datafusion/blob/master/datafusion/Cargo.toml>,
> > > all share the same pattern of being either 0.X, 1.X when their API is
> > > stable, and 2.X when they needed a large change in the API. This
> > > contrasts with Apache Arrow's releases where we are now at 5.0 (and we
> > > have yet to arrive at a safe design).
> > >
> > > > existing users will be well supported in this transition
> > >
> > > How so? imo people either PR to the arrow/arrow2 code base or they
> > > won't. This is largely independent of where the development of either
> > > arrow2 or arrow happens; people google the crate, click on the
> > > repository link and file an issue or a PR.
> > >
> > > > In general, I think the longer that development proceeds in separate
> > > repos the harder it will be to eventually merge the two in a way that
> > supports existing users.
> > >
> > > How so? I may be mistaken, but API design is unrelated to on which repo
> > the development happens: it is primarily driven by who is designing it
> and
> > from where or who they are inspired by. Both arrow and parquet's crate
> > design are inspired by the C++ implementation and have gradually been
> > migrated to "idiomatic" Rust, as "idiomatic" is becoming more well
> defined
> > in Rust.
> > > Arrow2 is inspired by the current crate and the pains of using it in
> > > DataFusion. Datafuse, a fork of datafusion, recently migrated to arrow2
> > > <https://github.com/datafuselabs/datafuse/pull/1239>:
> > > +1,947 −3,484, which shows that the crate is capturing important
> > > patterns from the arrow crate and exposing ones that are useful / result
> > > in less code for the same or higher performance.
> > >
> > > On the opposite side, merging the development of crates under the same
> > > repo leads to: more triaging of PRs; more work for releases and
> > > changelogging; tagging based on crates; multiple READMEs in subpaths of
> > > the repo; curation of the CI to accommodate this; a workspace with many
> > > crates, each with its own set of dependencies, increasing compilation
> > > and development time; mixed commit logs; difficulties in reverts and
> > > cherry-picks; and more difficulty finding things in the repo. See e.g.
> > > how tokio-rs does it: https://github.com/tokio-rs, even for small
> > > crates like bytes <https://github.com/tokio-rs/bytes>.
> > >
> > > Best,
> > > Jorge
> > >
> > > On Tue, Aug 3, 2021 at 3:13 PM paddy horan <pa...@hotmail.com>
> > wrote:
> > >
> > > > Hi Jorge,
> > > >
> > > > What do you think about moving Arrow2 into the main Arrow repo where
> > > > it is only enabled via an "experimental" feature flag?  This would
> > > > allow development of Arrow2 to proceed in the main repo but also this
> > > > would be a clear signal that Arrow2 is <1.0.  When we feel ready
> (i.e.
> > > > Arrow2 is 1.0) we can release it in the next main release with Arrow2
> > > > being the default and move the existing implementation behind a
> > "legacy" feature flag.
> > > >
> > > > Here is why I think this might work well:
> > > >  - People contributing to the Arrow project will naturally contribute
> > > > to Arrow2.  At the moment, some people will still contribute to Arrow
> > > > instead of Arrow2 just by virtue of it being the "official"
> > implementation.
> > > > However, if both are in one repo people will want to contribute to
> the
> > > > "future", i.e. Arrow2.
> > > >  - the experimental flag will be a clear signal to the existing Arrow
> > > > community that Arrow2 is the future but that it is <1.0
> > > >  - existing users will be well supported in this transition
> > > >  - In general, I think the longer that development proceeds in
> > > > separate repos the harder it will be to eventually merge the two in a
> > > > way that supports existing users.
> > > >
> > > > Do you think would work?
> > > >
> > > > Paddy
> > > >
> > > > -----Original Message-----
> > > > From: Jorge Cardoso Leitão <jo...@gmail.com>
> > > > Sent: Monday, August 2, 2021 1:59 PM
> > > > To: dev@arrow.apache.org
> > > > Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward
> > > >
> > > > Hi,
> > > >
> > > > Sorry for the delay.
> > > >
> > > > If there is a path towards an official release under a <1.0.0
> > > > versioning schema aligned with the rest of the Rust ecosystem and in
> > > > line with the stability of the API, then IMO we should move all
> > > > development to within Apache experimental asap (I can handle this and
> > > > the likely IP clearance round). If we require a release >=1.X.Y to it
> > > > and/or a schedule, then I prefer to keep expectations aligned and
> > postpone any movement.
> > > >
> > > > Under the move situation, I was thinking in something as follows:
> > > >
> > > > * gradually stop maintaining "arrow" in crates, offering a
> maintenance
> > > > window over which we release patches (*)
> > > > * work towards achieving feature parity on arrow2/parquet2 on the
> > > > experimental repos.
> > > > * keep releasing arrow2/parquet2 under a 0.X model during the step
> > > > above
> > > > (**)
> > > > * migrate to arrow-rs and archive experimentals (***)
> > > > * break arrow2 in smaller crates so that we can version the APIs at a
> > > > different cadence
> > > > * once a crate reaches some stability (this is always opinionated,
> > > > but it is fine), we bump it to 1.0 and announce a maintenance plan ala
> > > > tokio <https://tokio.rs/blog/2020-12-tokio-1-0>.
> > > >
> > > > (*) e.g. "we will continue to patch the arrow crate up to at least 6
> > > > months starting after the first release of arrow2 that supports
> > > > a) nested parquet read and write
> > > > b) union array (including IPC integration tests)
> > > > c) map array (including IPC integration tests)"
> > > >
> > > > (**) officially or un-officially (I would suggest officially so that
> > > > we can acknowledge everyone's work on it, but no strong feelings)
> > > >
> > > > (***) something like:
> > > > 1. place arrow2 on top of a clear arrow repo so that the full
> > > > contribution history up to that point is preserved
> > > > 2. make arrow-rs the home of arrow2 (i.e. we start releasing arrow2
> > > > from arrow-rs) and archive the experimental repos; create
> > > > arrow-rs-parquet or something for parquet2.
> > > >
> > > > In summary, the core pain point for me is the current versioning of
> > > > arrow, which I feel is incompatible with my goals for arrow2 and the
> > > > ecosystem I envision it supporting :)
> > > >
> > > > Best,
> > > > Jorge
> > > >
> > > > On Fri, Jul 30, 2021 at 8:44 PM Wes McKinney <we...@gmail.com>
> > wrote:
> > > >
> > > > > I think it would also be fine to push "beta" arrow2 crates out of a
> > > > > repo under apache/ so long as they are not marked on crates.io as
> > > > > being Apache-official releases. There's a possible slippery slope
> > > > > there, but as long as we are on a path to formalizing the releases
> I
> > > > think it is okay.
> > > > >
> > > > > On Fri, Jul 30, 2021 at 1:07 PM Andrew Lamb <al...@influxdata.com>
> > > > wrote:
> > > > >
> > > > > > Jorge -- do you feel like we have a resolution on what to do with
> > > > > > arrow2
> > > > > in
> > > > > > the near term?
> > > > > >
> > > > > > > The current state of affairs seems to me that arrow2 is released
> > > > > > > from https://github.com/jorgecarleitao/arrow2 to crates.io (which
> > > > > > > is fine). Are you happy with keeping development in the
> > > > > > > jorgecarleitao repo where you will retain maximal control and
> > > > > > > flexibility until it is ready to start integrating?
> > > > > >
> > > > > > Or would you prefer to put it into one of the apache repos and
> > > > > > subject
> > > > > its
> > > > > > development and release to the normal Arrow governance model
> > > > > > (tarball, vote, etc)?
> > > > > >
> > > > > > Since you are the primary author/architect I think you should
> have
> > > > > > a substantial say at this stage.
> > > > > >
> > > > > > Andrew
> > > > > >
> > > > > >
> > > > > > On Tue, Jul 27, 2021 at 7:16 PM Andrew Lamb <
> alamb@influxdata.com>
> > > > > wrote:
> > > > > >
> > > > > > > I would be happy with this approach. Thank you for the
> > > > > > > suggestion
> > > > > > >
> > > > > > > This hybrid approach of both arrow and arrow2 in the same repo
> > > > > > > seems better to me than separate repos.
> > > > > > >
> > > > > > > What I really care about is ensuring we don't have two
> > > > > > > crates/APIs indefinitely -- as long as we are continually
> making
> > > > > > > progress towards unification that is what is important to me.
> > > > > > >
> > > > > > > Andrew
> > > > > > >
> > > > > > > On Tue, Jul 27, 2021 at 1:40 PM Andy Grove
> > > > > > > <an...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > >> Apologies for being late to this discussion.
> > > > > > >>
> > > > > > >> There is a hybrid option to consider here where we add the
> > > > > > >> arrow2 code into the arrow crate as a separate module, so we
> > > > > > >> release one crate
> > > > > containing
> > > > > > >> the "old" API (which we can mark as deprecated) as well as the
> > > > > > >> new
> > > > > API.
> > > > > > >> Java did a similar thing a long time ago with "java.io"
> versus
> > > > > > "java.nio"
> > > > > > >> (new IO).
> > > > > > >>
> > > > > > >> I agree that the versioning wouldn't be ideal, but this seems
> > > > > > >> like it might be a pragmatic compromise?
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >>
> > > > > > >> Andy.
> > > > > > >>
> > > > > > >>
> > > > > > >> On Tue, Jul 20, 2021 at 5:41 AM Andrew Lamb
> > > > > > >> <al...@influxdata.com>
> > > > > > wrote:
> > > > > > >>
> > > > > > >> > What I meant is that when you decide arrow2 is suitable for
> > > > > > >> > release
> > > > > to
> > > > > > >> > existing arrow users, I stand ready to help you incorporate
> > > > > > >> > it into
> > > > > > >> arrow.
> > > > > > >> >
> > > > > > >> > All the feedback I have heard so far from the rest of the
> > > > > > >> > community
> > > > > is
> > > > > > >> that
> > > > > > >> > we are ready. One might even say we are anxious to do so :)
> > > > > > >> >
> > > > > > >> > Andrew
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>

Re: [Discuss] [Rust] Arrow2/parquet2 going foward

Posted by Adam Lippai <ad...@rigo.sk>.
Not taking sides, just two technical notes below.

Semver.org clearly defines (
https://semver.org/#how-do-i-know-when-to-release-100) when a project should
be at version 1.0.0 or later:
* If it's used in production, it's 1.0.0.
* If it provides an API others depend on then it's 1.0.0.
* If you intend to keep backward compatibility, it's 1.0.0.
TL;DR: 1.0.0 marks the point from which we guarantee that non-production
releases are marked (alpha, beta, rc) and that breaking (API) changes, i.e.
backwards-incompatible changes, result in a major version bump. This we
already do, 4x per year.
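
As a rough illustration of the version-bump rule discussed here, the sketch
below checks whether moving between two versions may carry breaking changes.
It is not taken from semver.org or from Cargo's source; the names `Version`,
`parse` and `may_break` are hypothetical. It assumes Cargo's convention for
pre-1.0 crates, where the leftmost non-zero component is the one that signals
a breaking change, and it ignores pre-release tags such as alpha/beta/rc.

```rust
/// A plain `MAJOR.MINOR.PATCH` version (hypothetical helper type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Parses exactly three dot-separated integers; anything else yields `None`.
/// Pre-release suffixes (e.g. `-beta`) are not supported in this sketch.
fn parse(s: &str) -> Option<Version> {
    let mut parts = s.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(Version { major, minor, patch })
}

/// Returns true when moving between `from` and `to` may include breaking changes.
fn may_break(from: Version, to: Version) -> bool {
    if from.major != to.major {
        // A different major version always allows breaking changes.
        return true;
    }
    if from.major == 0 {
        // Assumed Cargo convention for 0.x crates: the leftmost non-zero
        // component plays the role of the major version.
        if from.minor != to.minor {
            return true;
        }
        if from.minor == 0 {
            return from.patch != to.patch;
        }
    }
    false
}
```

Under these assumptions, arrow going from 4.0.0 to 5.0.0 and arrow2 going
from 0.2.0 to 0.3.0 are both flagged as potentially breaking, while
1.0.0 to 1.9.0 is not.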

The second fact is that arrow2 uses the arrow name, but it doesn't have
Apache governance. It's not released from github.com/apache, there are no
formal releases, there are no votes. This is not correct or fair usage of
the brand (on the same level as DataFuse, or db-benchmark calling a custom
R implementation arrow) even if it's "unofficial". My understanding is that
arrow2 can be an unofficial implementation with a different name or an
arrow-rs experiment with the intention to merge the code, but not both.

I think both issues could be solved and I really value and like the arrow2
work so far. That's the right way. I hope we'll see it in prod either way
as soon as it's ready.

Best regards,
Adam Lippai

On Wed, Aug 4, 2021, 08:25 QP Hou <ho...@gmail.com> wrote:

> Just my two cents.
>
> I think we all have the same goal here, which is to accelerate the
> transitioning of arrow to arrow2 as the official arrow rust
> implementation.
>
> In my opinion, the biggest gain we can get from merging two projects
> into one repo is to have some kind of a policy to enforce that every
> new feature/test added to the current arrow implementation also needs
> to be added to the arrow2 implementation. This way, we can make sure
> the gap between arrow and arrow2 is closing on every iteration.
> Without this, I tend to agree with Jorge that merging two repos would
> add more overhead to his work and slow him down.
>
> For those who want to contribute to arrow2 to accelerate the
> transition, I don't think they would have a problem sending PRs to the
> arrow2 repo. For those who are not interested in contributing to
> arrow2, merging the arrow2 code base into the current arrow-rs repo
> won't incentivize them to contribute. Merging arrow2 into current
> arrow-rs repo could help with discovery. But I think this can be
> achieved by adding a big note in the current arrow-rs README to
> encourage contributions to the arrow2 repo as well.
>
> At the end of the day, Jorge is currently the sole active contributor
> to the arrow2 implementation, so I think he should have the most say on
> what's the most productive way to push arrow2 forward. The only
> concern I have with regards to merging arrow2 into arrow-rs right now
> is that Jorge spends all the effort to do the merge, and then it turns
> out that he is still the only active contributor to arrow2 within
> arrow-rs, but with more overhead that he has to deal with.
>
> As for maintaining semantic versioning for arrow2, Andy had a good
> point that we could still release arrow2 with its own versioning even
> if we merge it into the arrow-rs repo. So I don't think we should
> worry/focus too much about versioning in our discussion. Velocity to
> close the gap between arrow-rs and arrow2 is the most important thing.
>
> Lastly, I do agree with Andrew that it would be good to only maintain
> a single arrow crate in crates.io in the long run. As he mentioned,
> when the current arrow2 code base becomes stable, we could still
> release it under the arrow namespace in crates.io with a major version
> bump. The absolute value of the major version doesn't really matter as
> long as we stick to the convention that breaking changes will result in
> a major version bump.
>
> Thanks,
> QP
>
>
>
> On Tue, Aug 3, 2021 at 5:31 PM paddy horan <pa...@hotmail.com> wrote:
> >
> > Hi Jorge,
> >
> > I see value in consolidating development in a single repo and releasing
> under the existing arrow crate.  Regarding versioning, I think once we
> follow semantic versioning we are fine.  I don't think it's worth migrating
> to a different repo and crate to comply with the de-facto standard you
> mention.
> >
> > Just one person's opinion though,
> > Paddy
> >
> >
> > -----Original Message-----
> > From: Jorge Cardoso Leitão <jo...@gmail.com>
> > Sent: Tuesday, August 3, 2021 5:23 PM
> > To: dev@arrow.apache.org
> > Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward
> >
> > Hi Paddy,
> >
> > > What do you think about moving Arrow2 into the main Arrow repo where
> > > it
> > is only enabled via an "experimental" feature flag?
> >
> > AFAIK this is already possible:
> > * add `arrow2 = { version = "0.2.0", optional = true }` to Cargo.toml
> > * add `#[cfg(feature = "arrow2")]\npub mod arrow2;\n` to lib.rs
> >
> > We do this kind of thing to expose APIs from non-arrow crates such as
> parts of the parquet-format-rs crate, and is generally the way to go when a
> crate wants to expose a third-party API.
> >
> > I would not recommend doing this, though: by exposing arrow2 from arrow,
> we double the compilation time and binary size of all dependencies that
> activate the flag. Furthermore, there are users of arrow2 that do not need
> the arrow crate, which this model would not support.
> >
> > AFAIK where development happens is unrelated to this aspect, Rust
> enables this by design.
> >
> > > but also this would be a clear signal that Arrow2 is <1.0.
> > > the experimental flag will be a clear signal to the existing Arrow
> > community that Arrow2 is the future but that it is <1.0
> >
> > arrow2 is already <1.0 <https://crates.io/crates/arrow2>.
> > My argument is that the arrow/arrow-flight/parquet are not versioned
> > according to the Rust community standards: It is a de facto practice in
> > Rust to delay major releases until the API is stable. See Tokio's blog
> > post about their 1.0 <https://tokio.rs/blog/2020-12-tokio-1-0>
> > (i.e. "[...] we commit to holding back on a Tokio 2.0 release for at
> > least 3 years."). 10 most downloaded crates:
> >
> > * https://crates.io/crates/rand (0.8.4)
> > * https://crates.io/crates/syn (1.0.74)
> > * https://crates.io/crates/libc (0.2.98)
> > * https://crates.io/crates/rand_core (0.6.3)
> > * quote (1.0.9)
> > * unicode-xid (0.2.2)
> > * proc-macro2 (1.0.28)
> > * cfg-if (1.0.0)
> > * https://crates.io/crates/serde (1.0.126)
> > * bitflags (1.2.1)
> >
> > These are small crates with a small scope, but even larger projects
> share the same pattern:
> >
> > * crossbeam <https://crates.io/crates/crossbeam> (0.8.1)
> > * rocket <https://crates.io/crates/rocket> (0.5)
> > * polars <https://crates.io/crates/polars> (0.14.8)
> > * tower <https://crates.io/crates/tower> (0.4.8)
> > * Tokio <https://crates.io/crates/tokio> (1.9.0)
> > * hyper <https://crates.io/crates/hyper> (0.14.11)
> >
> > Crates that arrow depends on
> > <https://github.com/apache/arrow-rs/blob/master/arrow/Cargo.toml>,
> > and that DataFusion depends on
> > <https://github.com/apache/arrow-datafusion/blob/master/datafusion/Cargo.toml>,
> > all share the same pattern of being either 0.X, 1.X when their API is
> > stable, and 2.X when they needed a large change in the API. This contrasts
> > with Apache Arrow's releases where we are now at 5.0 (and we have yet to
> > arrive at a safe design).
> >
> > > existing users will be well supported in this transition
> >
> > How so? imo people either PR to the arrow/arrow2 code base or they won't.
> > This is largely independent of where the development of either arrow2 or
> > arrow happens; people google the crate, click on the repository link and
> > file an issue or a PR.
> >
> > > In general, I think the longer that development proceeds in separate
> > repos the harder it will be to eventually merge the two in a way that
> supports existing users.
> >
> > How so? I may be mistaken, but API design is unrelated to on which repo
> the development happens: it is primarily driven by who is designing it and
> from where or who they are inspired by. Both arrow and parquet's crate
> design are inspired by the C++ implementation and have gradually been
> migrated to "idiomatic" Rust, as "idiomatic" is becoming more well defined
> in Rust.
> > Arrow2 is inspired by the current crate and the pains of using it in
> DataFusion. Datafuse, a fork of datafusion, recently migrated to arrow2
> > <
> https://github.com/datafuselabs/datafuse/pull/1239>:
> +1,947 −3,484, which shows that the crate is capturing important patterns
> from the arrow crate and exposing ones that are useful / result in less
> code for the same or higher performance.
> >
> > On the opposite side, merging the development of crates under the same
> repo leads to: more triagging of PRs; more work for releases and
> changelogging; tagging based on crates; multiple READMEs in subpaths of the
> repo, curation of the CI to accommodate this, a workspace with many crates
> each with its own set of dependencies, increasing compilation and
> development; mixed commit logs, difficulties in reverts and cherry-picks;
> more difficult to find stuff in the repo. See e.g. how tokio-rs does it:
> >
> https://github.com/tokio-rs,
> even for small crates like bytes <
> https://github.com/tokio-rs/bytes
> >.
> >
> > Best,
> > Jorge
> >
> > On Tue, Aug 3, 2021 at 3:13 PM paddy horan <pa...@hotmail.com>
> wrote:
> >
> > > Hi Jorge,
> > >
> > > What do you think about moving Arrow2 into the main Arrow repo where
> > > it is only enabled via an "experimental" feature flag?  This would
> > > allow development of Arrow2 to proceed in the main repo but also this
> > > would be a clear signal that Arrow2 is <1.0.  When we feel ready (i.e.
> > > Arrow2 is 1.0) we can release it in the next main release with Arrow2
> > > being the default and move the existing implementation behind a
> "legacy" feature flag.
> > >
> > > Here is why I think this might work well:
> > >  - People contributing to the Arrow project will naturally contribute
> > > to Arrow2.  At the moment, some people will still contribute to Arrow
> > > instead of Arrow2 just by virtue of it being the "official"
> implementation.
> > > However, if both are in one repo people will want to contribute to the
> > > "future", i.e. Arrow2.
> > >  - the experimental flag will be a clear signal to the existing Arrow
> > > community that Arrow2 is the future but that it is <1.0
> > >  - existing users will be well supported in this transition
> > >  - In general, I think the longer that development proceeds in
> > > separate repos the harder it will be to eventually merge the two in a
> > > way that supports existing users.
> > >
> > > Do you think would work?
> > >
> > > Paddy
> > >
> > > -----Original Message-----
> > > From: Jorge Cardoso Leitão <jo...@gmail.com>
> > > Sent: Monday, August 2, 2021 1:59 PM
> > > To: dev@arrow.apache.org
> > > Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward
> > >
> > > Hi,
> > >
> > > Sorry for the delay.
> > >
> > > If there is a path towards an official release under a <1.0.0
> > > versioning schema aligned with the rest of the Rust ecosystem and in
> > > line with the stability of the API, then IMO we should move all
> > > development to within Apache experimental asap (I can handle this and
> > > the likely IP clearance round). If we require a release >=1.X.Y to it
> > > and/or a schedule, then I prefer to keep expectations aligned and
> postpone any movement.
> > >
> > > Under the move situation, I was thinking in something as follows:
> > >
> > > * gradually stop maintaining "arrow" in crates, offering a maintenance
> > > window over which we release patches (*)
> > > * work towards achieving feature parity on arrow2/parquet2 on the
> > > experimental repos.
> > > * keep releasing arrow2/parquet2 under a 0.X model during the step
> > > above
> > > (**)
> > > * migrate to arrow-rs and archive experimentals (***)
> > > * break arrow2 in smaller crates so that we can version the APIs at a
> > > different cadence
> > > * once a crate reaches some stability (this is always opinionated, but
> > > it is fine), we bump it to 1.0 and announce a maintenance plan ala
> > > tokio <
> > > https://tokio.rs/blog/2020-12-tokio-1-0
> > > >.
> > >
> > > (*) e.g. "we will continue to patch the arrow crate up to at least 6
> > > months starting after the first release of arrow2 that supports
> > > a) nested parquet read and write
> > > b) union array (including IPC integration tests)
> > > c) map array (including IPC integration tests)"
> > >
> > > (**) officially or un-officially (I would suggest officially so that
> > > we can acknowledge everyone's work on it, but no strong feelings)
> > >
> > > (***) something like:
> > > 1. place arrow2 on top of a clear arrow repo so that the full
> > > contribution history up to that point preserved 2. make arrow-rs the
> > > home of arrow2 (i.e. we start releasing arrow2 from
> > > arrow-rs) and archive the experimental repos; create arrow-rs-parquet
> > > or something for parquet2.
> > >
> > > In summary, the core pain point for me is the current versioning of
> > > arrow, which I feel is incompatible with my goals for arrow2 and the
> > > ecosystem I envision it supporting :)
> > >
> > > Best,
> > > Jorge
> > >
> > > On Fri, Jul 30, 2021 at 8:44 PM Wes McKinney <we...@gmail.com>
> wrote:
> > >
> > > > I think it would also be fine to push "beta" arrow2 crates out of a
> > > > repo under apache/ so long as they are not marked on crates.io as
> > > > being Apache-official releases. There's a possible slippery slope
> > > > there, but as long as we are on a path to formalizing the releases I
> > > think it is okay.
> > > >
> > > > On Fri, Jul 30, 2021 at 1:07 PM Andrew Lamb <al...@influxdata.com>
> > > wrote:
> > > >
> > > > > Jorge -- do you feel like we have a resolution on what to do with
> > > > > arrow2
> > > > in
> > > > > the near term?
> > > > >
> > > > > The current state of affairs seems to me that arrow2 is released
> > > > > from
> > > > >
> > > https://github.com/jorgecarleitao/arrow2
> > > to crates.io (which is fine).
> > > > > Are
> > > > > you happy with keeping development in the jorgecarleitao repo
> > > > > where you will retain maximal control and flexibility until it is
> > > > > ready to start integrating?
> > > > >
> > > > > Or would you prefer to put it into one of the apache repos and
> > > > > subject
> > > > its
> > > > > development and release to the normal Arrow governance model
> > > > > (tarball, vote, etc)?
> > > > >
> > > > > Since you are the primary author/architect I think you should have
> > > > > a substantial say at this stage.
> > > > >
> > > > > Andrew
> > > > >
> > > > >
> > > > > On Tue, Jul 27, 2021 at 7:16 PM Andrew Lamb <al...@influxdata.com>
> > > > wrote:
> > > > >
> > > > > > I would be happy with this approach. Thank you for the
> > > > > > suggestion
> > > > > >
> > > > > > This hybrid approach of both arrow and arrow2 in the same repo
> > > > > > seems better to me than separate repos.
> > > > > >
> > > > > > What I really care about is ensuring we don't have two
> > > > > > crates/APIs indefinitely -- as long as we are continually making
> > > > > > progress towards unification that is what is important to me.
> > > > > >
> > > > > > Andrew
> > > > > >
> > > > > > On Tue, Jul 27, 2021 at 1:40 PM Andy Grove
> > > > > > <an...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > >> Apologies for being late to this discussion.
> > > > > >>
> > > > > >> There is a hybrid option to consider here where we add the
> > > > > >> arrow2 code into the arrow crate as a separate module, so we
> > > > > >> release one crate
> > > > containing
> > > > > >> the "old" API (which we can mark as deprecated) as well as the
> > > > > >> new
> > > > API.
> > > > > >> Java did a similar thing a long time ago with "java.io" versus
> > > > > "java.nio"
> > > > > >> (new IO).
> > > > > >>
> > > > > >> I agree that the versioning wouldn't be ideal, but this seems
> > > > > >> like it might be a pragmatic compromise?
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> Andy.
> > > > > >>
> > > > > >>
> > > > > >> On Tue, Jul 20, 2021 at 5:41 AM Andrew Lamb
> > > > > >> <al...@influxdata.com>
> > > > > wrote:
> > > > > >>
> > > > > >> > What I meant is that when you decide arrow2 is suitable for
> > > > > >> > release
> > > > to
> > > > > >> > existing arrow users, I stand ready to help you incorporate
> > > > > >> > it into
> > > > > >> arrow.
> > > > > >> >
> > > > > >> > All the feedback I have heard so far from the rest of the
> > > > > >> > community
> > > > is
> > > > > >> that
> > > > > >> > we are ready. One might even say we are anxious to do so :)
> > > > > >> >
> > > > > >> > Andrew
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
>

Re: [Discuss] [Rust] Arrow2/parquet2 going foward

Posted by QP Hou <ho...@gmail.com>.
Just my two cents.

I think we all have the same goal here, which is to accelerate the
transition from arrow to arrow2 as the official arrow rust
implementation.

In my opinion, the biggest gain we can get from merging the two projects
into one repo is a policy to enforce that every new feature/test added
to the current arrow implementation also needs to be added to the
arrow2 implementation. This way, we can make sure the gap between
arrow and arrow2 closes with every iteration. Without this, I tend
to agree with Jorge that merging the two repos would add more overhead
to his work and slow him down.

For those who want to contribute to arrow2 to accelerate the
transition, I don't think they would have a problem sending PRs to the
arrow2 repo. For those who are not interested in contributing to
arrow2, merging the arrow2 code base into the current arrow-rs repo
won't incentivize them to contribute. Merging arrow2 into the current
arrow-rs repo could help with discovery, but I think this can be
achieved by adding a prominent note to the current arrow-rs README to
encourage contributions to the arrow2 repo as well.

At the end of the day, Jorge is currently the sole active contributor
to the arrow2 implementation, so I think he should have the most say on
what is the most productive way to push arrow2 forward. My only
concern with merging arrow2 into arrow-rs right now is that Jorge
spends all the effort to do the merge and it then turns out that he
is still the only active contributor to arrow2 within arrow-rs, but
with more overhead to deal with.

As for maintaining semantic versioning for arrow2, Andy had a good
point that we could still release arrow2 with its own versioning even
if we merge it into the arrow-rs repo. So I don't think we should
worry/focus too much about versioning in our discussion. Velocity to
close the gap between arrow-rs and arrow2 is the most important thing.

Lastly, I do agree with Andrew that it would be good to maintain only
a single arrow crate on crates.io in the long run. As he mentioned,
when the current arrow2 code base becomes stable, we could still
release it under the arrow name on crates.io with a major version
bump. The absolute value of the major version doesn't really matter as
long as we stick to the convention that a breaking change results in
a major version bump.
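
To make the convention above concrete, here is a minimal sketch of the
compatibility rule that Cargo applies to default (caret) version
requirements: the left-most non-zero component must stay the same. Under
that rule a 0.X crate signals a breaking change by bumping the minor
version, while a >=1.0 crate bumps the major version. The `Version` type
and the `is_compatible_upgrade` function are hypothetical names used only
for illustration (they are not part of Cargo or of any crate mentioned in
this thread), and pre-release tags are ignored.

```rust
/// A semantic version `major.minor.patch` (pre-release tags omitted for brevity).
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

impl Version {
    pub const fn new(major: u64, minor: u64, patch: u64) -> Self {
        Version { major, minor, patch }
    }
}

/// Returns true when moving from `old` to `new` counts as a compatible
/// upgrade under the caret rule: `new` must not be older than `old`, and
/// the left-most non-zero component of `old` must be unchanged.
pub fn is_compatible_upgrade(old: Version, new: Version) -> bool {
    // Downgrades are never treated as compatible upgrades.
    if new < old {
        return false;
    }
    if old.major != 0 {
        // 1.x.y and later: only a major bump is breaking.
        new.major == old.major
    } else if old.minor != 0 {
        // 0.x.y: a minor bump is breaking.
        new.major == 0 && new.minor == old.minor
    } else {
        // 0.0.x: every release is treated as breaking.
        new.major == 0 && new.minor == 0 && new.patch == old.patch
    }
}
```

With this rule, going from 0.2.0 to 0.2.5 or from 1.0.0 to 1.9.0 is
compatible, while 0.2.0 to 0.3.0 and 4.0.0 to 5.0.0 are both breaking,
regardless of how large the major number is.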

Thanks,
QP



On Tue, Aug 3, 2021 at 5:31 PM paddy horan <pa...@hotmail.com> wrote:
>
> Hi Jorge,
>
> I see value in consolidating development in a single repo and releasing under the existing arrow crate.  Regarding versioning, I think once we follow semantic versioning we are fine.  I don't think it's worth migrating to a different repo and crate to comply with the de-facto standard you mention.
>
> Just one person's opinion though,
> Paddy
>
>
> -----Original Message-----
> From: Jorge Cardoso Leitão <jo...@gmail.com>
> Sent: Tuesday, August 3, 2021 5:23 PM
> To: dev@arrow.apache.org
> Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward
>
> Hi Paddy,
>
> > What do you think about moving Arrow2 into the main Arrow repo where
> > it is only enabled via an "experimental" feature flag?
>
> AFAIK this is already possible:
> * add `arrow2 = { version = "0.2.0", optional = true }` to Cargo.toml
> * add `#[cfg(feature = "arrow2")]\npub mod arrow2;\n` to lib.rs
>
> We do this kind of thing to expose APIs from non-arrow crates such as parts of the parquet-format-rs crate, and is generally the way to go when a crate wants to expose a third-party API.
>
> I would not recommend doing this, though: by exposing arrow2 from arrow, we double the compilation time and binary size of all dependencies that activate the flag. Furthermore, there are users of arrow2 that do not need the arrow crate, which this model would not support.
>
> AFAIK where development happens is unrelated to this aspect, Rust enables this by design.
>
> > but also this would be a clear signal that Arrow2 is <1.0.
> > the experimental flag will be a clear signal to the existing Arrow
> > community that Arrow2 is the future but that it is <1.0
>
> arrow2 is already <1.0 <https://crates.io/crates/arrow2>. My argument is that the arrow/arrow-flight/parquet are not versioned according to the Rust community standards: It is a de facto practice in Rust to delay major releases until the API is stable. Tokio's blog post about their 1.0 <https://tokio.rs/blog/2020-12-tokio-1-0> (i.e. "[...] we commit to holding back on a Tokio 2.0 release for at least 3 years."). 10 most downloaded
> crates:
>
> * https://crates.io/crates/rand (0.8.4)
> * https://crates.io/crates/syn (1.0.74)
> * https://crates.io/crates/libc (0.2.98)
> * https://crates.io/crates/rand_core (0.6.3)
> * quote (1.0.9)
> * unicode-xid (0.2.2)
> * proc-macro2 (1.0.28)
> * cfg-if (1.0.0)
> * https://crates.io/crates/serde (1.0.126)
> * bitflags (1.2.1)
>
> These are small crates with a small scope, but even larger projects share the same pattern:
>
> * crossbeam <https://crates.io/crates/crossbeam> (0.8.1)
> * rocket <https://crates.io/crates/rocket> (0.5)
> * polars <https://crates.io/crates/polars> (0.14.8)
> * tower <https://crates.io/crates/tower> (0.4.8)
> * Tokio <https://crates.io/crates/tokio> (1.9.0)
> * hyper <https://crates.io/crates/hyper> (0.14.11)
>
> Crates that arrow depends on
> <https://github.com/apache/arrow-rs/blob/master/arrow/Cargo.toml>,
> that DataFusion
> depends on
> <https://github.com/apache/arrow-datafusion/blob/master/datafusion/Cargo.toml>,
> all share the same pattern of being either 0.X, 1.X when their API is stable, and 2.X when they needed a large change in the API. This contrasts with Apache Arrow's releases where we are now at 5.0 (and we have yet to arrive at a safe design).
>
> > existing users will be well supported in this transition
>
> How so? imo people either PR to the arrow/arrow2 code base or they won't.
> This is largely independent of where the development of either arrow2 or arrow happens; people google the crate, click on the repository link and file an issue or open a PR.
>
> > In general, I think the longer that development proceeds in separate
> > repos the harder it will be to eventually merge the two in a way that supports existing users.
>
> How so? I may be mistaken, but API design is unrelated to which repo the development happens in: it is primarily driven by who is designing it and by where or whom they draw inspiration from. Both the arrow and parquet crate designs are inspired by the C++ implementation and have gradually been migrated to "idiomatic" Rust, as "idiomatic" becomes better defined in Rust.
> Arrow2 is inspired by the current crate and the pains of using it in DataFusion. Datafuse, a fork of datafusion, recently migrated to arrow2
> <https://github.com/datafuselabs/datafuse/pull/1239>: +1,947 −3,484, which shows that the crate is capturing important patterns from the arrow crate and exposing ones that are useful / result in less code for the same or higher performance.
>
> On the opposite side, merging the development of crates under the same repo leads to: more triaging of PRs; more work for releases and changelogs; tagging based on crates; multiple READMEs in subpaths of the repo; curation of the CI to accommodate this; a workspace with many crates, each with its own set of dependencies, increasing compilation and development time; mixed commit logs; difficulties in reverts and cherry-picks; and more difficulty finding things in the repo. See e.g. how tokio-rs does it:
> https://github.com/tokio-rs, even for small crates like bytes <https://github.com/tokio-rs/bytes>.
>
> Best,
> Jorge
>
> On Tue, Aug 3, 2021 at 3:13 PM paddy horan <pa...@hotmail.com> wrote:
>
> > Hi Jorge,
> >
> > What do you think about moving Arrow2 into the main Arrow repo where
> > it is only enabled via an "experimental" feature flag?  This would
> > allow development of Arrow2 to proceed in the main repo but also this
> > would be a clear signal that Arrow2 is <1.0.  When we feel ready (i.e.
> > Arrow2 is 1.0) we can release it in the next main release with Arrow2
> > being the default and move the existing implementation behind a "legacy" feature flag.
> >
> > Here is why I think this might work well:
> >  - People contributing to the Arrow project will naturally contribute
> > to Arrow2.  At the moment, some people will still contribute to Arrow
> > instead of Arrow2 just by virtue of it being the "official" implementation.
> > However, if both are in one repo people will want to contribute to the
> > "future", i.e. Arrow2.
> >  - the experimental flag will be a clear signal to the existing Arrow
> > community that Arrow2 is the future but that it is <1.0
> >  - existing users will be well supported in this transition
> >  - In general, I think the longer that development proceeds in
> > separate repos the harder it will be to eventually merge the two in a
> > way that supports existing users.
> >
> > Do you think would work?
> >
> > Paddy
> >
> > -----Original Message-----
> > From: Jorge Cardoso Leitão <jo...@gmail.com>
> > Sent: Monday, August 2, 2021 1:59 PM
> > To: dev@arrow.apache.org
> > Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward
> >
> > Hi,
> >
> > Sorry for the delay.
> >
> > If there is a path towards an official release under a <1.0.0
> > versioning schema aligned with the rest of the Rust ecosystem and in
> > line with the stability of the API, then IMO we should move all
> > development to within Apache experimental asap (I can handle this and
> > the likely IP clearance round). If we require a release >=1.X.Y to it
> > and/or a schedule, then I prefer to keep expectations aligned and postpone any movement.
> >
> > Under the move situation, I was thinking in something as follows:
> >
> > * gradually stop maintaining "arrow" in crates, offering a maintenance
> > window over which we release patches (*)
> > * work towards achieving feature parity on arrow2/parquet2 on the
> > experimental repos.
> > * keep releasing arrow2/parquet2 under a 0.X model during the step
> > above
> > (**)
> > * migrate to arrow-rs and archive experimentals (***)
> > * break arrow2 in smaller crates so that we can version the APIs at a
> > different cadence
> > * once a crate reaches some stability (this is always opinionated, but
> > it is fine), we bump it to 1.0 and announce a maintenance plan ala
> > tokio <
> > https://tokio.rs/blog/2020-12-tokio-1-0
> > >.
> >
> > (*) e.g. "we will continue to patch the arrow crate up to at least 6
> > months starting after the first release of arrow2 that supports
> > a) nested parquet read and write
> > b) union array (including IPC integration tests)
> > c) map array (including IPC integration tests)"
> >
> > (**) officially or un-officially (I would suggest officially so that
> > we can acknowledge everyone's work on it, but no strong feelings)
> >
> > (***) something like:
> > 1. place arrow2 on top of a clear arrow repo so that the full
> > contribution history up to that point preserved 2. make arrow-rs the
> > home of arrow2 (i.e. we start releasing arrow2 from
> > arrow-rs) and archive the experimental repos; create arrow-rs-parquet
> > or something for parquet2.
> >
> > In summary, the core pain point for me is the current versioning of
> > arrow, which I feel is incompatible with my goals for arrow2 and the
> > ecosystem I envision it supporting :)
> >
> > Best,
> > Jorge
> >
> > On Fri, Jul 30, 2021 at 8:44 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > > I think it would also be fine to push "beta" arrow2 crates out of a
> > > repo under apache/ so long as they are not marked on crates.io as
> > > being Apache-official releases. There's a possible slippery slope
> > > there, but as long as we are on a path to formalizing the releases I
> > think it is okay.
> > >
> > > On Fri, Jul 30, 2021 at 1:07 PM Andrew Lamb <al...@influxdata.com>
> > wrote:
> > >
> > > > Jorge -- do you feel like we have a resolution on what to do with
> > > > arrow2
> > > in
> > > > the near term?
> > > >
> > > > The current state of affairs seems to me that arrow2 is released
> > > > from
> > > >
> > https://github.com/jorgecarleitao/arrow2
> > to crates.io (which is fine).
> > > > Are
> > > > you happy with keeping development in the jorgecarleitao repo
> > > > where you will retain maximal control and flexibility until it is
> > > > ready to start integrating?
> > > >
> > > > Or would you prefer to put it into one of the apache repos and
> > > > subject
> > > its
> > > > development and release to the normal Arrow governance model
> > > > (tarball, vote, etc)?
> > > >
> > > > Since you are the primary author/architect I think you should have
> > > > a substantial say at this stage.
> > > >
> > > > Andrew
> > > >
> > > >
> > > > On Tue, Jul 27, 2021 at 7:16 PM Andrew Lamb <al...@influxdata.com>
> > > wrote:
> > > >
> > > > > I would be happy with this approach. Thank you for the
> > > > > suggestion
> > > > >
> > > > > This hybrid approach of both arrow and arrow2 in the same repo
> > > > > seems better to me than separate repos.
> > > > >
> > > > > What I really care about is ensuring we don't have two
> > > > > crates/APIs indefinitely -- as long as we are continually making
> > > > > progress towards unification that is what is important to me.
> > > > >
> > > > > Andrew
> > > > >
> > > > > On Tue, Jul 27, 2021 at 1:40 PM Andy Grove
> > > > > <an...@gmail.com>
> > > > wrote:
> > > > >
> > > > >> Apologies for being late to this discussion.
> > > > >>
> > > > >> There is a hybrid option to consider here where we add the
> > > > >> arrow2 code into the arrow crate as a separate module, so we
> > > > >> release one crate
> > > containing
> > > > >> the "old" API (which we can mark as deprecated) as well as the
> > > > >> new
> > > API.
> > > > >> Java did a similar thing a long time ago with "java.io" versus
> > > > "java.nio"
> > > > >> (new IO).
> > > > >>
> > > > >> I agree that the versioning wouldn't be ideal, but this seems
> > > > >> like it might be a pragmatic compromise?
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Andy.
> > > > >>
> > > > >>
> > > > >> On Tue, Jul 20, 2021 at 5:41 AM Andrew Lamb
> > > > >> <al...@influxdata.com>
> > > > wrote:
> > > > >>
> > > > >> > What I meant is that when you decide arrow2 is suitable for
> > > > >> > release
> > > to
> > > > >> > existing arrow users, I stand ready to help you incorporate
> > > > >> > it into
> > > > >> arrow.
> > > > >> >
> > > > >> > All the feedback I have heard so far from the rest of the
> > > > >> > community
> > > is
> > > > >> that
> > > > >> > we are ready. One might even say we are anxious to do so :)
> > > > >> >
> > > > >> > Andrew
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >

RE: [Discuss] [Rust] Arrow2/parquet2 going foward

Posted by paddy horan <pa...@hotmail.com>.
Hi Jorge,

I see value in consolidating development in a single repo and releasing under the existing arrow crate.  Regarding versioning, I think that as long as we follow semantic versioning we are fine.  I don't think it's worth migrating to a different repo and crate to comply with the de-facto standard you mention.

Just one person's opinion though,
Paddy


-----Original Message-----
From: Jorge Cardoso Leitão <jo...@gmail.com> 
Sent: Tuesday, August 3, 2021 5:23 PM
To: dev@arrow.apache.org
Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward

Hi Paddy,

> What do you think about moving Arrow2 into the main Arrow repo where
> it is only enabled via an "experimental" feature flag?

AFAIK this is already possible:
* add `arrow2 = { version = "0.2.0", optional = true }` to Cargo.toml
* add `#[cfg(feature = "arrow2")]\npub mod arrow2;\n` to lib.rs

We do this kind of thing to expose APIs from non-arrow crates such as parts of the parquet-format-rs crate, and is generally the way to go when a crate wants to expose a third-party API.

I would not recommend doing this, though: by exposing arrow2 from arrow, we double the compilation time and binary size of all dependencies that activate the flag. Furthermore, there are users of arrow2 that do not need the arrow crate, which this model would not support.

AFAIK where development happens is unrelated to this aspect, Rust enables this by design.

> but also this would be a clear signal that Arrow2 is <1.0.
> the experimental flag will be a clear signal to the existing Arrow
> community that Arrow2 is the future but that it is <1.0

arrow2 is already <1.0 <https://crates.io/crates/arrow2>. My argument is that the arrow/arrow-flight/parquet are not versioned according to the Rust community standards: It is a de facto practice in Rust to delay major releases until the API is stable. Tokio's blog post about their 1.0 <https://tokio.rs/blog/2020-12-tokio-1-0> (i.e. "[...] we commit to holding back on a Tokio 2.0 release for at least 3 years."). 10 most downloaded
crates:

* https://crates.io/crates/rand (0.8.4)
* https://crates.io/crates/syn (1.0.74)
* https://crates.io/crates/libc (0.2.98)
* https://crates.io/crates/rand_core (0.6.3)
* quote (1.0.9)
* unicode-xid (0.2.2)
* proc-macro2 (1.0.28)
* cfg-if (1.0.0)
* https://crates.io/crates/serde (1.0.126)
* bitflags (1.2.1)

These are small crates with a small scope, but even larger projects share the same pattern:

* crossbeam <https://crates.io/crates/crossbeam> (0.8.1)
* rocket <https://crates.io/crates/rocket> (0.5)
* polars <https://crates.io/crates/polars> (0.14.8)
* tower <https://crates.io/crates/tower> (0.4.8)
* Tokio <https://crates.io/crates/tokio> (1.9.0)
* hyper <https://crates.io/crates/hyper> (0.14.11)

Crates that arrow depends on
<https://github.com/apache/arrow-rs/blob/master/arrow/Cargo.toml>,
that DataFusion
depends on
<https://github.com/apache/arrow-datafusion/blob/master/datafusion/Cargo.toml>,
all share the same pattern of being either 0.X, 1.X when their API is stable, and 2.X when they needed a large change in the API. This contrasts with Apache Arrow's releases where we are now at 5.0 (and we have yet to arrive at a safe design).

> existing users will be well supported in this transition

How so? imo people either PR to the arrow/arrow2 code base or they won't.
This is largely independent of where the development of either arrow2 or arrow happens; people google the crate, click on the repository link and file an issue or field a PR.

> In general, I think the longer that development proceeds in separate
> repos the harder it will be to eventually merge the two in a way that supports existing users.

How so? I may be mistaken, but API design is unrelated to on which repo the development happens: it is primarily driven by who is designing it and from where or who they are inspired by. Both arrow and parquet's crate design are inspired by the C++ implementation and have gradually been migrated to "idiomatic" Rust, as "idiomatic" is becoming more well defined in Rust.
Arrow2 is inspired by the current crate and the pains of using it in DataFusion. Datafuse, a fork of datafusion, recently migrated to arrow2
<https://github.com/datafuselabs/datafuse/pull/1239>: +1,947 −3,484, which shows that the crate is capturing important patterns from the arrow crate and exposing ones that are useful / result in less code for the same or higher performance.

On the opposite side, merging the development of crates under the same repo leads to: more triagging of PRs; more work for releases and changelogging; tagging based on crates; multiple READMEs in subpaths of the repo, curation of the CI to accommodate this, a workspace with many crates each with its own set of dependencies, increasing compilation and development; mixed commit logs, difficulties in reverts and cherry-picks; more difficult to find stuff in the repo. See e.g. how tokio-rs does it:
https://github.com/tokio-rs, even for small crates like bytes <https://github.com/tokio-rs/bytes>.

Best,
Jorge

On Tue, Aug 3, 2021 at 3:13 PM paddy horan <pa...@hotmail.com> wrote:

> Hi Jorge,
>
> What do you think about moving Arrow2 into the main Arrow repo where 
> it is only enabled via an "experimental" feature flag?  This would 
> allow development of Arrow2 to proceed in the main repo but also this 
> would be a clear signal that Arrow2 is <1.0.  When we feel ready (i.e. 
> Arrow2 is 1.0) we can release it in the next main release with Arrow2 
> being the default and move the existing implementation behind a "legacy" feature flag.
>
> Here is why I think this might work well:
>  - People contributing to the Arrow project will naturally contribute 
> to Arrow2.  At the moment, some people will still contribute to Arrow 
> instead of Arrow2 just by virtue of it being the "official" implementation.
> However, if both are in one repo people will want to contribute to the 
> "future", i.e. Arrow2.
>  - the experimental flag will be a clear signal to the existing Arrow 
> community that Arrow2 is the future but that it is <1.0
>  - existing users will be well supported in this transition
>  - In general, I think the longer that development proceeds in 
> separate repos the harder it will be to eventually merge the two in a 
> way that supports existing users.
>
> Do you think would work?
>
> Paddy
>
> -----Original Message-----
> From: Jorge Cardoso Leitão <jo...@gmail.com>
> Sent: Monday, August 2, 2021 1:59 PM
> To: dev@arrow.apache.org
> Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward
>
> Hi,
>
> Sorry for the delay.
>
> If there is a path towards an official release under a <1.0.0 
> versioning schema aligned with the rest of the Rust ecosystem and in 
> line with the stability of the API, then IMO we should move all 
> development to within Apache experimental asap (I can handle this and 
> the likely IP clearance round). If we require a release >=1.X.Y to it 
> and/or a schedule, then I prefer to keep expectations aligned and postpone any movement.
>
> Under the move situation, I was thinking in something as follows:
>
> * gradually stop maintaining "arrow" in crates, offering a maintenance 
> window over which we release patches (*)
> * work towards achieving feature parity on arrow2/parquet2 on the 
> experimental repos.
> * keep releasing arrow2/parquet2 under a 0.X model during the step 
> above
> (**)
> * migrate to arrow-rs and archive experimentals (***)
> * break arrow2 in smaller crates so that we can version the APIs at a 
> different cadence
> * once a crate reaches some stability (this is always opinionated, but 
> it is fine), we bump it to 1.0 and announce a maintenance plan ala 
> tokio <https://tokio.rs/blog/2020-12-tokio-1-0>.
>
> (*) e.g. "we will continue to patch the arrow crate up to at least 6 
> months starting after the first release of arrow2 that supports
> a) nested parquet read and write
> b) union array (including IPC integration tests)
> c) map array (including IPC integration tests)"
>
> (**) officially or un-officially (I would suggest officially so that 
> we can acknowledge everyone's work on it, but no strong feelings)
>
> (***) something like:
> 1. place arrow2 on top of a clear arrow repo so that the full
> contribution history up to that point preserved
> 2. make arrow-rs the home of arrow2 (i.e. we start releasing arrow2
> from arrow-rs) and archive the experimental repos; create
> arrow-rs-parquet or something for parquet2.
>
> In summary, the core pain point for me is the current versioning of 
> arrow, which I feel is incompatible with my goals for arrow2 and the 
> ecosystem I envision it supporting :)
>
> Best,
> Jorge
>
> On Fri, Jul 30, 2021 at 8:44 PM Wes McKinney <we...@gmail.com> wrote:
>
> > I think it would also be fine to push "beta" arrow2 crates out of a 
> > repo under apache/ so long as they are not marked on crates.io as 
> > being Apache-official releases. There's a possible slippery slope 
> > there, but as long as we are on a path to formalizing the releases I
> > think it is okay.
> >
> > On Fri, Jul 30, 2021 at 1:07 PM Andrew Lamb <al...@influxdata.com>
> > wrote:
> >
> > > Jorge -- do you feel like we have a resolution on what to do with
> > > arrow2 in the near term?
> > >
> > > The current state of affairs seems to me that arrow2 is released
> > > from https://github.com/jorgecarleitao/arrow2 to crates.io (which
> > > is fine). Are you happy with keeping development in the
> > > jorgecarleitao repo where you will retain maximal control and
> > > flexibility until it is ready to start integrating?
> > >
> > > Or would you prefer to put it into one of the apache repos and
> > > subject its development and release to the normal Arrow governance
> > > model (tarball, vote, etc)?
> > >
> > > Since you are the primary author/architect I think you should have 
> > > a substantial say at this stage.
> > >
> > > Andrew
> > >
> > >
> > > On Tue, Jul 27, 2021 at 7:16 PM Andrew Lamb <al...@influxdata.com>
> > wrote:
> > >
> > > > I would be happy with this approach. Thank you for the 
> > > > suggestion
> > > >
> > > > This hybrid approach of both arrow and arrow2 in the same repo 
> > > > seems better to me than separate repos.
> > > >
> > > > What I really care about is ensuring we don't have two 
> > > > crates/APIs indefinitely -- as long as we are continually making 
> > > > progress towards unification that is what is important to me.
> > > >
> > > > Andrew
> > > >
> > > > On Tue, Jul 27, 2021 at 1:40 PM Andy Grove 
> > > > <an...@gmail.com>
> > > wrote:
> > > >
> > > >> Apologies for being late to this discussion.
> > > >>
> > > >> There is a hybrid option to consider here where we add the 
> > > >> arrow2 code into the arrow crate as a separate module, so we 
> > > >> release one crate
> > containing
> > > >> the "old" API (which we can mark as deprecated) as well as the 
> > > >> new
> > API.
> > > >> Java did a similar thing a long time ago with "java.io" versus
> > > "java.nio"
> > > >> (new IO).
> > > >>
> > > >> I agree that the versioning wouldn't be ideal, but this seems 
> > > >> like it might be a pragmatic compromise?
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Andy.
> > > >>
> > > >>
> > > >> On Tue, Jul 20, 2021 at 5:41 AM Andrew Lamb 
> > > >> <al...@influxdata.com>
> > > wrote:
> > > >>
> > > >> > What I meant is that when you decide arrow2 is suitable for 
> > > >> > release
> > to
> > > >> > existing arrow users, I stand ready to help you incorporate 
> > > >> > it into
> > > >> arrow.
> > > >> >
> > > >> > All the feedback I have heard so far from the rest of the 
> > > >> > community
> > is
> > > >> that
> > > >> > we are ready. One might even say we are anxious to do so :)
> > > >> >
> > > >> > Andrew
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: [Discuss] [Rust] Arrow2/parquet2 going foward

Posted by Jorge Cardoso Leitão <jo...@gmail.com>.
Hi Paddy,

> What do you think about moving Arrow2 into the main Arrow repo where it
> is only enabled via an "experimental" feature flag?

AFAIK this is already possible:
* add `arrow2 = { version = "0.2.0", optional = true }` to Cargo.toml
* add `#[cfg(feature = "arrow2")]\npub mod arrow2;\n` to lib.rs

We do this kind of thing to expose APIs from non-arrow crates such as parts
of the parquet-format-rs crate, and it is generally the way to go when a
crate wants to expose a third-party API.
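
As a rough sketch of the two steps above (not the actual arrow-rs source):
the Cargo.toml entry is repeated as a comment, the re-export is written with
`pub use` rather than a separate module file, and `arrow2_enabled` is a
hypothetical helper added only to make the gate observable.

```rust
// Sketch of a feature-gated re-export in a hypothetical lib.rs.
//
// Assumed Cargo.toml entry (from the step above); an optional dependency
// implicitly defines a feature with the same name:
//
//   [dependencies]
//   arrow2 = { version = "0.2.0", optional = true }

// Compiled only when the crate is built with `--features arrow2`.
// Downstream users would then reach the API through this crate's
// `arrow2` path.
#[cfg(feature = "arrow2")]
pub use arrow2;

/// Hypothetical helper: reports whether the gate was enabled at compile time.
pub fn arrow2_enabled() -> bool {
    cfg!(feature = "arrow2")
}
```

Without the feature the `pub use` line is dropped before name resolution, so
the default build does not compile or link arrow2 at all.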

I would not recommend doing this, though: by exposing arrow2 from arrow, we
double the compilation time and binary size of all dependencies that
activate the flag. Furthermore, there are users of arrow2 that do not need
the arrow crate, which this model would not support.

AFAIK where development happens is unrelated to this aspect; Rust enables
this by design.

> but also this would be a clear signal that Arrow2 is <1.0.
> the experimental flag will be a clear signal to the existing Arrow
> community that Arrow2 is the future but that it is <1.0

arrow2 is already <1.0 <https://crates.io/crates/arrow2>. My argument is
that the arrow/arrow-flight/parquet crates are not versioned according to
the Rust community standards: it is a de facto practice in Rust to delay
major releases until the API is stable. See e.g. Tokio's blog post about
their 1.0 <https://tokio.rs/blog/2020-12-tokio-1-0> ("[...] we commit to
holding back on a Tokio 2.0 release for at least 3 years."). The 10 most
downloaded crates:

* https://crates.io/crates/rand (0.8.4)
* https://crates.io/crates/syn (1.0.74)
* https://crates.io/crates/libc (0.2.98)
* https://crates.io/crates/rand_core (0.6.3)
* quote (1.0.9)
* unicode-xid (0.2.2)
* proc-macro2 (1.0.28)
* cfg-if (1.0.0)
* https://crates.io/crates/serde (1.0.126)
* bitflags (1.2.1)

These are small crates with a small scope, but even larger projects share
the same pattern:

* crossbeam <https://crates.io/crates/crossbeam> (0.8.1)
* rocket <https://crates.io/crates/rocket> (0.5)
* polars <https://crates.io/crates/polars> (0.14.8)
* tower <https://crates.io/crates/tower> (0.4.8)
* Tokio <https://crates.io/crates/tokio> (1.9.0)
* hyper <https://crates.io/crates/hyper> (0.14.11)

Crates that arrow depends on
<https://github.com/apache/arrow-rs/blob/master/arrow/Cargo.toml>,
that DataFusion
depends on
<https://github.com/apache/arrow-datafusion/blob/master/datafusion/Cargo.toml>,
all share the same pattern of being either 0.X, 1.X when their API is
stable, and 2.X when they needed a large change in the API. This contrasts
with Apache Arrow's releases where we are now at 5.0 (and we have yet to
arrive at a safe design).
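
For context on why the 0.X / 1.X distinction matters to dependents, here is
a minimal sketch of Cargo's default (caret) compatibility rule, hand-rolled
without the semver crate. `Version` and `caret_compatible` are names made up
for this illustration, and pre-release tags are ignored.

```rust
/// A (major, minor, patch) version triple (illustrative type, not a real API).
type Version = (u64, u64, u64);

/// Returns true if `candidate` satisfies Cargo's default (caret)
/// requirement on `req`: it must be at least `req` and must not change
/// the left-most non-zero component.
fn caret_compatible(req: Version, candidate: Version) -> bool {
    // Tuples compare lexicographically, which matches version ordering here.
    if candidate < req {
        return false;
    }
    match req {
        // ^0.0.Z: only the exact patch version matches.
        (0, 0, _) => candidate == req,
        // ^0.Y.Z: a minor bump is treated as breaking.
        (0, minor, _) => candidate.0 == 0 && candidate.1 == minor,
        // ^X.Y.Z with X >= 1: only a major bump is breaking.
        (major, _, _) => candidate.0 == major,
    }
}
```

Under this rule a crate at 0.14 can ship breaking changes in 0.15 without
forcing them on existing users, while every breaking change to a >=1.0 crate
needs a new major version, e.g. 4.0 to 5.0.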

> existing users will be well supported in this transition

How so? imo people either PR to the arrow/arrow2 code base or they won't.
This is largely independent of where the development of either arrow2 or
arrow happens; people google the crate, click on the repository link and
file an issue or a PR.

> In general, I think the longer that development proceeds in separate
> repos the harder it will be to eventually merge the two in a way that
> supports existing users.

How so? I may be mistaken, but API design is unrelated to which repo the
development happens in: it is primarily driven by who is designing it and
by what or whom they are inspired. Both the arrow and parquet crate designs
are inspired by the C++ implementation and have gradually been migrated to
"idiomatic" Rust, as "idiomatic" is becoming more well defined in Rust.
Arrow2 is inspired by the current crate and the pains of using it in
DataFusion. Datafuse, a fork of datafusion, recently migrated to arrow2
<https://github.com/datafuselabs/datafuse/pull/1239>: +1,947 −3,484, which
shows that the crate is capturing important patterns from the arrow crate
and exposing ones that are useful / result in less code for the same or
higher performance.

On the opposite side, merging the development of crates under the same repo
leads to: more triaging of PRs; more work for releases and changelogs;
tagging based on crates; multiple READMEs in subpaths of the repo; curation
of the CI to accommodate this; a workspace with many crates, each with its
own set of dependencies, increasing compilation and development time; mixed
commit logs; difficulties in reverts and cherry-picks; and more difficulty
finding things in the repo. See e.g. how tokio-rs does it:
https://github.com/tokio-rs, even for small crates like bytes
<https://github.com/tokio-rs/bytes>.

Best,
Jorge

On Tue, Aug 3, 2021 at 3:13 PM paddy horan <pa...@hotmail.com> wrote:

> Hi Jorge,
>
> What do you think about moving Arrow2 into the main Arrow repo where it is
> only enabled via an "experimental" feature flag?  This would allow
> development of Arrow2 to proceed in the main repo but also this would be a
> clear signal that Arrow2 is <1.0.  When we feel ready (i.e. Arrow2 is 1.0)
> we can release it in the next main release with Arrow2 being the default
> and move the existing implementation behind a "legacy" feature flag.
>
> Here is why I think this might work well:
>  - People contributing to the Arrow project will naturally contribute to
> Arrow2.  At the moment, some people will still contribute to Arrow instead
> of Arrow2 just by virtue of it being the "official" implementation.
> However, if both are in one repo people will want to contribute to the
> "future", i.e. Arrow2.
>  - the experimental flag will be a clear signal to the existing Arrow
> community that Arrow2 is the future but that it is <1.0
>  - existing users will be well supported in this transition
>  - In general, I think the longer that development proceeds in separate
> repos the harder it will be to eventually merge the two in a way that
> supports existing users.
>
> Do you think would work?
>
> Paddy
>
> -----Original Message-----
> From: Jorge Cardoso Leitão <jo...@gmail.com>
> Sent: Monday, August 2, 2021 1:59 PM
> To: dev@arrow.apache.org
> Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward
>
> Hi,
>
> Sorry for the delay.
>
> If there is a path towards an official release under a <1.0.0 versioning
> schema aligned with the rest of the Rust ecosystem and in line with the
> stability of the API, then IMO we should move all development to within
> Apache experimental asap (I can handle this and the likely IP clearance
> round). If we require a release >=1.X.Y to it and/or a schedule, then I
> prefer to keep expectations aligned and postpone any movement.
>
> Under the move situation, I was thinking in something as follows:
>
> * gradually stop maintaining "arrow" in crates, offering a maintenance
> window over which we release patches (*)
> * work towards achieving feature parity on arrow2/parquet2 on the
> experimental repos.
> * keep releasing arrow2/parquet2 under a 0.X model during the step above
> (**)
> * migrate to arrow-rs and archive experimentals (***)
> * break arrow2 in smaller crates so that we can version the APIs at a
> different cadence
> * once a crate reaches some stability (this is always opinionated, but it
> is fine), we bump it to 1.0 and announce a maintenance plan ala tokio
> <https://tokio.rs/blog/2020-12-tokio-1-0>.
>
> (*) e.g. "we will continue to patch the arrow crate up to at least 6
> months starting after the first release of arrow2 that supports
> a) nested parquet read and write
> b) union array (including IPC integration tests)
> c) map array (including IPC integration tests)"
>
> (**) officially or un-officially (I would suggest officially so that we
> can acknowledge everyone's work on it, but no strong feelings)
>
> (***) something like:
> 1. place arrow2 on top of a clear arrow repo so that the full contribution
> history up to that point preserved
> 2. make arrow-rs the home of arrow2 (i.e. we start releasing arrow2 from
> arrow-rs) and archive the experimental repos; create arrow-rs-parquet or
> something for parquet2.
>
> In summary, the core pain point for me is the current versioning of arrow,
> which I feel is incompatible with my goals for arrow2 and the ecosystem I
> envision it supporting :)
>
> Best,
> Jorge
>
> On Fri, Jul 30, 2021 at 8:44 PM Wes McKinney <we...@gmail.com> wrote:
>
> > I think it would also be fine to push "beta" arrow2 crates out of a
> > repo under apache/ so long as they are not marked on crates.io as
> > being Apache-official releases. There's a possible slippery slope
> > there, but as long as we are on a path to formalizing the releases I
> > think it is okay.
> >
> > On Fri, Jul 30, 2021 at 1:07 PM Andrew Lamb <al...@influxdata.com>
> > wrote:
> >
> > > Jorge -- do you feel like we have a resolution on what to do with
> > > arrow2 in the near term?
> > >
> > > The current state of affairs seems to me that arrow2 is released
> > > from https://github.com/jorgecarleitao/arrow2 to crates.io (which
> > > is fine). Are you happy with keeping development in the
> > > jorgecarleitao repo where you will retain maximal control and
> > > flexibility until it is ready to start integrating?
> > >
> > > Or would you prefer to put it into one of the apache repos and
> > > subject its development and release to the normal Arrow governance
> > > model (tarball, vote, etc)?
> > >
> > > Since you are the primary author/architect I think you should have a
> > > substantial say at this stage.
> > >
> > > Andrew
> > >
> > >
> > > On Tue, Jul 27, 2021 at 7:16 PM Andrew Lamb <al...@influxdata.com>
> > wrote:
> > >
> > > > I would be happy with this approach. Thank you for the suggestion
> > > >
> > > > This hybrid approach of both arrow and arrow2 in the same repo
> > > > seems better to me than separate repos.
> > > >
> > > > What I really care about is ensuring we don't have two crates/APIs
> > > > indefinitely -- as long as we are continually making progress
> > > > towards unification that is what is important to me.
> > > >
> > > > Andrew
> > > >
> > > > On Tue, Jul 27, 2021 at 1:40 PM Andy Grove <an...@gmail.com>
> > > wrote:
> > > >
> > > >> Apologies for being late to this discussion.
> > > >>
> > > >> There is a hybrid option to consider here where we add the arrow2
> > > >> code into the arrow crate as a separate module, so we release one
> > > >> crate
> > containing
> > > >> the "old" API (which we can mark as deprecated) as well as the
> > > >> new
> > API.
> > > >> Java did a similar thing a long time ago with "java.io" versus
> > > "java.nio"
> > > >> (new IO).
> > > >>
> > > >> I agree that the versioning wouldn't be ideal, but this seems
> > > >> like it might be a pragmatic compromise?
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Andy.
> > > >>
> > > >>
> > > >> On Tue, Jul 20, 2021 at 5:41 AM Andrew Lamb
> > > >> <al...@influxdata.com>
> > > wrote:
> > > >>
> > > >> > What I meant is that when you decide arrow2 is suitable for
> > > >> > release
> > to
> > > >> > existing arrow users, I stand ready to help you incorporate it
> > > >> > into
> > > >> arrow.
> > > >> >
> > > >> > All the feedback I have heard so far from the rest of the
> > > >> > community
> > is
> > > >> that
> > > >> > we are ready. One might even say we are anxious to do so :)
> > > >> >
> > > >> > Andrew
> > > >> >
> > > >>
> > > >
> > >
> >
>

RE: [Discuss] [Rust] Arrow2/parquet2 going foward

Posted by paddy horan <pa...@hotmail.com>.
Hi Jorge,

What do you think about moving Arrow2 into the main Arrow repo where it is only enabled via an "experimental" feature flag?  This would allow development of Arrow2 to proceed in the main repo but also this would be a clear signal that Arrow2 is <1.0.  When we feel ready (i.e. Arrow2 is 1.0) we can release it in the next main release with Arrow2 being the default and move the existing implementation behind a "legacy" feature flag.

Here is why I think this might work well:
 - People contributing to the Arrow project will naturally contribute to Arrow2.  At the moment, some people will still contribute to Arrow instead of Arrow2 just by virtue of it being the "official" implementation.  However, if both are in one repo people will want to contribute to the "future", i.e. Arrow2.
 - the experimental flag will be a clear signal to the existing Arrow community that Arrow2 is the future but that it is <1.0
 - existing users will be well supported in this transition
 - In general, I think the longer that development proceeds in separate repos the harder it will be to eventually merge the two in a way that supports existing users. 
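
A minimal sketch of what such feature gating could look like in Rust. It assumes a hypothetical `experimental` feature declared in the crate's Cargo.toml; the module and function names (`stable_api`, `experimental_api`, `active_implementation`) are illustrative only and do not come from the arrow or arrow2 crates.

```rust
// Assumed (hypothetical) Cargo.toml entry:
//
//   [features]
//   experimental = []
//
// Built without `--features experimental`, only the stable path is compiled.

/// Stand-in for the existing implementation; always compiled in this sketch.
pub mod stable_api {
    pub fn implementation() -> &'static str {
        "arrow (stable)"
    }
}

/// Stand-in for the new implementation; compiled only when the
/// hypothetical `experimental` feature is enabled.
#[cfg(feature = "experimental")]
pub mod experimental_api {
    pub fn implementation() -> &'static str {
        "arrow2 (experimental, <1.0)"
    }
}

/// Selected when the feature is enabled.
#[cfg(feature = "experimental")]
pub fn active_implementation() -> &'static str {
    experimental_api::implementation()
}

/// Selected in a default build.
#[cfg(not(feature = "experimental"))]
pub fn active_implementation() -> &'static str {
    stable_api::implementation()
}

fn main() {
    println!("active implementation: {}", active_implementation());
}
```

Swapping the default later would amount to inverting the gate: the new code becomes unconditional and the old module moves behind a `legacy` feature.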

Do you think this would work?

Paddy

-----Original Message-----
From: Jorge Cardoso Leitão <jo...@gmail.com> 
Sent: Monday, August 2, 2021 1:59 PM
To: dev@arrow.apache.org
Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward

Hi,

Sorry for the delay.

If there is a path towards an official release under a <1.0.0 versioning schema aligned with the rest of the Rust ecosystem and in line with the stability of the API, then IMO we should move all development to within Apache experimental asap (I can handle this and the likely IP clearance round). If we require a release >=1.X.Y to it and/or a schedule, then I prefer to keep expectations aligned and postpone any movement.

Under the move situation, I was thinking of something as follows:

* gradually stop maintaining "arrow" in crates, offering a maintenance window over which we release patches (*)
* work towards achieving feature parity on arrow2/parquet2 on the experimental repos.
* keep releasing arrow2/parquet2 under a 0.X model during the step above (**)
* migrate to arrow-rs and archive experimentals (***)
* break arrow2 into smaller crates so that we can version the APIs at a different cadence
* once a crate reaches some stability (this is always opinionated, but it is fine), we bump it to 1.0 and announce a maintenance plan ala tokio <https://tokio.rs/blog/2020-12-tokio-1-0>.

(*) e.g. "we will continue to patch the arrow crate up to at least 6 months starting after the first release of arrow2 that supports
a) nested parquet read and write
b) union array (including IPC integration tests)
c) map array (including IPC integration tests)"

(**) officially or un-officially (I would suggest officially so that we can acknowledge everyone's work on it, but no strong feelings)

(***) something like:
1. place arrow2 on top of a clear arrow repo so that the full contribution history up to that point is preserved
2. make arrow-rs the home of arrow2 (i.e. we start releasing arrow2 from arrow-rs) and archive the experimental repos; create arrow-rs-parquet or something for parquet2.

In summary, the core pain point for me is the current versioning of arrow, which I feel is incompatible with my goals for arrow2 and the ecosystem I envision it supporting :)
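
To make the versioning concern concrete, here is a rough sketch of Cargo's default (caret) compatibility rule, under which a 0.X crate may break its API on every minor bump while a >=1.0 crate promises compatibility until the next major. The helper `caret_compatible` is hypothetical and ignores pre-release tags; real resolution is done by Cargo itself.

```rust
/// A (major, minor, patch) version triple.
type Version = (u64, u64, u64);

/// Returns true if `candidate` satisfies the caret requirement `^req`:
/// the left-most non-zero component of `req` must stay unchanged.
/// Simplified sketch: pre-release and build metadata are not modelled.
fn caret_compatible(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false; // never accept something older than requested
    }
    match req {
        // ^0.0.z accepts only that exact patch release
        (0, 0, _) => candidate == req,
        // ^0.y.z accepts any later patch within the same 0.y series
        (0, minor, _) => candidate.0 == 0 && candidate.1 == minor,
        // ^x.y.z (x >= 1) accepts anything within the same major series
        (major, _, _) => candidate.0 == major,
    }
}

fn main() {
    // Under a 0.X model, a minor bump is allowed to be breaking.
    println!("^0.4.0 accepts 0.4.3: {}", caret_compatible((0, 4, 0), (0, 4, 3)));
    println!("^0.4.0 accepts 0.5.0: {}", caret_compatible((0, 4, 0), (0, 5, 0)));
    // After 1.0, only a major bump signals a breaking change.
    println!("^1.0.0 accepts 1.7.2: {}", caret_compatible((1, 0, 0), (1, 7, 2)));
    println!("^1.0.0 accepts 2.0.0: {}", caret_compatible((1, 0, 0), (2, 0, 0)));
}
```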

Best,
Jorge

On Fri, Jul 30, 2021 at 8:44 PM Wes McKinney <we...@gmail.com> wrote:

> I think it would also be fine to push "beta" arrow2 crates out of a
> repo under apache/ so long as they are not marked on crates.io as
> being Apache-official releases. There's a possible slippery slope
> there, but as long as we are on a path to formalizing the releases I
> think it is okay.
>
> On Fri, Jul 30, 2021 at 1:07 PM Andrew Lamb <al...@influxdata.com> wrote:
>
> > Jorge -- do you feel like we have a resolution on what to do with
> > arrow2 in the near term?
> >
> > The current state of affairs seems to me that arrow2 is released from
> > https://github.com/jorgecarleitao/arrow2 to crates.io (which is fine).
> > Are you happy with keeping development in the jorgecarleitao repo where
> > you will retain maximal control and flexibility until it is ready to
> > start integrating?
> >
> > Or would you prefer to put it into one of the apache repos and subject
> > its development and release to the normal Arrow governance model
> > (tarball, vote, etc)?
> >
> > Since you are the primary author/architect I think you should have a 
> > substantial say at this stage.
> >
> > Andrew
> >
> >
> > On Tue, Jul 27, 2021 at 7:16 PM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > > I would be happy with this approach. Thank you for the suggestion
> > >
> > > This hybrid approach of both arrow and arrow2 in the same repo 
> > > seems better to me than separate repos.
> > >
> > > What I really care about is ensuring we don't have two crates/APIs 
> > > indefinitely -- as long as we are continually making progress 
> > > towards unification that is what is important to me.
> > >
> > > Andrew
> > >
> > > On Tue, Jul 27, 2021 at 1:40 PM Andy Grove <an...@gmail.com> wrote:
> > >
> > >> Apologies for being late to this discussion.
> > >>
> > >> There is a hybrid option to consider here where we add the arrow2
> > >> code into the arrow crate as a separate module, so we release one
> > >> crate containing the "old" API (which we can mark as deprecated) as
> > >> well as the new API. Java did a similar thing a long time ago with
> > >> "java.io" versus "java.nio" (new IO).
> > >>
> > >> I agree that the versioning wouldn't be ideal, but this seems 
> > >> like it might be a pragmatic compromise?
> > >>
> > >> Thanks,
> > >>
> > >> Andy.
> > >>
> > >>
> > >> On Tue, Jul 20, 2021 at 5:41 AM Andrew Lamb <al...@influxdata.com> wrote:
> > >>
> > >> > What I meant is that when you decide arrow2 is suitable for release
> > >> > to existing arrow users, I stand ready to help you incorporate it
> > >> > into arrow.
> > >> >
> > >> > All the feedback I have heard so far from the rest of the community
> > >> > is that we are ready. One might even say we are anxious to do so :)
> > >> >
> > >> > Andrew
> > >> >
> > >>
> > >
> >
>