Posted to dev@arrow.apache.org by Micah Kornfield <em...@gmail.com> on 2020/05/04 03:41:41 UTC

Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release

>
> I'm not saying that I think the work definitely will not be completed,
> but rather that we should put a date on the calendar as the target
> date for 1.0.0 and stick to it. If the work gets done, that's great.

I'm okay with this as well; I just want to make sure we have a path
forward to ultimately get two reference implementations.

On Wed, Apr 22, 2020 at 5:50 AM Wes McKinney <we...@gmail.com> wrote:

> hi Micah,
>
> I'm not saying that I think the work definitely will not be completed,
> but rather that we should put a date on the calendar as the target
> date for 1.0.0 and stick to it. If the work gets done, that's great.
>
> 10 to 12 weeks from now would mean releasing 1.0.0 either the week of
> June 29 or July 6. That is about 1 year since we discussed and adopted
> our SemVer policy [1]
>
> > I would propose that if there isn't an implementation in any language we
> > might drop it as part of the specification.  The main feature that I think
> > meets this criteria is the Dictionary of Dictionary columns (Is this
> > supported in C++)?
>
> I don't have a strong view on this, but IIUC this is implemented in
> JavaScript and probably not far off in C++.
>
> - Wes
>
> [1]:
> https://lists.apache.org/thread.html/2a630234214e590eb184c24bbf9dac4a8d8f7677d85a75fa49d70ba8%40%3Cdev.arrow.apache.org%3E
>
> On Wed, Apr 22, 2020 at 12:26 AM Micah Kornfield <em...@gmail.com>
> wrote:
> >
> > Hi Wes,
> > I think we might be closer than we think on the Java side to having the
> > functionality listed (I've added comments inline at the end with the
> > features you listed in the original e-mail).
> >
> > My biggest concern is I don't think there is a clear path forward for
> > Sparse Unions.  Getting compatibility for Sparse Unions would require more
> > invasive/breaking changes to the Java code base.  [1] is the last thread on
> > the issue.  I sadly have not had time to get back to this, nor will I
> > probably have time before the next release.
> >
> > I would propose that if there isn't an implementation in any language we
> > might drop it as part of the specification.  The main feature that I think
> > meets this criteria is the Dictionary of Dictionary columns (Is this
> > supported in C++)?
> >
> > Thanks,
> > Micah
> >
> >
> > > * custom_metadata fields
> >
> > Not sure about this one.
> >
> > > * Extension Types
> >
> > There is an implementation already in Java; it probably needs more work for
> > integration testing.
> >
> > > * Large (64-bit offset) variable size types
> >
> > There is an open PR for string/binary types.  LargeList is of more
> > questionable value until Java supports vectors/arrays with more than 2^32
> > elements.
> >
> > > * Delta and Replacement Dictionaries
> >
> > There is an implementation already in Java; it probably needs more work
> > specifically for integration testing.
> >
> > > * Unions
> >
> > There is an implementation for dense unions (likely needs more work for
> > integration testing).
> >
> > On Tue, Apr 21, 2020 at 11:26 AM Neal Richardson <
> > neal.p.richardson@gmail.com> wrote:
> >
> > > I'm all for making our next release be 1.0. Everything is about tradeoffs,
> > > and while I too would like to see a complete Java implementation, I think
> > > the costs of further delaying 1.0 outweigh the benefits of holding it
> > > indefinitely in hopes that there will be enough availability of Java
> > > developers to finish integration testing.
> > >
> > > Neal
> > >
> > > On Tue, Apr 21, 2020 at 10:55 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > > hi Bryan -- with the way that things are going, if we were to block
> > > > the 1.0.0 release on completing the Java work, it could be a very long
> > > > time to wait (long time = more than 6 months from now). I don't think
> > > > that's acceptable. The Versioning document was formally adopted last
> > > > August and so a year will have soon elapsed since we previously said
> > > > we wanted to have everything integration tested.
> > > >
> > > > With what I'm proposing the primary things that would not be tested
> > > > (if no progress in Java):
> > > >
> > > > * custom_metadata fields
> > > > * Extension Types
> > > > * Large (64-bit offset) variable size types
> > > > * Delta and Replacement Dictionaries
> > > > * Unions
> > > >
> > > > These do not seem like huge sacrifices, or at least not ones that
> > > > compromise the stability of the columnar format. Of course, if some of
> > > > them are completed in the next 10-12 weeks, then that's great.
> > > >
> > > > - Wes
> > > >
> > > > On Tue, Apr 21, 2020 at 12:12 PM Bryan Cutler <cu...@gmail.com> wrote:
> > > > >
> > > > > I really would like to see a 1.0.0 release with complete implementations
> > > > > for C++ and Java. From my experience, that interoperability has been a
> > > > > major selling point for the project. That being said, my time for
> > > > > contributions has been pretty limited lately and I know that Java has been
> > > > > lagging, so if the rest of the community would like to push forward with a
> > > > > reduced scope, that is okay with me. I'll still continue to do what I can
> > > > > on Java to fill in the gaps.
> > > > >
> > > > > Bryan
> > > > >
> > > > > On Tue, Apr 21, 2020 at 8:47 AM Wes McKinney <we...@gmail.com> wrote:
> > > > >
> > > > > > Hi all -- are there some opinions about this?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Thu, Apr 16, 2020 at 5:30 PM Wes McKinney <wesmckinn@gmail.com> wrote:
> > > > > > >
> > > > > > > hi folks,
> > > > > > >
> > > > > > > Previously we had discussed a plan for making a 1.0.0 release based on
> > > > > > > completeness of columnar format integration tests and making
> > > > > > > forward/backward compatibility guarantees as formalized in
> > > > > > >
> > > > > > > https://github.com/apache/arrow/blob/master/docs/source/format/Versioning.rst
> > > > > > >
> > > > > > > In particular, we wanted to demonstrate comprehensive Java/C++
> > > > > > > interoperability.
> > > > > > >
> > > > > > > As time has passed we have stalled out a bit on completing integration
> > > > > > > tests for the "long tail" of data types and columnar format features.
> > > > > > >
> > > > > > > https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit?usp=sharing
> > > > > > >
> > > > > > > As such I wanted to propose a reduction in scope so that we can make a
> > > > > > > 1.0.0 release sooner. The plan would be as follows:
> > > > > > >
> > > > > > > * Endeavor to have integration tests implemented and working in at
> > > > > > > least one reference implementation (likely to be the C++ library). It
> > > > > > > seems important to verify that what's in Columnar.rst is able to be
> > > > > > > unambiguously implemented.
> > > > > > > * Indicate in Versioning.rst or another place in the documentation the
> > > > > > > list of data types or advanced columnar format features (like
> > > > > > > delta/replacement dictionaries) that are not yet fully integration
> > > > > > > tested.
> > > > > > >
> > > > > > > Some of the essential protocol stability details and all of the most
> > > > > > > commonly used data types have been stable for a long time now,
> > > > > > > particularly after the recent alignment change. The current list of
> > > > > > > features that aren't being tested for cross-implementation
> > > > > > > compatibility should not pose risk to downstream users.
> > > > > > >
> > > > > > > Thoughts about this? The 1.0.0 release is an important milestone for
> > > > > > > the project and will help build continued momentum in developer and
> > > > > > > user community growth.
> > > > > > >
> > > > > > > Thanks
> > > > > > > Wes
> > > > > >
> > > >
> > >
>