Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2020/04/16 22:30:14 UTC

[DISCUSS] Reducing scope of work for Arrow 1.0.0 release

hi folks,

Previously we had discussed a plan for making a 1.0.0 release based on
the completeness of the columnar format integration tests and making
forward/backward compatibility guarantees as formalized in

https://github.com/apache/arrow/blob/master/docs/source/format/Versioning.rst

In particular, we wanted to demonstrate comprehensive Java/C++ interoperability.
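The forward/backward compatibility guarantees referred to above can be approximated with a small model. This is a simplified sketch of our reading of Versioning.rst, not code from any Arrow library: a newer reader can read older data (backward compatibility); an older reader can read data from a newer minor format version as long as the new features are not used, and otherwise must detect that it cannot read it (forward compatibility); a major format version change may disrupt the guarantees. The `Outcome` enum and `check_compat` function are hypothetical names, and patch versions are ignored.

```python
from enum import Enum
from typing import Tuple

# (major, minor) of the columnar *format* version, not the library version.
Version = Tuple[int, int]


class Outcome(Enum):
    CAN_READ = "can read"
    READ_OR_DETECT = "must read the data or detect that it cannot"
    NO_GUARANTEE = "no guarantee"


def check_compat(reader: Version, writer: Version, uses_new_features: bool) -> Outcome:
    """Hypothetical, simplified model of the format compatibility rules."""
    if reader[0] != writer[0]:
        # A major format version change may disrupt the guarantees.
        return Outcome.NO_GUARANTEE
    if writer[1] <= reader[1]:
        # Backward compatibility: a newer (or equal) reader reads older data.
        return Outcome.CAN_READ
    if not uses_new_features:
        # Forward compatibility: newer minor-version data that avoids the
        # newly added features stays readable by an older reader.
        return Outcome.CAN_READ
    # Otherwise the older reader must at least detect that it cannot read it.
    return Outcome.READ_OR_DETECT


print(check_compat((1, 0), (1, 1), uses_new_features=False).name)  # CAN_READ
print(check_compat((1, 0), (1, 1), uses_new_features=True).name)   # READ_OR_DETECT
```

Integration tests are what give these rules teeth: without a second implementation reading the same bytes, "forward compatible" is only a claim.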

As time has passed we have stalled out a bit on completing integration
tests for the "long tail" of data types and columnar format features.

https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit?usp=sharing

As such I wanted to propose a reduction in scope so that we can make a
1.0.0 release sooner. The plan would be as follows:

* Endeavor to have integration tests implemented and working in at
least one reference implementation (likely to be the C++ library). It
seems important to verify that what's in Columnar.rst can be
implemented unambiguously.
* Indicate in Versioning.rst or another place in the documentation the
list of data types or advanced columnar format features (like
delta/replacement dictionaries) that are not yet fully integration
tested.

Some of the essential protocol stability details and all of the most
commonly used data types have been stable for a long time now,
particularly after the recent alignment change. The current list of
features that aren't being tested for cross-implementation
compatibility should not pose risk to downstream users.

Thoughts about this? The 1.0.0 release is an important milestone for
the project and will help build continued momentum in developer and
user community growth.

Thanks
Wes

Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release

Posted by Micah Kornfield <em...@gmail.com>.
>
> I'm not saying that I think the work definitely will not be completed,
> but rather that we should put a date on the calendar as the target
> date for 1.0.0 and stick to it. If the work gets done, that's great.

I'm okay with this as well; I just want to make sure we have a path
forward to ultimately get two reference implementations.

On Wed, Apr 22, 2020 at 5:50 AM Wes McKinney <we...@gmail.com> wrote:

> hi Micah,
>
> I'm not saying that I think the work definitely will not be completed,
> but rather that we should put a date on the calendar as the target
> date for 1.0.0 and stick to it. If the work gets done, that's great.
>
> 10 to 12 weeks from now would mean releasing 1.0.0 either the week of
> June 29 or July 6. That is about one year after we discussed and adopted
> our SemVer policy [1].
>
> > I would propose that if a feature isn't implemented in any language, we
> > might drop it from the specification. The main feature that I think
> > meets this criterion is Dictionary-of-Dictionary columns (is this
> > supported in C++?).
>
> I don't have a strong view on this, but IIUC this is implemented in
> JavaScript and probably not far off in C++.
>
> - Wes
>
> [1]: https://lists.apache.org/thread.html/2a630234214e590eb184c24bbf9dac4a8d8f7677d85a75fa49d70ba8%40%3Cdev.arrow.apache.org%3E

Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release

Posted by Wes McKinney <we...@gmail.com>.
hi Micah,

I'm not saying that I think the work definitely will not be completed,
but rather that we should put a date on the calendar as the target
date for 1.0.0 and stick to it. If the work gets done, that's great.

10 to 12 weeks from now would mean releasing 1.0.0 either the week of
June 29 or July 6. That is about one year after we discussed and adopted
our SemVer policy [1].
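The target dates are plain calendar arithmetic. The sketch below counts whole weeks from this message's date (Wed, Apr 22, 2020) and rewinds to the Monday of the resulting week; `week_of` is a hypothetical helper written for illustration. By this count, 10 and 11 weeks out land in the weeks of June 29 and July 6, and 12 weeks out lands in the week of July 13.

```python
from datetime import date, timedelta


def week_of(start: date, weeks: int) -> date:
    """Return the Monday of the week that falls `weeks` weeks after `start`."""
    target = start + timedelta(weeks=weeks)
    # date.weekday() is 0 for Monday, so subtracting it rewinds to Monday.
    return target - timedelta(days=target.weekday())


sent = date(2020, 4, 22)  # date of this message
for n in (10, 11, 12):
    print(n, week_of(sent, n).isoformat())
# 10 2020-06-29
# 11 2020-07-06
# 12 2020-07-13
```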

> I would propose that if a feature isn't implemented in any language, we
> might drop it from the specification. The main feature that I think
> meets this criterion is Dictionary-of-Dictionary columns (is this
> supported in C++?).

I don't have a strong view on this, but IIUC this is implemented in
JavaScript and probably not far off in C++.

- Wes

[1]: https://lists.apache.org/thread.html/2a630234214e590eb184c24bbf9dac4a8d8f7677d85a75fa49d70ba8%40%3Cdev.arrow.apache.org%3E

On Wed, Apr 22, 2020 at 12:26 AM Micah Kornfield <em...@gmail.com> wrote:
>
> Hi Wes,
> I think we might be closer than we think on the Java side to having the
> functionality listed (I've added comments inline at the end with the
> features you listed in the original e-mail).
>
> My biggest concern is that I don't think there is a clear path forward for
> sparse unions. Getting compatibility for sparse unions would require more
> invasive/breaking changes to the Java code base. [1] is the last thread on
> the issue. Sadly, I have not had time to get back to this, and I probably
> will not have time before the next release.
>
> I would propose that if a feature isn't implemented in any language, we
> might drop it from the specification. The main feature that I think
> meets this criterion is Dictionary-of-Dictionary columns (is this
> supported in C++?).
>
> Thanks,
> Micah
>
>
> > * custom_metadata fields
>
> Not sure about this one.
>
> > * Extension Types
>
> There is already an implementation in Java; it probably needs more work for
> integration testing.
>
> > * Large (64-bit offset) variable size types
>
> There is an open PR for string/binary types. LargeList is of more
> questionable value until Java supports vectors/arrays with more than 2^32
> elements.
>
> > * Delta and Replacement Dictionaries
>
> There is already an implementation in Java; it probably needs more work
> specifically for integration testing.
>
> > * Unions
>
> There is an implementation for dense unions (it likely needs more work for
> integration testing).

Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release

Posted by Micah Kornfield <em...@gmail.com>.
Hi Wes,
I think we might be closer than we think on the Java side to having the
functionality listed (I've added comments inline at the end with the
features you listed in the original e-mail).

My biggest concern is that I don't think there is a clear path forward for
sparse unions. Getting compatibility for sparse unions would require more
invasive/breaking changes to the Java code base. [1] is the last thread on
the issue. Sadly, I have not had time to get back to this, and I probably
will not have time before the next release.

I would propose that if a feature isn't implemented in any language, we
might drop it from the specification. The main feature that I think
meets this criterion is Dictionary-of-Dictionary columns (is this
supported in C++?).
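Assuming "Dictionary-of-Dictionary columns" means a dictionary-encoded column whose dictionary values are themselves dictionary-encoded, a reader has to resolve the inner dictionary before the outer indices mean anything. The toy, pure-Python model below illustrates that recursion; `DictionaryEncoded` and `decode` are hypothetical names, not part of any Arrow library, and real implementations work on buffers rather than Python lists.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class DictionaryEncoded:
    """Toy model: integer indices into a dictionary of values.

    The dictionary may itself be dictionary-encoded (the nested case).
    """
    indices: Sequence[int]
    dictionary: Union[List[str], DictionaryEncoded]


def decode(array: DictionaryEncoded) -> List[str]:
    """Materialize values, resolving nested dictionaries recursively."""
    values = array.dictionary
    if isinstance(values, DictionaryEncoded):
        values = decode(values)  # decode the inner dictionary first
    return [values[i] for i in array.indices]


# The outer array's dictionary is itself encoded against ["red", "green"].
inner = DictionaryEncoded(indices=[1, 0, 1], dictionary=["red", "green"])
outer = DictionaryEncoded(indices=[2, 2, 0, 1], dictionary=inner)
print(decode(outer))  # ['green', 'green', 'green', 'red']
```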

Thanks,
Micah


> * custom_metadata fields

Not sure about this one.

> * Extension Types

There is already an implementation in Java; it probably needs more work for
integration testing.
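In the columnar format, an extension type travels as an ordinary storage type plus two reserved keys in the field's custom_metadata, `ARROW:extension:name` and `ARROW:extension:metadata`; a reader that does not recognize the name can fall back to the storage type. The sketch below is a toy model of that convention: the `Field` class, `extension_info` helper, and the `example.uuid` name are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

EXTENSION_NAME_KEY = "ARROW:extension:name"
EXTENSION_METADATA_KEY = "ARROW:extension:metadata"


@dataclass
class Field:
    """Toy stand-in for a schema field: name, storage type, custom_metadata."""
    name: str
    storage_type: str
    custom_metadata: Dict[str, str] = field(default_factory=dict)


def extension_info(f: Field) -> Optional[Tuple[str, str]]:
    """Return (extension name, serialized metadata), or None for a plain field."""
    name = f.custom_metadata.get(EXTENSION_NAME_KEY)
    if name is None:
        return None  # not an extension type: readers just see the storage type
    return name, f.custom_metadata.get(EXTENSION_METADATA_KEY, "")


uuid_field = Field(
    name="id",
    storage_type="fixed_size_binary[16]",
    custom_metadata={EXTENSION_NAME_KEY: "example.uuid", EXTENSION_METADATA_KEY: ""},
)
print(extension_info(uuid_field))  # ('example.uuid', '')
```

Because the annotation lives entirely in custom_metadata, an integration test for extension types is largely a test that field-level custom_metadata round-trips intact between implementations.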

> * Large (64-bit offset) variable size types

There is an open PR for string/binary types. LargeList is of more
questionable value until Java supports vectors/arrays with more than 2^32
elements.
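For context on the Large types: String, Binary and List arrays store an offsets buffer of length + 1 signed 32-bit integers, while LargeString, LargeBinary and LargeList use signed 64-bit offsets. A single regular array can therefore address at most 2^31 - 1 bytes of data (or child elements, for List). The sketch below is a toy model of building such an offsets buffer, not how any Arrow library does it; `build_offsets` is a hypothetical helper.

```python
from typing import List

INT32_MAX = 2**31 - 1  # largest signed 32-bit offset (String/Binary/List)
INT64_MAX = 2**63 - 1  # largest signed 64-bit offset (Large* variants)


def build_offsets(values: List[bytes], max_offset: int) -> List[int]:
    """Build an Arrow-style offsets buffer with len(values) + 1 entries.

    Raises OverflowError if the cumulative data length exceeds what the
    chosen offset width can represent.
    """
    offsets = [0]
    for v in values:
        nxt = offsets[-1] + len(v)
        if nxt > max_offset:
            raise OverflowError(f"offset {nxt} exceeds {max_offset}")
        offsets.append(nxt)
    return offsets


print(build_offsets([b"ab", b"", b"cde"], INT32_MAX))  # [0, 2, 2, 5]
```

Passing a tiny limit such as `max_offset=4` with the same data triggers the `OverflowError`, standing in for a real array that outgrows 32-bit offsets without allocating gigabytes.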

> * Delta and Replacement Dictionaries

There is already an implementation in Java; it probably needs more work
specifically for integration testing.
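As we understand Columnar.rst, dictionaries in the IPC stream arrive in dictionary batches keyed by a dictionary id and carrying an isDelta flag: a delta batch appends to the dictionary already associated with that id, while a non-delta batch for a known id replaces it. Record batches then resolve their indices against whatever dictionary is current when they are read. The toy reader-side state below illustrates that; `DictionaryMemo` and its methods are hypothetical names, and treating a delta for an unknown id as an error is this sketch's own choice.

```python
from typing import Dict, List


class DictionaryMemo:
    """Toy reader-side dictionary state keyed by dictionary id."""

    def __init__(self) -> None:
        self._dicts: Dict[int, List[str]] = {}

    def on_dictionary_batch(self, dict_id: int, values: List[str], is_delta: bool) -> None:
        if is_delta:
            if dict_id not in self._dicts:
                # This sketch treats a delta with no prior dictionary as an error.
                raise ValueError(f"delta for unknown dictionary id {dict_id}")
            self._dicts[dict_id] = self._dicts[dict_id] + values  # append
        else:
            self._dicts[dict_id] = list(values)  # initial load or replacement

    def decode(self, dict_id: int, indices: List[int]) -> List[str]:
        values = self._dicts[dict_id]
        return [values[i] for i in indices]


memo = DictionaryMemo()
memo.on_dictionary_batch(0, ["a", "b"], is_delta=False)
print(memo.decode(0, [1, 0]))  # ['b', 'a']
memo.on_dictionary_batch(0, ["c"], is_delta=True)
print(memo.decode(0, [2, 0]))  # ['c', 'a']
memo.on_dictionary_batch(0, ["x"], is_delta=False)
print(memo.decode(0, [0]))     # ['x']
```

The ordering dependence is what makes these features awkward to integration test: a consumer only decodes correctly if it applies every dictionary batch in stream order.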

> * Unions

There is an implementation for dense unions (it likely needs more work for
integration testing).
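The sparse/dense distinction is about layout. Both union kinds carry an 8-bit type ids buffer; in a sparse union every child has the same length as the union, while a dense union adds a 32-bit offsets buffer so each child stores only the values it actually holds. The toy sketch below shows how slot i is resolved in each case. The helper names are hypothetical, and it assumes each type id equals the child index (the format lets a union type declare a different mapping, which is ignored here).

```python
from typing import Any, Sequence


def sparse_union_value(type_ids: Sequence[int], children: Sequence[Sequence[Any]], i: int) -> Any:
    # Sparse: each child is as long as the union, so slot i of the union
    # lives at slot i of the selected child.
    return children[type_ids[i]][i]


def dense_union_value(type_ids: Sequence[int], offsets: Sequence[int],
                      children: Sequence[Sequence[Any]], i: int) -> Any:
    # Dense: the offsets buffer gives the position inside the selected child.
    return children[type_ids[i]][offsets[i]]


# The logical values [1, "a", 2], with child 0 = ints and child 1 = strings.
type_ids = [0, 1, 0]
sparse_children = [[1, None, 2], [None, "a", None]]  # None marks unused slots
dense_offsets = [0, 0, 1]
dense_children = [[1, 2], ["a"]]

sparse = [sparse_union_value(type_ids, sparse_children, i) for i in range(3)]
dense = [dense_union_value(type_ids, dense_offsets, dense_children, i) for i in range(3)]
print(sparse, dense)  # [1, 'a', 2] [1, 'a', 2]
```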

On Tue, Apr 21, 2020 at 11:26 AM Neal Richardson <neal.p.richardson@gmail.com> wrote:

> I'm all for making our next release be 1.0. Everything is about tradeoffs,
> and while I too would like to see a complete Java implementation, I think
> the costs of further delaying 1.0 outweigh the benefits of holding it
> indefinitely in hopes that there will be enough availability of Java
> developers to finish integration testing.
>
> Neal

Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release

Posted by Neal Richardson <ne...@gmail.com>.
I'm all for making our next release be 1.0. Everything is about tradeoffs,
and while I too would like to see a complete Java implementation, I think
the costs of further delaying 1.0 outweigh the benefits of holding it
indefinitely in hopes that there will be enough availability of Java
developers to finish integration testing.

Neal

On Tue, Apr 21, 2020 at 10:55 AM Wes McKinney <we...@gmail.com> wrote:

> hi Bryan -- with the way that things are going, if we were to block
> the 1.0.0 release on completing the Java work, it could be a very long
> time to wait (long time = more than 6 months from now). I don't think
> that's acceptable. The Versioning document was formally adopted last
> August and so a year will soon have elapsed since we previously said
> we wanted to have everything integration tested.
>
> With what I'm proposing, the primary things that would not be tested
> (if there is no progress in Java) are:
>
> * custom_metadata fields
> * Extension Types
> * Large (64-bit offset) variable size types
> * Delta and Replacement Dictionaries
> * Unions
>
> These do not seem like huge sacrifices, or at least not ones that
> compromise the stability of the columnar format. Of course, if some of
> them are completed in the next 10-12 weeks, then that's great.
>
> - Wes

Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release

Posted by Wes McKinney <we...@gmail.com>.
hi Bryan -- with the way that things are going, if we were to block
the 1.0.0 release on completing the Java work, it could be a very long
time to wait (long time = more than 6 months from now). I don't think
that's acceptable. The Versioning document was formally adopted last
August and so a year will soon have elapsed since we previously said
we wanted to have everything integration tested.

With what I'm proposing, the primary things that would not be tested
(if there is no progress in Java) are:

* custom_metadata fields
* Extension Types
* Large (64-bit offset) variable size types
* Delta and Replacement Dictionaries
* Unions

These do not seem like huge sacrifices, or at least not ones that
compromise the stability of the columnar format. Of course, if some of
them are completed in the next 10-12 weeks, then that's great.

- Wes

On Tue, Apr 21, 2020 at 12:12 PM Bryan Cutler <cu...@gmail.com> wrote:
>
> I really would like to see a 1.0.0 release with complete implementations
> for C++ and Java. From my experience, that interoperability has been a
> major selling point for the project. That being said, my time for
> contributions has been pretty limited lately, and I know that Java has been
> lagging, so if the rest of the community would like to push forward with a
> reduced scope, that is okay with me. I'll still continue to do what I can
> on Java to fill in the gaps.
>
> Bryan

Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release

Posted by Bryan Cutler <cu...@gmail.com>.
I really would like to see a 1.0.0 release with complete implementations
for C++ and Java. From my experience, that interoperability has been a
major selling point for the project. That being said, my time for
contributions has been pretty limited lately, and I know that Java has been
lagging, so if the rest of the community would like to push forward with a
reduced scope, that is okay with me. I'll still continue to do what I can
on Java to fill in the gaps.

Bryan

On Tue, Apr 21, 2020 at 8:47 AM Wes McKinney <we...@gmail.com> wrote:

> Hi all -- are there some opinions about this?
>
> Thanks

Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release

Posted by Wes McKinney <we...@gmail.com>.
Hi all -- are there some opinions about this?

Thanks

On Thu, Apr 16, 2020 at 5:30 PM Wes McKinney <we...@gmail.com> wrote:
>
> hi folks,
>
> Previously we had discussed a plan for making a 1.0.0 release based on
> the completeness of the columnar format integration tests and making
> forward/backward compatibility guarantees as formalized in
>
> https://github.com/apache/arrow/blob/master/docs/source/format/Versioning.rst
>
> In particular, we wanted to demonstrate comprehensive Java/C++ interoperability.
>
> As time has passed we have stalled out a bit on completing integration
> tests for the "long tail" of data types and columnar format features.
>
> https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit?usp=sharing
>
> As such I wanted to propose a reduction in scope so that we can make a
> 1.0.0 release sooner. The plan would be as follows:
>
> * Endeavor to have integration tests implemented and working in at
> least one reference implementation (likely to be the C++ library). It
> seems important to verify that what's in Columnar.rst can be
> implemented unambiguously.
> * Indicate in Versioning.rst or another place in the documentation the
> list of data types or advanced columnar format features (like
> delta/replacement dictionaries) that are not yet fully integration
> tested.
>
> Some of the essential protocol stability details and all of the most
> commonly used data types have been stable for a long time now,
> particularly after the recent alignment change. The current list of
> features that aren't being tested for cross-implementation
> compatibility should not pose risk to downstream users.
>
> Thoughts about this? The 1.0.0 release is an important milestone for
> the project and will help build continued momentum in developer and
> user community growth.
>
> Thanks
> Wes