Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2020/06/24 22:40:56 UTC

[DISCUSS] Incrementing Arrow MetadataVersion from V4 to V5 for 1.0.0 release

hi folks,

This has come up in some other contexts, but I believe it would be a
good idea to increment the MetadataVersion in Schema.fbs (from V4 to V5)
starting with 1.0.0 to separate the pre-1.0 and post-1.0 worlds.

https://github.com/apache/arrow/blob/master/format/Schema.fbs#L22

Given that we are contemplating a number of changes to assist with
forward compatibility and a breaking serialization change for unions,
this would seem prudent so that we do not risk breaking compatibility
with 0.17.1 and prior.

Given that there are no major backwards incompatibilities, there
should be no problem with 1.0.0 readers reading data generated by
libraries <= 0.17.1.
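A minimal sketch of the reader-side rule described above, assuming a 1.0.0 reader that accepts exactly V4 and V5 messages. The enum mirrors the order of MetadataVersion in format/Schema.fbs with V5 appended after V4, numbered from 0 as FlatBuffers enums are by default. `check_readable` and its rejection policy are hypothetical and not part of any Arrow API.

```python
from enum import IntEnum


class MetadataVersion(IntEnum):
    """Mirror of the MetadataVersion ordering, with V5 appended after V4."""
    V1 = 0
    V2 = 1
    V3 = 2
    V4 = 3  # pre-1.0 libraries such as 0.17.1
    V5 = 4  # proposed for 1.0.0


# Assumed policy for a 1.0.0 reader: accept pre-1.0 (V4) and post-1.0 (V5).
READABLE_VERSIONS = frozenset({MetadataVersion.V4, MetadataVersion.V5})


def check_readable(raw_version: int) -> MetadataVersion:
    """Parse a raw metadata version and raise if this reader cannot use it."""
    try:
        version = MetadataVersion(raw_version)
    except ValueError:
        raise ValueError(f"unknown metadata version {raw_version}") from None
    if version not in READABLE_VERSIONS:
        raise ValueError(f"unsupported metadata version {version.name}")
    return version


print(check_readable(3).name)  # prints "V4"
```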

However, to accommodate existing applications that are already deployed
with < 1.0.0 (forward compatibility), I would suggest adding an option
to IpcWriteOptions in C++ to target the V4 format, which would disable
the use of certain features (like unions) where there is an issue. This
could also be opted into with an environment variable. It's basically
the same issue as the "ARROW_PRE_0_15_IPC_FORMAT" environment variable
that we added because of the IPC alignment change. I admit it's kind of
an eyesore, but in a couple of years I suspect we could drop these
forward compatibility crutches.
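As a rough illustration of the proposed option, the sketch below models a writer that refuses union columns when asked to target V4. `WriteOptions` is a hypothetical stand-in for IpcWriteOptions, the string type names are invented for the example, and treating unions as the only V5-only feature is an assumption (the proposal only says "certain features (like unions)").

```python
from dataclasses import dataclass
from enum import IntEnum


class MetadataVersion(IntEnum):
    V4 = 3  # what pre-1.0 readers understand
    V5 = 4  # proposed default for 1.0.0 writers


@dataclass
class WriteOptions:
    """Hypothetical stand-in for IpcWriteOptions (only one field modelled)."""
    metadata_version: MetadataVersion = MetadataVersion.V5


# Assumption: unions are the only feature whose serialization differs in V5.
V5_ONLY_TYPES = frozenset({"dense_union", "sparse_union"})


def check_writable(fields, options):
    """Raise if a (name, type_name) field cannot be written at the target version."""
    if options.metadata_version >= MetadataVersion.V5:
        return  # every feature is available when writing V5
    for name, type_name in fields:
        if type_name in V5_ONLY_TYPES:
            raise NotImplementedError(
                f"field {name!r} of type {type_name} cannot be written "
                f"with metadata version {options.metadata_version.name}"
            )


fields = [("x", "int32"), ("u", "dense_union")]
check_writable(fields, WriteOptions())  # V5 target: accepted
```

Failing loudly, rather than silently dropping the column, keeps a V4-targeting writer from producing data an old reader would misinterpret.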

Thanks,
Wes

Re: [DISCUSS] Incrementing Arrow MetadataVersion from V4 to V5 for 1.0.0 release

Posted by David Li <li...@gmail.com>.
I just filed https://issues.apache.org/jira/browse/ARROW-9362 and plan
to work on it tomorrow.

David

On 7/7/20, Wes McKinney <we...@gmail.com> wrote:
> I don't recall a ticket for the Java work but you're certainly a good
> candidate to take the lead on it.

Re: [DISCUSS] Incrementing Arrow MetadataVersion from V4 to V5 for 1.0.0 release

Posted by Wes McKinney <we...@gmail.com>.
I don't recall a ticket for the Java work but you're certainly a good
candidate to take the lead on it.

On Tue, Jul 7, 2020 at 3:16 PM David Li <li...@gmail.com> wrote:
>
> I see there's ARROW-9258 to do the backwards compatibility work for
> C++ and ARROW-9333 to expose this for Python; is there any ticket or
> anyone planning on doing this for Java? Otherwise I'm willing to look
> at it so that we can do some testing with Flight.
>
> Best,
> David

Re: [DISCUSS] Incrementing Arrow MetadataVersion from V4 to V5 for 1.0.0 release

Posted by David Li <li...@gmail.com>.
I see there's ARROW-9258 to do the backwards compatibility work for
C++ and ARROW-9333 to expose this for Python; is there any ticket or
anyone planning on doing this for Java? Otherwise I'm willing to look
at it so that we can do some testing with Flight.

Best,
David

On 6/29/20, Wes McKinney <we...@gmail.com> wrote:
> Thanks David. Indeed it seems that exposing IpcWriteOptions is going
> to be critical here. I'd like to avoid an "environment variable"
> workaround at the C++ level, and instead only provide such things in,
> e.g., Python, as we did for the alignment patch.

Re: [DISCUSS] Incrementing Arrow MetadataVersion from V4 to V5 for 1.0.0 release

Posted by Wes McKinney <we...@gmail.com>.
Thanks David. Indeed it seems that exposing IpcWriteOptions is going
to be critical here. I'd like to avoid an "environment variable"
workaround at the C++ level, and instead only provide such things in,
e.g., Python, as we did for the alignment patch.
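A sketch of keeping the environment-variable opt-in in the Python layer, in the spirit of ARROW_PRE_0_15_IPC_FORMAT: an explicit option always wins, and the variable only changes the default. The variable name EXAMPLE_ARROW_TARGET_V4 and the function are hypothetical; the thread does not name a variable for the V4 opt-in.

```python
import os
from enum import IntEnum


class MetadataVersion(IntEnum):
    V4 = 3
    V5 = 4


# Hypothetical variable name used only for this sketch.
TARGET_V4_ENV = "EXAMPLE_ARROW_TARGET_V4"


def resolve_metadata_version(explicit=None, environ=None):
    """Pick the write version: explicit option first, then the environment.

    `environ` defaults to os.environ and can be overridden for testing.
    """
    if explicit is not None:
        return MetadataVersion(explicit)
    env = os.environ if environ is None else environ
    if env.get(TARGET_V4_ENV, "0") == "1":
        return MetadataVersion.V4
    return MetadataVersion.V5
```

With this split, the C++ layer only ever sees the resolved version in its write options and never reads the environment itself.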

On Mon, Jun 29, 2020 at 9:30 AM David Li <li...@gmail.com> wrote:
>
> This would cause compatibility issues between Flight servers and clients
> of different versions as well. The situation is a little worse since
> IpcWriteOptions isn't exposed in Flight, so you can't control which
> version you write. But just exposing the options, in lieu of a full
> negotiation (which we should start thinking about), should be enough to
> work through this.
>
> I see there's https://issues.apache.org/jira/browse/ARROW-8190 so I'll
> try to tackle this soon (and do the same for Java) since it should be
> independent of whether the format change goes through.
>
> Best,
> David

Re: [DISCUSS] Incrementing Arrow MetadataVersion from V4 to V5 for 1.0.0 release

Posted by David Li <li...@gmail.com>.
This would cause compatibility issues between Flight servers and clients
of different versions as well. The situation is a little worse since
IpcWriteOptions isn't exposed in Flight, so you can't control which
version you write. But just exposing the options, in lieu of a full
negotiation (which we should start thinking about), should be enough to
work through this.

I see there's https://issues.apache.org/jira/browse/ARROW-8190 so I'll
try to tackle this soon (and do the same for Java) since it should be
independent of whether the format change goes through.

Best,
David
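One way to picture the "full negotiation" mentioned above is a handshake in which each peer advertises the metadata versions it can handle and both then use the newest common one. This is a hypothetical sketch, not an existing Flight mechanism; the function name and the set-based interface are assumptions.

```python
from enum import IntEnum


class MetadataVersion(IntEnum):
    V4 = 3
    V5 = 4


def negotiate(client_versions, server_versions):
    """Return the newest metadata version supported by both peers.

    Hypothetical handshake logic; raises if the peers share no version.
    """
    common = set(client_versions) & set(server_versions)
    if not common:
        raise ValueError("client and server share no metadata version")
    return max(common)


new_peer = {MetadataVersion.V4, MetadataVersion.V5}
old_peer = {MetadataVersion.V4}
print(negotiate(new_peer, old_peer).name)  # prints "V4"
```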

On 6/28/20, Wes McKinney <we...@gmail.com> wrote:
> I opened a PR https://github.com/apache/arrow/pull/7566
>
> We should prioritize getting through the other format changes, but we
> can vote on this in the meantime if there is consensus

Re: [DISCUSS] Incrementing Arrow MetadataVersion from V4 to V5 for 1.0.0 release

Posted by Wes McKinney <we...@gmail.com>.
I opened a PR https://github.com/apache/arrow/pull/7566

We should prioritize getting through the other format changes, but we
can vote on this in the meantime if there is consensus.

On Fri, Jun 26, 2020 at 2:58 PM Micah Kornfield <em...@gmail.com> wrote:
>
> I agree; I think we have to do this, given the number of changes in flight
> (especially union types).

Re: [DISCUSS] Incrementing Arrow MetadataVersion from V4 to V5 for 1.0.0 release

Posted by Micah Kornfield <em...@gmail.com>.
I agree; I think we have to do this, given the number of changes in flight
(especially union types).

On Fri, Jun 26, 2020 at 7:29 AM Wes McKinney <we...@gmail.com> wrote:

> I created a JIRA about this
>
> https://issues.apache.org/jira/browse/ARROW-9231
>
> This issue is quite important so please take a look.

Re: [DISCUSS] Incrementing Arrow MetadataVersion from V4 to V5 for 1.0.0 release

Posted by Wes McKinney <we...@gmail.com>.
I created a JIRA about this

https://issues.apache.org/jira/browse/ARROW-9231

This issue is quite important, so please take a look.

On Thu, Jun 25, 2020 at 8:53 AM Wes McKinney <we...@gmail.com> wrote:
>
> I see. Well I think we can shut down this issue by giving up on Union
> forward compatibility V4 / pre-1.0 libraries.

Re: [DISCUSS] Incrementing Arrow MetadataVersion from V4 to V5 for 1.0.0 release

Posted by Wes McKinney <we...@gmail.com>.
On Thu, Jun 25, 2020 at 5:31 AM Antoine Pitrou <an...@python.org> wrote:
>
>
> On 25/06/2020 at 12:18, Antoine Pitrou wrote:
> >
> > Le 25/06/2020 à 00:40, Wes McKinney a écrit :
> >> hi folks,
> >>
> >> This has come up in some other contexts, but I believe it would be a
> >> good idea to increment the version number in Schema.fbs starting with
> >> 1.0.0 to separate the pre-1.0 and post-1.0 worlds
> >>
> >> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L22
> >>
> >> Given that we are contemplating a number of changes to assist with
> >> forward compatibility and a breaking serialization change for unions,
> >> this would seem prudent so that we do not risk breaking compatibility
> >> with 0.17.1 and prior.
> >>
> >> Given that there are no major backwards incompatibilities, there
> >> should be no problem with 1.0.0 readers reading data generated by
> >> libraries <= 0.17.1.
> >
> > Actually, it seems that a dense union array with top-level null values
> > (represented in 0.17.1 fashion) would need non-trivial rewriting of its
> > offsets and child arrays (at least one child array) to represent the
> > nulls at the child level.
> >
> > That is, unless we keep the top-level union null bitmap in C++ and only
> > avoid emitting it on the IPC side, which would be a slightly weird
> > arrangement but would limit incompatibilities on the C++ API side.
>
> Actually, if we do this, the same problem will appear on the IPC write
> side (C++-created dense union arrays with a top-level null bitmap will
> need some of their child buffers regenerated).

I see. Well, I think we can close this issue by giving up on union
forward compatibility with V4 / pre-1.0 libraries.

> Regards
>
> Antoine.
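Wes's conclusion amounts to a writer-side rule: when output must stay readable by V4 / pre-1.0 libraries, union types are refused rather than rewritten. A minimal sketch of such a check follows, assuming a toy type model; `MetadataVersion`, `DataType`, `contains_union` and `check_writable` are hypothetical names for illustration, not the Arrow C++ or pyarrow API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class MetadataVersion(Enum):
    V4 = 4  # pre-1.0 libraries (<= 0.17.1)
    V5 = 5  # 1.0.0 and later


@dataclass(frozen=True)
class DataType:
    """Hypothetical toy type descriptor: a name plus nested child types."""
    name: str
    children: Tuple["DataType", ...] = ()


UNION_NAMES = {"sparse_union", "dense_union"}


def contains_union(t: DataType) -> bool:
    """True if t is a union or has a union anywhere beneath it."""
    return t.name in UNION_NAMES or any(contains_union(c) for c in t.children)


def check_writable(fields: List[Tuple[str, DataType]], target: MetadataVersion) -> None:
    """Hypothetical policy: refuse union-typed fields when targeting V4."""
    if target is not MetadataVersion.V4:
        return
    for name, t in fields:
        if contains_union(t):
            raise ValueError(
                f"field {name!r} contains a union type, which is not "
                f"forward compatible with MetadataVersion.V4"
            )


int32 = DataType("int32")
schema = [("id", int32),
          ("payload", DataType("list", (DataType("dense_union", (int32,)),)))]
check_writable(schema, MetadataVersion.V5)  # accepted: V5 readers know the new layout
try:
    check_writable(schema, MetadataVersion.V4)
except ValueError as e:
    print(e)
```

The type tree is walked recursively because a union nested inside a list or struct would still be serialized in the incompatible form.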

Re: [DISCUSS] Incrementing Arrow MetadataVersion from V4 to V5 for 1.0.0 release

Posted by Antoine Pitrou <an...@python.org>.
On 25/06/2020 at 12:18, Antoine Pitrou wrote:
> 
> On 25/06/2020 at 00:40, Wes McKinney wrote:
>> hi folks,
>>
>> This has come up in some other contexts, but I believe it would be a
>> good idea to increment the version number in Schema.fbs starting with
>> 1.0.0 to separate the pre-1.0 and post-1.0 worlds
>>
>> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L22
>>
>> Given that we are contemplating a number of changes to assist with
>> forward compatibility and a breaking serialization change for unions,
>> this would seem prudent so that we do not risk breaking compatibility
>> with 0.17.1 and prior.
>>
>> Given that there are no major backwards incompatibilities, there
>> should be no problem with 1.0.0 readers reading data generated by
>> libraries <= 0.17.1.
> 
> Actually, it seems that a dense union array with top-level null values
> (represented in 0.17.1 fashion) would need non-trivial rewriting of its
> offsets and child arrays (at least one child array) to represent the
> nulls at the child level.
> 
> That is, unless we keep the top-level union null bitmap in C++ and only
> avoid emitting it on the IPC side, which would be a slightly weird
> arrangement but would limit incompatibilities on the C++ API side.

Actually, if we do this, the same problem will appear on the IPC write
side (C++-created dense union arrays with a top-level null bitmap will
need some of their child buffers regenerated).

Regards

Antoine.

Re: [DISCUSS] Incrementing Arrow MetadataVersion from V4 to V5 for 1.0.0 release

Posted by Antoine Pitrou <an...@python.org>.
On 25/06/2020 at 00:40, Wes McKinney wrote:
> hi folks,
> 
> This has come up in some other contexts, but I believe it would be a
> good idea to increment the version number in Schema.fbs starting with
> 1.0.0 to separate the pre-1.0 and post-1.0 worlds
> 
> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L22
> 
> Given that we are contemplating a number of changes to assist with
> forward compatibility and a breaking serialization change for unions,
> this would seem prudent so that we do not risk breaking compatibility
> with 0.17.1 and prior.
> 
> Given that there are no major backwards incompatibilities, there
> should be no problem with 1.0.0 readers reading data generated by
> libraries <= 0.17.1.

Actually, it seems that a dense union array with top-level null values
(represented in 0.17.1 fashion) would need non-trivial rewriting of its
offsets and child arrays (at least one child array) to represent the
nulls at the child level.

That is, unless we keep the top-level union null bitmap in C++ and only
avoid emitting it on the IPC side, which would be a slightly weird
arrangement but would limit incompatibilities on the C++ API side.

Regards

Antoine.
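To make the rewriting Antoine describes concrete, here is a minimal sketch in plain Python using a toy model of a dense union. `DenseUnion`, `logical_values` and `drop_top_level_validity` are hypothetical names, and type ids are used directly as child indices, which is a simplification of real Arrow. In the example, the top-level null slot happens to share its offset with a valid slot. The null therefore cannot simply be set on the existing child value: the child must gain a new null entry, and the valid slot's offset must move.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DenseUnion:
    """Hypothetical toy model of a dense union (not the Arrow API).

    type_ids[i] picks the child for slot i (used directly as a child index),
    offsets[i] indexes into that child, and None in a child marks a null.
    """
    type_ids: List[int]
    offsets: List[int]
    children: List[List[Optional[object]]]
    validity: Optional[List[bool]] = None  # top-level bitmap, pre-1.0 style


def logical_values(arr: DenseUnion) -> List[Optional[object]]:
    """Resolve each slot to the value it represents."""
    out: List[Optional[object]] = []
    for i, (tid, off) in enumerate(zip(arr.type_ids, arr.offsets)):
        if arr.validity is not None and not arr.validity[i]:
            out.append(None)  # null via the top-level bitmap
        else:
            out.append(arr.children[tid][off])
    return out


def drop_top_level_validity(arr: DenseUnion) -> DenseUnion:
    """Move top-level nulls into the children; returns a new array.

    Each child is rebuilt in slot order: every slot appends its value (or
    None for a top-level null) to its own child and its offset is redirected
    to that new position, so both offsets and child arrays are rewritten.
    """
    new_children: List[List[Optional[object]]] = [[] for _ in arr.children]
    new_offsets: List[int] = []
    for tid, value in zip(arr.type_ids, logical_values(arr)):
        new_offsets.append(len(new_children[tid]))
        new_children[tid].append(value)
    return DenseUnion(list(arr.type_ids), new_offsets, new_children)


# Slot 1 is null at the top level but points at the same child value as slot 2.
legacy = DenseUnion(
    type_ids=[0, 1, 1],
    offsets=[0, 0, 0],
    children=[[1], ["b"]],
    validity=[True, False, True],
)
v5 = drop_top_level_validity(legacy)
print(v5.offsets, v5.children)  # [0, 0, 1] [[1], [None, 'b']]
```

Rebuilding every child in slot order is the simplest correct strategy for this toy model and keeps each child's offsets increasing; a real implementation could limit the rewrite to the children that actually contain null slots.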