You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2019/08/22 22:23:50 UTC

[RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

The vote carries with 4 binding +1 votes and 1 non-binding +1

I'll merge the specification patch later today and we can begin
working on implementations so we can get this done for 0.15.0

On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler <cu...@gmail.com> wrote:
>
> +1 (non-binding)
>
> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou <so...@pitrou.net> wrote:
>
> >
> > Sorry, had forgotten to send my vote on this.
> >
> > +1 from me.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On Wed, 14 Aug 2019 17:42:33 -0500
> > Wes McKinney <we...@gmail.com> wrote:
> > > hi all,
> > >
> > > As we've been discussing [1], there is a need to introduce 4 bytes of
> > > padding into the preamble of the "encapsulated IPC message" format to
> > > ensure that the Flatbuffers metadata payload begins on an 8-byte
> > > aligned memory offset. The alternative to this would be for Arrow
> > > implementations where alignment is important (e.g. C or C++) to copy
> > > the metadata (which is not always small) into memory when it is
> > > unaligned.
> > >
> > > Micah has proposed to address this by adding a
> > > 4-byte "continuation" value at the beginning of the payload
> > > having the value 0xFFFFFFFF. The reason to do it this way is that
> > > old clients will see an invalid length (what is currently the
> > > first 4 bytes of the message -- a 32-bit little endian signed
> > > integer indicating the metadata length) rather than potentially
> > > crashing on a valid length. We also propose to expand the "end of
> > > stream" marker used in the stream and file format from 4 to 8
> > > bytes. This has the additional effect of aligning the file footer
> > > defined in File.fbs.
> > >
> > > This would be a backwards incompatible protocol change, so older Arrow
> > > libraries would not be able to read these new messages. Maintaining
> > > forward compatibility (reading data produced by older libraries) would
> > > be possible as we can reason that a value other than the continuation
> > > value was produced by an older library (and then validate the
> > > Flatbuffer message of course). Arrow implementations could offer a
> > > backward compatibility mode for the sake of old readers if they desire
> > > (this may also assist with testing).
> > >
> > > Additionally with this vote, we want to formally approve the change to
> > > the Arrow "file" format to always write the (new 8-byte) end-of-stream
> > > marker, which enables code that processes Arrow streams to safely read
> > > the file's internal messages as though they were a normal stream.
> > >
> > > The PR making these changes to the IPC documentation is here
> > >
> > > https://github.com/apache/arrow/pull/4951
> > >
> > > Please vote to accept these changes. This vote will be open for at
> > > least 72 hours
> > >
> > > [ ] +1 Adopt these Arrow protocol changes
> > > [ ] +0
> > > [ ] -1 I disagree because...
> > >
> > > Here is my vote: +1
> > >
> > > Thanks,
> > > Wes
> > >
> > > [1]:
> > https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
> > >
> >
> >
> >
> >

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Neal Richardson <ne...@gmail.com>.
Nice work!

On Fri, Sep 13, 2019 at 10:43 AM Wes McKinney <we...@gmail.com> wrote:
>
> Integration tests are passing:
>
> https://travis-ci.org/apache/arrow/jobs/584361357
>
> I'm merging the feature branch into master. Thanks all
>
> On Thu, Sep 12, 2019 at 6:15 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > Thanks all!
> >
> > It looks like all the changes are in, so we need to see if the
> > integration tests pass before we can merge these changes into master
> >
> > On Thu, Sep 12, 2019 at 7:19 AM Sebastien Binet <bi...@cern.ch> wrote:
> > >
> > > hi there,
> > >
> > > On Thu, Sep 12, 2019 at 12:45 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > > Thanks Bryan.
> > > >
> > > > I merged the Java patch with the EOS change and submitted a C++ patch
> > > > which also updates the specification
> > > >
> > > > https://github.com/apache/arrow/pull/5361
> > > >
> > > > Let me know when the JS or C# patches are ready to go and I can merge
> > > > those.
> > > >
> > > > I updated https://issues.apache.org/jira/browse/ARROW-6545 to track
> > > > the Go change corresponding to this
> > > >
> > >
> > > done. (I think)
> > >
> > > -s

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Wes McKinney <we...@gmail.com>.
Integration tests are passing:

https://travis-ci.org/apache/arrow/jobs/584361357

I'm merging the feature branch into master. Thanks all

On Thu, Sep 12, 2019 at 6:15 PM Wes McKinney <we...@gmail.com> wrote:
>
> Thanks all!
>
> It looks like all the changes are in, so we need to see if the
> integration tests pass before we can merge these changes into master
>
> On Thu, Sep 12, 2019 at 7:19 AM Sebastien Binet <bi...@cern.ch> wrote:
> >
> > hi there,
> >
> > On Thu, Sep 12, 2019 at 12:45 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > > Thanks Bryan.
> > >
> > > I merged the Java patch with the EOS change and submitted a C++ patch
> > > which also updates the specification
> > >
> > > https://github.com/apache/arrow/pull/5361
> > >
> > > Let me know when the JS or C# patches are ready to go and I can merge
> > > those.
> > >
> > > I updated https://issues.apache.org/jira/browse/ARROW-6545 to track
> > > the Go change corresponding to this
> > >
> >
> > done. (I think)
> >
> > -s

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Wes McKinney <we...@gmail.com>.
Thanks all!

It looks like all the changes are in, so we need to see if the
integration tests pass before we can merge these changes into master

On Thu, Sep 12, 2019 at 7:19 AM Sebastien Binet <bi...@cern.ch> wrote:
>
> hi there,
>
> On Thu, Sep 12, 2019 at 12:45 AM Wes McKinney <we...@gmail.com> wrote:
>
> > Thanks Bryan.
> >
> > I merged the Java patch with the EOS change and submitted a C++ patch
> > which also updates the specification
> >
> > https://github.com/apache/arrow/pull/5361
> >
> > Let me know when the JS or C# patches are ready to go and I can merge
> > those.
> >
> > I updated https://issues.apache.org/jira/browse/ARROW-6545 to track
> > the Go change corresponding to this
> >
>
> done. (I think)
>
> -s

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Sebastien Binet <bi...@cern.ch>.
hi there,

On Thu, Sep 12, 2019 at 12:45 AM Wes McKinney <we...@gmail.com> wrote:

> Thanks Bryan.
>
> I merged the Java patch with the EOS change and submitted a C++ patch
> which also updates the specification
>
> https://github.com/apache/arrow/pull/5361
>
> Let me know when the JS or C# patches are ready to go and I can merge
> those.
>
> I updated https://issues.apache.org/jira/browse/ARROW-6545 to track
> the Go change corresponding to this
>

done. (I think)

-s

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Wes McKinney <we...@gmail.com>.
Thanks Bryan.

I merged the Java patch with the EOS change and submitted a C++ patch
which also updates the specification

https://github.com/apache/arrow/pull/5361

Let me know when the JS or C# patches are ready to go and I can merge those.

I updated https://issues.apache.org/jira/browse/ARROW-6545 to track
the Go change corresponding to this

On Wed, Sep 11, 2019 at 12:33 AM Bryan Cutler <cu...@gmail.com> wrote:
>
> I have the patch for the EOS with Java writers up here
> https://github.com/apache/arrow/pull/5345. Just to clarify, the EOS of
> {0xFFFFFFFF, 0x00000000} is used for both stream and file formats, in
> non-legacy writing mode.
>
> On Mon, Sep 9, 2019 at 8:01 PM Bryan Cutler <cu...@gmail.com> wrote:
>
> > Sounds good to me also and I don't think we need a vote either.
> >
> > On Sat, Sep 7, 2019 at 7:36 PM Micah Kornfield <em...@gmail.com>
> > wrote:
> >
> >> +1 on this, I also don't think a vote is necessary as long as we make the
> >> change before 0.15.0
> >>
> >> On Saturday, September 7, 2019, Wes McKinney <we...@gmail.com> wrote:
> >>
> >> > I see, thank you for catching this nuance.
> >> >
> >> > I agree that using {0xFFFFFFFF, 0x00000000} for EOS will resolve the
> >> > issue while allowing implementations to be backwards compatible (i.e.
> >> > handling the 4-byte EOS from older payloads).
> >> >
> >> > I'm not sure that we need to have a vote about this, what do others
> >> think?
> >> >
> >> > On Sat, Sep 7, 2019 at 12:47 AM Ji Liu <ni...@aliyun.com.invalid>
> >> wrote:
> >> > >
> >> > > Hi all,
> >> > >
> >> > > During the java code review[1], seems there is a problem with the
> >> > current implementations(C++/Java etc) when reaching EOS, since the new
> >> > format EOS is 8 bytes and the reader only reads 4 bytes when reach the
> >> end
> >> > of stream, and the additional 4 bytes will not be read which cause
> >> problems
> >> > for following up readings.
> >> > >
> >> > > There are some optional suggestions[2] as below, we should reach
> >> > consistent and fix this problem before 0.15 release.
> >> > > i. For the new format, an 8-byte EOS token should look like
> >> {0xFFFFFFFF,
> >> > 0x00000000}, so we read the continuation token first, and then know to
> >> read
> >> > the next 4 bytes, which are then 0 to signal EOS.ii. Reader just
> >> remember
> >> > the state, so if it reads the continuation token from the beginning,
> >> then
> >> > read all 8 bytes at the end.
> >> > >
> >> > > Thanks,
> >> > > Ji Liu
> >> > >
> >> > > [1] https://github.com/apache/arrow/pull/5229
> >> > > [2] https://github.com/apache/arrow/pull/5229#discussion_r321715682
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > ------------------------------------------------------------------
> >> > > From:Eric Erhardt <Er...@microsoft.com>
> >> > > Send Time:2019年9月5日(星期四) 07:16
> >> > > To:dev@arrow.apache.org <de...@arrow.apache.org>; Ji Liu <
> >> > niki.lj@aliyun.com>
> >> > > Cc:emkornfield <em...@gmail.com>; Paul Taylor <
> >> ptaylor@apache.org>
> >> > > Subject:RE: [RESULT] [VOTE] Alter Arrow binary protocol to address
> >> > 8-byte Flatbuffer alignment requirements (2nd vote)
> >> > >
> >> > > The C# PR is up.
> >> > >
> >> > > https://github.com/apache/arrow/pull/5280
> >> > >
> >> > > Eric
> >> > >
> >> > > -----Original Message-----
> >> > > From: Eric Erhardt <Er...@microsoft.com.INVALID>
> >> > > Sent: Wednesday, September 4, 2019 10:12 AM
> >> > > To: dev@arrow.apache.org; Ji Liu <ni...@aliyun.com>
> >> > > Cc: emkornfield <em...@gmail.com>; Paul Taylor <
> >> ptaylor@apache.org
> >> > >
> >> > > Subject: RE: [RESULT] [VOTE] Alter Arrow binary protocol to address
> >> > 8-byte Flatbuffer alignment requirements (2nd vote)
> >> > >
> >> > > I'm working on a PR for the C# bindings. I hope to have it up in the
> >> > next day or two. Integration tests for C# would be a great addition at
> >> some
> >> > point - it's been on my backlog. For now I plan on manually testing it.
> >> > >
> >> > > -----Original Message-----
> >> > > From: Wes McKinney <we...@gmail.com>
> >> > > Sent: Tuesday, September 3, 2019 10:17 PM
> >> > > To: Ji Liu <ni...@aliyun.com>
> >> > > Cc: emkornfield <em...@gmail.com>; dev <de...@arrow.apache.org>;
> >> > Paul Taylor <pt...@apache.org>
> >> > > Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> >> > 8-byte Flatbuffer alignment requirements (2nd vote)
> >> > >
> >> > > hi folks,
> >> > >
> >> > > We now have patches up for Java, JS, and Go. How are we doing on the
> >> > code reviews for getting these in?
> >> > >
> >> > > Since C# implements the binary protocol, the C# developers might want
> >> to
> >> > look at this before the 0.15.0 release also. Absent integration tests
> >> it's
> >> > difficult to verify the C# library, though
> >> > >
> >> > > Thanks
> >> > >
> >> > > On Thu, Aug 29, 2019 at 8:13 AM Ji Liu <ni...@aliyun.com> wrote:
> >> > > >
> >> > > > Here is the Java implementation
> >> > > >
> >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> >> > > > ub.com
> >> %2Fapache%2Farrow%2Fpull%2F5229&amp;data=02%7C01%7CEric.Erhardt%
> >> > > > 40microsoft.com
> >> %7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f141af9
> >> > > >
> >> 1ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=b87u5x8lLvfdnU5
> >> > > > 6LrGzYR8H0Jh8FfwY2cVjbOsY9hY%3D&amp;reserved=0
> >> > > >
> >> > > > cc @Wes McKinney @emkornfield
> >> > > >
> >> > > > Thanks,
> >> > > > Ji Liu
> >> > > >
> >> > > > ------------------------------------------------------------------
> >> > > > From:Ji Liu <ni...@aliyun.com.INVALID> Send Time:2019年8月28日(星期三)
> >> > > > 17:34 To:emkornfield <em...@gmail.com>; dev
> >> > > > <de...@arrow.apache.org> Cc:Paul Taylor <pt...@apache.org>
> >> > > > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> >> > > > 8-byte Flatbuffer alignment requirements (2nd vote)
> >> > > >
> >> > > > I could take the Java implementation and will take a close watch on
> >> > this issue in the next few days.
> >> > > >
> >> > > > Thanks,
> >> > > > Ji Liu
> >> > > >
> >> > > >
> >> > > > ------------------------------------------------------------------
> >> > > > From:Micah Kornfield <em...@gmail.com> Send
> >> Time:2019年8月28日(星期三)
> >> > > > 17:14 To:dev <de...@arrow.apache.org> Cc:Paul Taylor
> >> > > > <pt...@apache.org>
> >> > > > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> >> > > > 8-byte Flatbuffer alignment requirements (2nd vote)
> >> > > >
> >> > > > I should have integration tests with 0.14.1 generated binaries in
> >> the
> >> > > > next few days.  I think the one remaining unassigned piece of work
> >> in
> >> > > > the Java implementation, i can take that up next if no one else gets
> >> > to it.
> >> > > >
> >> > > > On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney <we...@gmail.com>
> >> > wrote:
> >> > > >
> >> > > > > Here's the C++ changes
> >> > > > >
> >> > > > >
> >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
> >> > > > > thub.com
> >> %2Fapache%2Farrow%2Fpull%2F5211&amp;data=02%7C01%7CEric.Erha
> >> > > > > rdt%40microsoft.com
> >> %7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f
> >> > > > >
> >> 141af91ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=zWaHS8X
> >> > > > > YIQA85xcFG%2FMrOcSfrI8xZtyuHRoaDH%2FIP2g%3D&amp;reserved=0
> >> > > > >
> >> > > > > I'm going to create a integration branch where we can merge each
> >> > > > > patch before merging to master
> >> > > > >
> >> > > > > On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <wesmckinn@gmail.com
> >> >
> >> > wrote:
> >> > > > > >
> >> > > > > > It isn't implemented in C++ yet but I will try to get a patch up
> >> > > > > > for that soon (today maybe). I think we should create a branch
> >> > > > > > where we can stack the patches that implement this for each
> >> > language.
> >> > > > > >
> >> > > > > > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor
> >> > > > > > <pt...@gmail.com>
> >> > > > > wrote:
> >> > > > > > >
> >> > > > > > > I'll do the JS updates. Is it safe to validate against the
> >> Arrow
> >> > > > > > > C++ integration tests?
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> >> > > > > > > > I created
> >> > > > > > > >
> >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2
> >> > > > > > > > F%2Fissues.apache.org
> >> %2Fjira%2Fbrowse%2FARROW-6313&amp;data=02
> >> > > > > > > > %7C01%7CEric.Erhardt%40microsoft.com
> >> %7C90f02600c4ce40ff5c9008d
> >> > > > > > > >
> >> 730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6370316
> >> > > > > > > >
> >> 38512163816&amp;sdata=L57rZWFPdeuRtxFTkL%2F4g9RNI8lXFkRDXQadmj
> >> > > > > > > > NiLxI%3D&amp;reserved=0 as a
> >> > > > > tracking
> >> > > > > > > > issue with sub-issues on the development work.  So far
> >> no-one
> >> > > > > > > > has
> >> > > > > claimed
> >> > > > > > > > Java and Javascript tasks.
> >> > > > > > > >
> >> > > > > > > > Would it make sense to have a separate dev branch for this
> >> > work?
> >> > > > > > > >
> >> > > > > > > > Thanks,
> >> > > > > > > > Micah
> >> > > > > > > >
> >> > > > > > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney
> >> > > > > > > > <we...@gmail.com>
> >> > > > > wrote:
> >> > > > > > > >
> >> > > > > > > >> The vote carries with 4 binding +1 votes and 1 non-binding
> >> +1
> >> > > > > > > >>
> >> > > > > > > >> I'll merge the specification patch later today and we can
> >> > > > > > > >> begin working on implementations so we can get this done
> >> for
> >> > > > > > > >> 0.15.0
> >> > > > > > > >>
> >> > > > > > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler
> >> > > > > > > >> <cu...@gmail.com>
> >> > > > > wrote:
> >> > > > > > > >>> +1 (non-binding)
> >> > > > > > > >>>
> >> > > > > > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou
> >> > > > > > > >>> <so...@pitrou.net>
> >> > > > > > > >> wrote:
> >> > > > > > > >>>> Sorry, had forgotten to send my vote on this.
> >> > > > > > > >>>>
> >> > > > > > > >>>> +1 from me.
> >> > > > > > > >>>>
> >> > > > > > > >>>> Regards
> >> > > > > > > >>>>
> >> > > > > > > >>>> Antoine.
> >> > > > > > > >>>>
> >> > > > > > > >>>>
> >> > > > > > > >>>> On Wed, 14 Aug 2019 17:42:33 -0500 Wes McKinney
> >> > > > > > > >>>> <we...@gmail.com> wrote:
> >> > > > > > > >>>>> hi all,
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> As we've been discussing [1], there is a need to
> >> introduce
> >> > > > > > > >>>>> 4
> >> > > > > bytes of
> >> > > > > > > >>>>> padding into the preamble of the "encapsulated IPC
> >> message"
> >> > > > > format to
> >> > > > > > > >>>>> ensure that the Flatbuffers metadata payload begins on
> >> an
> >> > > > > > > >>>>> 8-byte aligned memory offset. The alternative to this
> >> > > > > > > >>>>> would be for Arrow implementations where alignment is
> >> > > > > > > >>>>> important (e.g. C or C++) to
> >> > > > > copy
> >> > > > > > > >>>>> the metadata (which is not always small) into memory
> >> when
> >> > > > > > > >>>>> it is unaligned.
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> Micah has proposed to address this by adding a 4-byte
> >> > > > > > > >>>>> "continuation" value at the beginning of the payload
> >> > > > > > > >>>>> having the value 0xFFFFFFFF. The reason to do it this
> >> way
> >> > > > > > > >>>>> is that old clients will see an invalid length (what is
> >> > > > > > > >>>>> currently the first 4 bytes of the message -- a 32-bit
> >> > > > > > > >>>>> little endian signed integer indicating the metadata
> >> > > > > > > >>>>> length) rather than potentially crashing on a valid
> >> > > > > > > >>>>> length. We also propose to expand the "end of stream"
> >> > > > > > > >>>>> marker used in the stream and file format from 4 to 8
> >> > > > > > > >>>>> bytes. This has the additional effect of aligning the
> >> file
> >> > footer defined in File.fbs.
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> This would be a backwards incompatible protocol change,
> >> so
> >> > > > > > > >>>>> older
> >> > > > > > > >> Arrow
> >> > > > > > > >>>>> libraries would not be able to read these new messages.
> >> > > > > Maintaining
> >> > > > > > > >>>>> forward compatibility (reading data produced by older
> >> > > > > > > >>>>> libraries)
> >> > > > > > > >> would
> >> > > > > > > >>>>> be possible as we can reason that a value other than the
> >> > > > > continuation
> >> > > > > > > >>>>> value was produced by an older library (and then
> >> validate
> >> > > > > > > >>>>> the Flatbuffer message of course). Arrow implementations
> >> > > > > > > >>>>> could offer
> >> > > > > a
> >> > > > > > > >>>>> backward compatibility mode for the sake of old readers
> >> if
> >> > > > > > > >>>>> they
> >> > > > > > > >> desire
> >> > > > > > > >>>>> (this may also assist with testing).
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> Additionally with this vote, we want to formally approve
> >> > > > > > > >>>>> the
> >> > > > > change
> >> > > > > > > >> to
> >> > > > > > > >>>>> the Arrow "file" format to always write the (new 8-byte)
> >> > > > > > > >> end-of-stream
> >> > > > > > > >>>>> marker, which enables code that processes Arrow streams
> >> to
> >> > > > > > > >>>>> safely
> >> > > > > > > >> read
> >> > > > > > > >>>>> the file's internal messages as though they were a
> >> normal
> >> > stream.
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> The PR making these changes to the IPC documentation is
> >> > > > > > > >>>>> here
> >> > > > > > > >>>>>
> >> > > > > > > >>>>>
> >> https://nam06.safelinks.protection.outlook.com/?url=https%
> >> > > > > > > >>>>>
> >> 3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fpull%2F4951&amp;data
> >> > > > > > > >>>>> =02%7C01%7CEric.Erhardt%40microsoft.com
> >> %7C90f02600c4ce40ff
> >> > > > > > > >>>>>
> >> 5c9008d730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C
> >> > > > > > > >>>>>
> >> 0%7C637031638512163816&amp;sdata=WF9uQ1d7GzHohv31%2BW3tl3I
> >> > > > > > > >>>>> vp9Uo9h6VYVoXu52umTE%3D&amp;reserved=0
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> Please vote to accept these changes. This vote will be
> >> > > > > > > >>>>> open for
> >> > > > > at
> >> > > > > > > >>>>> least 72 hours
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> [ ] +1 Adopt these Arrow protocol changes [ ] +0 [ ] -1
> >> I
> >> > > > > > > >>>>> disagree because...
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> Here is my vote: +1
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> Thanks,
> >> > > > > > > >>>>> Wes
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> [1]:
> >> > > > > > > >>
> >> > > > >
> >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli
> >> > > > > sts.apache.org
> >> %2Fthread.html%2F8440be572c49b7b2ffb76b63e6d935ada9efd
> >> > > > > 9c1c2021369b6d27786%40%253Cdev.arrow.apache.org
> >> %253E&amp;data=02%7C0
> >> > > > > 1%7CEric.Erhardt%40microsoft.com
> >> %7C90f02600c4ce40ff5c9008d730e66b68%
> >> > > > >
> >> 7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637031638512173773&amp;
> >> > > > >
> >> sdata=4y7ProY0ZDIqXAWYah6NS7TRZHGoYfZ6zMipdLV5ntk%3D&amp;reserved=0
> >> > > > > > > >>>>
> >> > > > > > > >>>>
> >> > > > > > > >>>>
> >> > > > >
> >> > >
> >> >
> >>
> >

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Bryan Cutler <cu...@gmail.com>.
I have the patch for the EOS with Java writers up here
https://github.com/apache/arrow/pull/5345. Just to clarify, the EOS of
{0xFFFFFFFF, 0x00000000} is used for both stream and file formats, in
non-legacy writing mode.

On Mon, Sep 9, 2019 at 8:01 PM Bryan Cutler <cu...@gmail.com> wrote:

> Sounds good to me also and I don't think we need a vote either.
>
> On Sat, Sep 7, 2019 at 7:36 PM Micah Kornfield <em...@gmail.com>
> wrote:
>
>> +1 on this, I also don't think a vote is necessary as long as we make the
>> change before 0.15.0
>>
>> On Saturday, September 7, 2019, Wes McKinney <we...@gmail.com> wrote:
>>
>> > I see, thank you for catching this nuance.
>> >
>> > I agree that using {0xFFFFFFFF, 0x00000000} for EOS will resolve the
>> > issue while allowing implementations to be backwards compatible (i.e.
>> > handling the 4-byte EOS from older payloads).
>> >
>> > I'm not sure that we need to have a vote about this, what do others
>> think?
>> >
>> > On Sat, Sep 7, 2019 at 12:47 AM Ji Liu <ni...@aliyun.com.invalid>
>> wrote:
>> > >
>> > > Hi all,
>> > >
>> > > During the java code review[1], seems there is a problem with the
>> > current implementations(C++/Java etc) when reaching EOS, since the new
>> > format EOS is 8 bytes and the reader only reads 4 bytes when reach the
>> end
>> > of stream, and the additional 4 bytes will not be read which cause
>> problems
>> > for following up readings.
>> > >
>> > > There are some optional suggestions[2] as below, we should reach
>> > consistent and fix this problem before 0.15 release.
>> > > i. For the new format, an 8-byte EOS token should look like
>> {0xFFFFFFFF,
>> > 0x00000000}, so we read the continuation token first, and then know to
>> read
>> > the next 4 bytes, which are then 0 to signal EOS.ii. Reader just
>> remember
>> > the state, so if it reads the continuation token from the beginning,
>> then
>> > read all 8 bytes at the end.
>> > >
>> > > Thanks,
>> > > Ji Liu
>> > >
>> > > [1] https://github.com/apache/arrow/pull/5229
>> > > [2] https://github.com/apache/arrow/pull/5229#discussion_r321715682
>> > >
>> > >
>> > >
>> > >
>> > > ------------------------------------------------------------------
>> > > From:Eric Erhardt <Er...@microsoft.com>
>> > > Send Time:2019年9月5日(星期四) 07:16
>> > > To:dev@arrow.apache.org <de...@arrow.apache.org>; Ji Liu <
>> > niki.lj@aliyun.com>
>> > > Cc:emkornfield <em...@gmail.com>; Paul Taylor <
>> ptaylor@apache.org>
>> > > Subject:RE: [RESULT] [VOTE] Alter Arrow binary protocol to address
>> > 8-byte Flatbuffer alignment requirements (2nd vote)
>> > >
>> > > The C# PR is up.
>> > >
>> > > https://github.com/apache/arrow/pull/5280
>> > >
>> > > Eric
>> > >
>> > > -----Original Message-----
>> > > From: Eric Erhardt <Er...@microsoft.com.INVALID>
>> > > Sent: Wednesday, September 4, 2019 10:12 AM
>> > > To: dev@arrow.apache.org; Ji Liu <ni...@aliyun.com>
>> > > Cc: emkornfield <em...@gmail.com>; Paul Taylor <
>> ptaylor@apache.org
>> > >
>> > > Subject: RE: [RESULT] [VOTE] Alter Arrow binary protocol to address
>> > 8-byte Flatbuffer alignment requirements (2nd vote)
>> > >
>> > > I'm working on a PR for the C# bindings. I hope to have it up in the
>> > next day or two. Integration tests for C# would be a great addition at
>> some
>> > point - it's been on my backlog. For now I plan on manually testing it.
>> > >
>> > > -----Original Message-----
>> > > From: Wes McKinney <we...@gmail.com>
>> > > Sent: Tuesday, September 3, 2019 10:17 PM
>> > > To: Ji Liu <ni...@aliyun.com>
>> > > Cc: emkornfield <em...@gmail.com>; dev <de...@arrow.apache.org>;
>> > Paul Taylor <pt...@apache.org>
>> > > Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
>> > 8-byte Flatbuffer alignment requirements (2nd vote)
>> > >
>> > > hi folks,
>> > >
>> > > We now have patches up for Java, JS, and Go. How are we doing on the
>> > code reviews for getting these in?
>> > >
>> > > Since C# implements the binary protocol, the C# developers might want
>> to
>> > look at this before the 0.15.0 release also. Absent integration tests
>> it's
>> > difficult to verify the C# library, though
>> > >
>> > > Thanks
>> > >
>> > > On Thu, Aug 29, 2019 at 8:13 AM Ji Liu <ni...@aliyun.com> wrote:
>> > > >
>> > > > Here is the Java implementation
>> > > >
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
>> > > > ub.com
>> %2Fapache%2Farrow%2Fpull%2F5229&amp;data=02%7C01%7CEric.Erhardt%
>> > > > 40microsoft.com
>> %7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f141af9
>> > > >
>> 1ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=b87u5x8lLvfdnU5
>> > > > 6LrGzYR8H0Jh8FfwY2cVjbOsY9hY%3D&amp;reserved=0
>> > > >
>> > > > cc @Wes McKinney @emkornfield
>> > > >
>> > > > Thanks,
>> > > > Ji Liu
>> > > >
>> > > > ------------------------------------------------------------------
>> > > > From:Ji Liu <ni...@aliyun.com.INVALID> Send Time:2019年8月28日(星期三)
>> > > > 17:34 To:emkornfield <em...@gmail.com>; dev
>> > > > <de...@arrow.apache.org> Cc:Paul Taylor <pt...@apache.org>
>> > > > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
>> > > > 8-byte Flatbuffer alignment requirements (2nd vote)
>> > > >
>> > > > I could take the Java implementation and will take a close watch on
>> > this issue in the next few days.
>> > > >
>> > > > Thanks,
>> > > > Ji Liu
>> > > >
>> > > >
>> > > > ------------------------------------------------------------------
>> > > > From:Micah Kornfield <em...@gmail.com> Send
>> Time:2019年8月28日(星期三)
>> > > > 17:14 To:dev <de...@arrow.apache.org> Cc:Paul Taylor
>> > > > <pt...@apache.org>
>> > > > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
>> > > > 8-byte Flatbuffer alignment requirements (2nd vote)
>> > > >
>> > > > I should have integration tests with 0.14.1 generated binaries in
>> the
>> > > > next few days.  I think the one remaining unassigned piece of work
>> in
>> > > > the Java implementation, i can take that up next if no one else gets
>> > to it.
>> > > >
>> > > > On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney <we...@gmail.com>
>> > wrote:
>> > > >
>> > > > > Here's the C++ changes
>> > > > >
>> > > > >
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
>> > > > > thub.com
>> %2Fapache%2Farrow%2Fpull%2F5211&amp;data=02%7C01%7CEric.Erha
>> > > > > rdt%40microsoft.com
>> %7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f
>> > > > >
>> 141af91ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=zWaHS8X
>> > > > > YIQA85xcFG%2FMrOcSfrI8xZtyuHRoaDH%2FIP2g%3D&amp;reserved=0
>> > > > >
>> > > > > I'm going to create a integration branch where we can merge each
>> > > > > patch before merging to master
>> > > > >
>> > > > > On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <wesmckinn@gmail.com
>> >
>> > wrote:
>> > > > > >
>> > > > > > It isn't implemented in C++ yet but I will try to get a patch up
>> > > > > > for that soon (today maybe). I think we should create a branch
>> > > > > > where we can stack the patches that implement this for each
>> > language.
>> > > > > >
>> > > > > > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor
>> > > > > > <pt...@gmail.com>
>> > > > > wrote:
>> > > > > > >
>> > > > > > > I'll do the JS updates. Is it safe to validate against the
>> Arrow
>> > > > > > > C++ integration tests?
>> > > > > > >
>> > > > > > >
>> > > > > > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
>> > > > > > > > I created
>> > > > > > > >
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2
>> > > > > > > > F%2Fissues.apache.org
>> %2Fjira%2Fbrowse%2FARROW-6313&amp;data=02
>> > > > > > > > %7C01%7CEric.Erhardt%40microsoft.com
>> %7C90f02600c4ce40ff5c9008d
>> > > > > > > >
>> 730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6370316
>> > > > > > > >
>> 38512163816&amp;sdata=L57rZWFPdeuRtxFTkL%2F4g9RNI8lXFkRDXQadmj
>> > > > > > > > NiLxI%3D&amp;reserved=0 as a
>> > > > > tracking
>> > > > > > > > issue with sub-issues on the development work.  So far
>> no-one
>> > > > > > > > has
>> > > > > claimed
>> > > > > > > > Java and Javascript tasks.
>> > > > > > > >
>> > > > > > > > Would it make sense to have a separate dev branch for this
>> > work?
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > Micah
>> > > > > > > >
>> > > > > > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney
>> > > > > > > > <we...@gmail.com>
>> > > > > wrote:
>> > > > > > > >
>> > > > > > > >> The vote carries with 4 binding +1 votes and 1 non-binding
>> +1
>> > > > > > > >>
>> > > > > > > >> I'll merge the specification patch later today and we can
>> > > > > > > >> begin working on implementations so we can get this done
>> for
>> > > > > > > >> 0.15.0
>> > > > > > > >>
>> > > > > > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler
>> > > > > > > >> <cu...@gmail.com>
>> > > > > wrote:
>> > > > > > > >>> +1 (non-binding)
>> > > > > > > >>>
>> > > > > > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou
>> > > > > > > >>> <so...@pitrou.net>
>> > > > > > > >> wrote:
>> > > > > > > >>>> Sorry, had forgotten to send my vote on this.
>> > > > > > > >>>>
>> > > > > > > >>>> +1 from me.
>> > > > > > > >>>>
>> > > > > > > >>>> Regards
>> > > > > > > >>>>
>> > > > > > > >>>> Antoine.
>> > > > > > > >>>>
>> > > > > > > >>>>
>> > > > > > > >>>> On Wed, 14 Aug 2019 17:42:33 -0500 Wes McKinney
>> > > > > > > >>>> <we...@gmail.com> wrote:
>> > > > > > > >>>>> hi all,
>> > > > > > > >>>>>
>> > > > > > > >>>>> As we've been discussing [1], there is a need to
>> introduce
>> > > > > > > >>>>> 4
>> > > > > bytes of
>> > > > > > > >>>>> padding into the preamble of the "encapsulated IPC
>> message"
>> > > > > format to
>> > > > > > > >>>>> ensure that the Flatbuffers metadata payload begins on
>> an
>> > > > > > > >>>>> 8-byte aligned memory offset. The alternative to this
>> > > > > > > >>>>> would be for Arrow implementations where alignment is
>> > > > > > > >>>>> important (e.g. C or C++) to
>> > > > > copy
>> > > > > > > >>>>> the metadata (which is not always small) into memory
>> when
>> > > > > > > >>>>> it is unaligned.
>> > > > > > > >>>>>
>> > > > > > > >>>>> Micah has proposed to address this by adding a 4-byte
>> > > > > > > >>>>> "continuation" value at the beginning of the payload
>> > > > > > > >>>>> having the value 0xFFFFFFFF. The reason to do it this
>> way
>> > > > > > > >>>>> is that old clients will see an invalid length (what is
>> > > > > > > >>>>> currently the first 4 bytes of the message -- a 32-bit
>> > > > > > > >>>>> little endian signed integer indicating the metadata
>> > > > > > > >>>>> length) rather than potentially crashing on a valid
>> > > > > > > >>>>> length. We also propose to expand the "end of stream"
>> > > > > > > >>>>> marker used in the stream and file format from 4 to 8
>> > > > > > > >>>>> bytes. This has the additional effect of aligning the
>> file
>> > footer defined in File.fbs.
>> > > > > > > >>>>>
>> > > > > > > >>>>> This would be a backwards incompatible protocol change,
>> so
>> > > > > > > >>>>> older
>> > > > > > > >> Arrow
>> > > > > > > >>>>> libraries would not be able to read these new messages.
>> > > > > Maintaining
>> > > > > > > >>>>> forward compatibility (reading data produced by older
>> > > > > > > >>>>> libraries)
>> > > > > > > >> would
>> > > > > > > >>>>> be possible as we can reason that a value other than the
>> > > > > continuation
>> > > > > > > >>>>> value was produced by an older library (and then
>> validate
>> > > > > > > >>>>> the Flatbuffer message of course). Arrow implementations
>> > > > > > > >>>>> could offer
>> > > > > a
>> > > > > > > >>>>> backward compatibility mode for the sake of old readers
>> if
>> > > > > > > >>>>> they
>> > > > > > > >> desire
>> > > > > > > >>>>> (this may also assist with testing).
>> > > > > > > >>>>>
>> > > > > > > >>>>> Additionally with this vote, we want to formally approve
>> > > > > > > >>>>> the
>> > > > > change
>> > > > > > > >> to
>> > > > > > > >>>>> the Arrow "file" format to always write the (new 8-byte)
>> > > > > > > >> end-of-stream
>> > > > > > > >>>>> marker, which enables code that processes Arrow streams
>> to
>> > > > > > > >>>>> safely
>> > > > > > > >> read
>> > > > > > > >>>>> the file's internal messages as though they were a
>> normal
>> > stream.
>> > > > > > > >>>>>
>> > > > > > > >>>>> The PR making these changes to the IPC documentation is
>> > > > > > > >>>>> here
>> > > > > > > >>>>>
>> > > > > > > >>>>>
>> https://nam06.safelinks.protection.outlook.com/?url=https%
>> > > > > > > >>>>>
>> 3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fpull%2F4951&amp;data
>> > > > > > > >>>>> =02%7C01%7CEric.Erhardt%40microsoft.com
>> %7C90f02600c4ce40ff
>> > > > > > > >>>>>
>> 5c9008d730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C
>> > > > > > > >>>>>
>> 0%7C637031638512163816&amp;sdata=WF9uQ1d7GzHohv31%2BW3tl3I
>> > > > > > > >>>>> vp9Uo9h6VYVoXu52umTE%3D&amp;reserved=0
>> > > > > > > >>>>>
>> > > > > > > >>>>> Please vote to accept these changes. This vote will be
>> > > > > > > >>>>> open for
>> > > > > at
>> > > > > > > >>>>> least 72 hours
>> > > > > > > >>>>>
>> > > > > > > >>>>> [ ] +1 Adopt these Arrow protocol changes [ ] +0 [ ] -1
>> I
>> > > > > > > >>>>> disagree because...
>> > > > > > > >>>>>
>> > > > > > > >>>>> Here is my vote: +1
>> > > > > > > >>>>>
>> > > > > > > >>>>> Thanks,
>> > > > > > > >>>>> Wes
>> > > > > > > >>>>>
>> > > > > > > >>>>> [1]:
>> > > > > > > >>
>> > > > >
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli
>> > > > > sts.apache.org
>> %2Fthread.html%2F8440be572c49b7b2ffb76b63e6d935ada9efd
>> > > > > 9c1c2021369b6d27786%40%253Cdev.arrow.apache.org
>> %253E&amp;data=02%7C0
>> > > > > 1%7CEric.Erhardt%40microsoft.com
>> %7C90f02600c4ce40ff5c9008d730e66b68%
>> > > > >
>> 7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637031638512173773&amp;
>> > > > >
>> sdata=4y7ProY0ZDIqXAWYah6NS7TRZHGoYfZ6zMipdLV5ntk%3D&amp;reserved=0
>> > > > > > > >>>>
>> > > > > > > >>>>
>> > > > > > > >>>>
>> > > > >
>> > >
>> >
>>
>

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Bryan Cutler <cu...@gmail.com>.
Sounds good to me also and I don't think we need a vote either.

On Sat, Sep 7, 2019 at 7:36 PM Micah Kornfield <em...@gmail.com>
wrote:

> +1 on this, I also don't think a vote is necessary as long as we make the
> change before 0.15.0
>
> On Saturday, September 7, 2019, Wes McKinney <we...@gmail.com> wrote:
>
> > I see, thank you for catching this nuance.
> >
> > I agree that using {0xFFFFFFFF, 0x00000000} for EOS will resolve the
> > issue while allowing implementations to be backwards compatible (i.e.
> > handling the 4-byte EOS from older payloads).
> >
> > I'm not sure that we need to have a vote about this, what do others
> think?
> >
> > On Sat, Sep 7, 2019 at 12:47 AM Ji Liu <ni...@aliyun.com.invalid>
> wrote:
> > >
> > > Hi all,
> > >
> > > During the java code review[1], seems there is a problem with the
> > current implementations(C++/Java etc) when reaching EOS, since the new
> > format EOS is 8 bytes and the reader only reads 4 bytes when reach the
> end
> > of stream, and the additional 4 bytes will not be read which cause
> problems
> > for following up readings.
> > >
> > > There are some optional suggestions[2] as below, we should reach
> > consistent and fix this problem before 0.15 release.
> > > i. For the new format, an 8-byte EOS token should look like
> {0xFFFFFFFF,
> > 0x00000000}, so we read the continuation token first, and then know to
> read
> > the next 4 bytes, which are then 0 to signal EOS.ii. Reader just remember
> > the state, so if it reads the continuation token from the beginning, then
> > read all 8 bytes at the end.
> > >
> > > Thanks,
> > > Ji Liu
> > >
> > > [1] https://github.com/apache/arrow/pull/5229
> > > [2] https://github.com/apache/arrow/pull/5229#discussion_r321715682
> > >
> > >
> > >
> > >
> > > ------------------------------------------------------------------
> > > From:Eric Erhardt <Er...@microsoft.com>
> > > Send Time:2019年9月5日(星期四) 07:16
> > > To:dev@arrow.apache.org <de...@arrow.apache.org>; Ji Liu <
> > niki.lj@aliyun.com>
> > > Cc:emkornfield <em...@gmail.com>; Paul Taylor <
> ptaylor@apache.org>
> > > Subject:RE: [RESULT] [VOTE] Alter Arrow binary protocol to address
> > 8-byte Flatbuffer alignment requirements (2nd vote)
> > >
> > > The C# PR is up.
> > >
> > > https://github.com/apache/arrow/pull/5280
> > >
> > > Eric
> > >
> > > -----Original Message-----
> > > From: Eric Erhardt <Er...@microsoft.com.INVALID>
> > > Sent: Wednesday, September 4, 2019 10:12 AM
> > > To: dev@arrow.apache.org; Ji Liu <ni...@aliyun.com>
> > > Cc: emkornfield <em...@gmail.com>; Paul Taylor <
> ptaylor@apache.org
> > >
> > > Subject: RE: [RESULT] [VOTE] Alter Arrow binary protocol to address
> > 8-byte Flatbuffer alignment requirements (2nd vote)
> > >
> > > I'm working on a PR for the C# bindings. I hope to have it up in the
> > next day or two. Integration tests for C# would be a great addition at
> some
> > point - it's been on my backlog. For now I plan on manually testing it.
> > >
> > > -----Original Message-----
> > > From: Wes McKinney <we...@gmail.com>
> > > Sent: Tuesday, September 3, 2019 10:17 PM
> > > To: Ji Liu <ni...@aliyun.com>
> > > Cc: emkornfield <em...@gmail.com>; dev <de...@arrow.apache.org>;
> > Paul Taylor <pt...@apache.org>
> > > Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> > 8-byte Flatbuffer alignment requirements (2nd vote)
> > >
> > > hi folks,
> > >
> > > We now have patches up for Java, JS, and Go. How are we doing on the
> > code reviews for getting these in?
> > >
> > > Since C# implements the binary protocol, the C# developers might want
> to
> > look at this before the 0.15.0 release also. Absent integration tests
> it's
> > difficult to verify the C# library, though
> > >
> > > Thanks
> > >
> > > On Thu, Aug 29, 2019 at 8:13 AM Ji Liu <ni...@aliyun.com> wrote:
> > > >
> > > > Here is the Java implementation
> > > >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> > > > ub.com
> %2Fapache%2Farrow%2Fpull%2F5229&amp;data=02%7C01%7CEric.Erhardt%
> > > > 40microsoft.com
> %7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f141af9
> > > >
> 1ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=b87u5x8lLvfdnU5
> > > > 6LrGzYR8H0Jh8FfwY2cVjbOsY9hY%3D&amp;reserved=0
> > > >
> > > > cc @Wes McKinney @emkornfield
> > > >
> > > > Thanks,
> > > > Ji Liu
> > > >
> > > > ------------------------------------------------------------------
> > > > From:Ji Liu <ni...@aliyun.com.INVALID> Send Time:2019年8月28日(星期三)
> > > > 17:34 To:emkornfield <em...@gmail.com>; dev
> > > > <de...@arrow.apache.org> Cc:Paul Taylor <pt...@apache.org>
> > > > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> > > > 8-byte Flatbuffer alignment requirements (2nd vote)
> > > >
> > > > I could take the Java implementation and will take a close watch on
> > this issue in the next few days.
> > > >
> > > > Thanks,
> > > > Ji Liu
> > > >
> > > >
> > > > ------------------------------------------------------------------
> > > > From:Micah Kornfield <em...@gmail.com> Send
> Time:2019年8月28日(星期三)
> > > > 17:14 To:dev <de...@arrow.apache.org> Cc:Paul Taylor
> > > > <pt...@apache.org>
> > > > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> > > > 8-byte Flatbuffer alignment requirements (2nd vote)
> > > >
> > > > I should have integration tests with 0.14.1 generated binaries in the
> > > > next few days.  I think the one remaining unassigned piece of work in
> > > > the Java implementation, i can take that up next if no one else gets
> > to it.
> > > >
> > > > On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney <we...@gmail.com>
> > wrote:
> > > >
> > > > > Here's the C++ changes
> > > > >
> > > > >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
> > > > > thub.com
> %2Fapache%2Farrow%2Fpull%2F5211&amp;data=02%7C01%7CEric.Erha
> > > > > rdt%40microsoft.com
> %7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f
> > > > >
> 141af91ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=zWaHS8X
> > > > > YIQA85xcFG%2FMrOcSfrI8xZtyuHRoaDH%2FIP2g%3D&amp;reserved=0
> > > > >
> > > > > I'm going to create a integration branch where we can merge each
> > > > > patch before merging to master
> > > > >
> > > > > On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <we...@gmail.com>
> > wrote:
> > > > > >
> > > > > > It isn't implemented in C++ yet but I will try to get a patch up
> > > > > > for that soon (today maybe). I think we should create a branch
> > > > > > where we can stack the patches that implement this for each
> > language.
> > > > > >
> > > > > > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor
> > > > > > <pt...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > I'll do the JS updates. Is it safe to validate against the
> Arrow
> > > > > > > C++ integration tests?
> > > > > > >
> > > > > > >
> > > > > > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > > > > > > I created
> > > > > > > >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2
> > > > > > > > F%2Fissues.apache.org
> %2Fjira%2Fbrowse%2FARROW-6313&amp;data=02
> > > > > > > > %7C01%7CEric.Erhardt%40microsoft.com
> %7C90f02600c4ce40ff5c9008d
> > > > > > > >
> 730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6370316
> > > > > > > >
> 38512163816&amp;sdata=L57rZWFPdeuRtxFTkL%2F4g9RNI8lXFkRDXQadmj
> > > > > > > > NiLxI%3D&amp;reserved=0 as a
> > > > > tracking
> > > > > > > > issue with sub-issues on the development work.  So far no-one
> > > > > > > > has
> > > > > claimed
> > > > > > > > Java and Javascript tasks.
> > > > > > > >
> > > > > > > > Would it make sense to have a separate dev branch for this
> > work?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Micah
> > > > > > > >
> > > > > > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney
> > > > > > > > <we...@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > >> The vote carries with 4 binding +1 votes and 1 non-binding
> +1
> > > > > > > >>
> > > > > > > >> I'll merge the specification patch later today and we can
> > > > > > > >> begin working on implementations so we can get this done for
> > > > > > > >> 0.15.0
> > > > > > > >>
> > > > > > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler
> > > > > > > >> <cu...@gmail.com>
> > > > > wrote:
> > > > > > > >>> +1 (non-binding)
> > > > > > > >>>
> > > > > > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou
> > > > > > > >>> <so...@pitrou.net>
> > > > > > > >> wrote:
> > > > > > > >>>> Sorry, had forgotten to send my vote on this.
> > > > > > > >>>>
> > > > > > > >>>> +1 from me.
> > > > > > > >>>>
> > > > > > > >>>> Regards
> > > > > > > >>>>
> > > > > > > >>>> Antoine.
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> On Wed, 14 Aug 2019 17:42:33 -0500 Wes McKinney
> > > > > > > >>>> <we...@gmail.com> wrote:
> > > > > > > >>>>> hi all,
> > > > > > > >>>>>
> > > > > > > >>>>> As we've been discussing [1], there is a need to
> introduce
> > > > > > > >>>>> 4
> > > > > bytes of
> > > > > > > >>>>> padding into the preamble of the "encapsulated IPC
> message"
> > > > > format to
> > > > > > > >>>>> ensure that the Flatbuffers metadata payload begins on an
> > > > > > > >>>>> 8-byte aligned memory offset. The alternative to this
> > > > > > > >>>>> would be for Arrow implementations where alignment is
> > > > > > > >>>>> important (e.g. C or C++) to
> > > > > copy
> > > > > > > >>>>> the metadata (which is not always small) into memory when
> > > > > > > >>>>> it is unaligned.
> > > > > > > >>>>>
> > > > > > > >>>>> Micah has proposed to address this by adding a 4-byte
> > > > > > > >>>>> "continuation" value at the beginning of the payload
> > > > > > > >>>>> having the value 0xFFFFFFFF. The reason to do it this way
> > > > > > > >>>>> is that old clients will see an invalid length (what is
> > > > > > > >>>>> currently the first 4 bytes of the message -- a 32-bit
> > > > > > > >>>>> little endian signed integer indicating the metadata
> > > > > > > >>>>> length) rather than potentially crashing on a valid
> > > > > > > >>>>> length. We also propose to expand the "end of stream"
> > > > > > > >>>>> marker used in the stream and file format from 4 to 8
> > > > > > > >>>>> bytes. This has the additional effect of aligning the
> file
> > footer defined in File.fbs.
> > > > > > > >>>>>
> > > > > > > >>>>> This would be a backwards incompatible protocol change,
> so
> > > > > > > >>>>> older
> > > > > > > >> Arrow
> > > > > > > >>>>> libraries would not be able to read these new messages.
> > > > > Maintaining
> > > > > > > >>>>> forward compatibility (reading data produced by older
> > > > > > > >>>>> libraries)
> > > > > > > >> would
> > > > > > > >>>>> be possible as we can reason that a value other than the
> > > > > continuation
> > > > > > > >>>>> value was produced by an older library (and then validate
> > > > > > > >>>>> the Flatbuffer message of course). Arrow implementations
> > > > > > > >>>>> could offer
> > > > > a
> > > > > > > >>>>> backward compatibility mode for the sake of old readers
> if
> > > > > > > >>>>> they
> > > > > > > >> desire
> > > > > > > >>>>> (this may also assist with testing).
> > > > > > > >>>>>
> > > > > > > >>>>> Additionally with this vote, we want to formally approve
> > > > > > > >>>>> the
> > > > > change
> > > > > > > >> to
> > > > > > > >>>>> the Arrow "file" format to always write the (new 8-byte)
> > > > > > > >> end-of-stream
> > > > > > > >>>>> marker, which enables code that processes Arrow streams
> to
> > > > > > > >>>>> safely
> > > > > > > >> read
> > > > > > > >>>>> the file's internal messages as though they were a normal
> > stream.
> > > > > > > >>>>>
> > > > > > > >>>>> The PR making these changes to the IPC documentation is
> > > > > > > >>>>> here
> > > > > > > >>>>>
> > > > > > > >>>>>
> https://nam06.safelinks.protection.outlook.com/?url=https%
> > > > > > > >>>>>
> 3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fpull%2F4951&amp;data
> > > > > > > >>>>> =02%7C01%7CEric.Erhardt%40microsoft.com
> %7C90f02600c4ce40ff
> > > > > > > >>>>>
> 5c9008d730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C
> > > > > > > >>>>>
> 0%7C637031638512163816&amp;sdata=WF9uQ1d7GzHohv31%2BW3tl3I
> > > > > > > >>>>> vp9Uo9h6VYVoXu52umTE%3D&amp;reserved=0
> > > > > > > >>>>>
> > > > > > > >>>>> Please vote to accept these changes. This vote will be
> > > > > > > >>>>> open for
> > > > > at
> > > > > > > >>>>> least 72 hours
> > > > > > > >>>>>
> > > > > > > >>>>> [ ] +1 Adopt these Arrow protocol changes [ ] +0 [ ] -1 I
> > > > > > > >>>>> disagree because...
> > > > > > > >>>>>
> > > > > > > >>>>> Here is my vote: +1
> > > > > > > >>>>>
> > > > > > > >>>>> Thanks,
> > > > > > > >>>>> Wes
> > > > > > > >>>>>
> > > > > > > >>>>> [1]:
> > > > > > > >>
> > > > >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli
> > > > > sts.apache.org
> %2Fthread.html%2F8440be572c49b7b2ffb76b63e6d935ada9efd
> > > > > 9c1c2021369b6d27786%40%253Cdev.arrow.apache.org
> %253E&amp;data=02%7C0
> > > > > 1%7CEric.Erhardt%40microsoft.com
> %7C90f02600c4ce40ff5c9008d730e66b68%
> > > > >
> 7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637031638512173773&amp;
> > > > > sdata=4y7ProY0ZDIqXAWYah6NS7TRZHGoYfZ6zMipdLV5ntk%3D&amp;reserved=0
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>>
> > > > >
> > >
> >
>

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Micah Kornfield <em...@gmail.com>.
+1 on this, I also don't think a vote is necessary as long as we make the
change before 0.15.0

On Saturday, September 7, 2019, Wes McKinney <we...@gmail.com> wrote:

> I see, thank you for catching this nuance.
>
> I agree that using {0xFFFFFFFF, 0x00000000} for EOS will resolve the
> issue while allowing implementations to be backwards compatible (i.e.
> handling the 4-byte EOS from older payloads).
>
> I'm not sure that we need to have a vote about this, what do others think?
>
> On Sat, Sep 7, 2019 at 12:47 AM Ji Liu <ni...@aliyun.com.invalid> wrote:
> >
> > Hi all,
> >
> > During the java code review[1], seems there is a problem with the
> current implementations(C++/Java etc) when reaching EOS, since the new
> format EOS is 8 bytes and the reader only reads 4 bytes when reach the end
> of stream, and the additional 4 bytes will not be read which cause problems
> for following up readings.
> >
> > There are some optional suggestions[2] as below, we should reach
> consistent and fix this problem before 0.15 release.
> > i. For the new format, an 8-byte EOS token should look like {0xFFFFFFFF,
> 0x00000000}, so we read the continuation token first, and then know to read
> the next 4 bytes, which are then 0 to signal EOS.ii. Reader just remember
> the state, so if it reads the continuation token from the beginning, then
> read all 8 bytes at the end.
> >
> > Thanks,
> > Ji Liu
> >
> > [1] https://github.com/apache/arrow/pull/5229
> > [2] https://github.com/apache/arrow/pull/5229#discussion_r321715682
> >
> >
> >
> >
> > ------------------------------------------------------------------
> > From:Eric Erhardt <Er...@microsoft.com>
> > Send Time:2019年9月5日(星期四) 07:16
> > To:dev@arrow.apache.org <de...@arrow.apache.org>; Ji Liu <
> niki.lj@aliyun.com>
> > Cc:emkornfield <em...@gmail.com>; Paul Taylor <pt...@apache.org>
> > Subject:RE: [RESULT] [VOTE] Alter Arrow binary protocol to address
> 8-byte Flatbuffer alignment requirements (2nd vote)
> >
> > The C# PR is up.
> >
> > https://github.com/apache/arrow/pull/5280
> >
> > Eric
> >
> > -----Original Message-----
> > From: Eric Erhardt <Er...@microsoft.com.INVALID>
> > Sent: Wednesday, September 4, 2019 10:12 AM
> > To: dev@arrow.apache.org; Ji Liu <ni...@aliyun.com>
> > Cc: emkornfield <em...@gmail.com>; Paul Taylor <ptaylor@apache.org
> >
> > Subject: RE: [RESULT] [VOTE] Alter Arrow binary protocol to address
> 8-byte Flatbuffer alignment requirements (2nd vote)
> >
> > I'm working on a PR for the C# bindings. I hope to have it up in the
> next day or two. Integration tests for C# would be a great addition at some
> point - it's been on my backlog. For now I plan on manually testing it.
> >
> > -----Original Message-----
> > From: Wes McKinney <we...@gmail.com>
> > Sent: Tuesday, September 3, 2019 10:17 PM
> > To: Ji Liu <ni...@aliyun.com>
> > Cc: emkornfield <em...@gmail.com>; dev <de...@arrow.apache.org>;
> Paul Taylor <pt...@apache.org>
> > Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> 8-byte Flatbuffer alignment requirements (2nd vote)
> >
> > hi folks,
> >
> > We now have patches up for Java, JS, and Go. How are we doing on the
> code reviews for getting these in?
> >
> > Since C# implements the binary protocol, the C# developers might want to
> look at this before the 0.15.0 release also. Absent integration tests it's
> difficult to verify the C# library, though
> >
> > Thanks
> >
> > On Thu, Aug 29, 2019 at 8:13 AM Ji Liu <ni...@aliyun.com> wrote:
> > >
> > > Here is the Java implementation
> > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> > > ub.com%2Fapache%2Farrow%2Fpull%2F5229&amp;data=02%7C01%7CEric.Erhardt%
> > > 40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f141af9
> > > 1ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=b87u5x8lLvfdnU5
> > > 6LrGzYR8H0Jh8FfwY2cVjbOsY9hY%3D&amp;reserved=0
> > >
> > > cc @Wes McKinney @emkornfield
> > >
> > > Thanks,
> > > Ji Liu
> > >
> > > ------------------------------------------------------------------
> > > From:Ji Liu <ni...@aliyun.com.INVALID> Send Time:2019年8月28日(星期三)
> > > 17:34 To:emkornfield <em...@gmail.com>; dev
> > > <de...@arrow.apache.org> Cc:Paul Taylor <pt...@apache.org>
> > > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> > > 8-byte Flatbuffer alignment requirements (2nd vote)
> > >
> > > I could take the Java implementation and will take a close watch on
> this issue in the next few days.
> > >
> > > Thanks,
> > > Ji Liu
> > >
> > >
> > > ------------------------------------------------------------------
> > > From:Micah Kornfield <em...@gmail.com> Send Time:2019年8月28日(星期三)
> > > 17:14 To:dev <de...@arrow.apache.org> Cc:Paul Taylor
> > > <pt...@apache.org>
> > > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> > > 8-byte Flatbuffer alignment requirements (2nd vote)
> > >
> > > I should have integration tests with 0.14.1 generated binaries in the
> > > next few days.  I think the one remaining unassigned piece of work in
> > > the Java implementation, i can take that up next if no one else gets
> to it.
> > >
> > > On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney <we...@gmail.com>
> wrote:
> > >
> > > > Here's the C++ changes
> > > >
> > > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
> > > > thub.com%2Fapache%2Farrow%2Fpull%2F5211&amp;data=02%7C01%7CEric.Erha
> > > > rdt%40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f
> > > > 141af91ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=zWaHS8X
> > > > YIQA85xcFG%2FMrOcSfrI8xZtyuHRoaDH%2FIP2g%3D&amp;reserved=0
> > > >
> > > > I'm going to create a integration branch where we can merge each
> > > > patch before merging to master
> > > >
> > > > On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <we...@gmail.com>
> wrote:
> > > > >
> > > > > It isn't implemented in C++ yet but I will try to get a patch up
> > > > > for that soon (today maybe). I think we should create a branch
> > > > > where we can stack the patches that implement this for each
> language.
> > > > >
> > > > > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor
> > > > > <pt...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > I'll do the JS updates. Is it safe to validate against the Arrow
> > > > > > C++ integration tests?
> > > > > >
> > > > > >
> > > > > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > > > > > I created
> > > > > > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2
> > > > > > > F%2Fissues.apache.org%2Fjira%2Fbrowse%2FARROW-6313&amp;data=02
> > > > > > > %7C01%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff5c9008d
> > > > > > > 730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6370316
> > > > > > > 38512163816&amp;sdata=L57rZWFPdeuRtxFTkL%2F4g9RNI8lXFkRDXQadmj
> > > > > > > NiLxI%3D&amp;reserved=0 as a
> > > > tracking
> > > > > > > issue with sub-issues on the development work.  So far no-one
> > > > > > > has
> > > > claimed
> > > > > > > Java and Javascript tasks.
> > > > > > >
> > > > > > > Would it make sense to have a separate dev branch for this
> work?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Micah
> > > > > > >
> > > > > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney
> > > > > > > <we...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > >> The vote carries with 4 binding +1 votes and 1 non-binding +1
> > > > > > >>
> > > > > > >> I'll merge the specification patch later today and we can
> > > > > > >> begin working on implementations so we can get this done for
> > > > > > >> 0.15.0
> > > > > > >>
> > > > > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler
> > > > > > >> <cu...@gmail.com>
> > > > wrote:
> > > > > > >>> +1 (non-binding)
> > > > > > >>>
> > > > > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou
> > > > > > >>> <so...@pitrou.net>
> > > > > > >> wrote:
> > > > > > >>>> Sorry, had forgotten to send my vote on this.
> > > > > > >>>>
> > > > > > >>>> +1 from me.
> > > > > > >>>>
> > > > > > >>>> Regards
> > > > > > >>>>
> > > > > > >>>> Antoine.
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>> On Wed, 14 Aug 2019 17:42:33 -0500 Wes McKinney
> > > > > > >>>> <we...@gmail.com> wrote:
> > > > > > >>>>> hi all,
> > > > > > >>>>>
> > > > > > >>>>> As we've been discussing [1], there is a need to introduce
> > > > > > >>>>> 4
> > > > bytes of
> > > > > > >>>>> padding into the preamble of the "encapsulated IPC message"
> > > > format to
> > > > > > >>>>> ensure that the Flatbuffers metadata payload begins on an
> > > > > > >>>>> 8-byte aligned memory offset. The alternative to this
> > > > > > >>>>> would be for Arrow implementations where alignment is
> > > > > > >>>>> important (e.g. C or C++) to
> > > > copy
> > > > > > >>>>> the metadata (which is not always small) into memory when
> > > > > > >>>>> it is unaligned.
> > > > > > >>>>>
> > > > > > >>>>> Micah has proposed to address this by adding a 4-byte
> > > > > > >>>>> "continuation" value at the beginning of the payload
> > > > > > >>>>> having the value 0xFFFFFFFF. The reason to do it this way
> > > > > > >>>>> is that old clients will see an invalid length (what is
> > > > > > >>>>> currently the first 4 bytes of the message -- a 32-bit
> > > > > > >>>>> little endian signed integer indicating the metadata
> > > > > > >>>>> length) rather than potentially crashing on a valid
> > > > > > >>>>> length. We also propose to expand the "end of stream"
> > > > > > >>>>> marker used in the stream and file format from 4 to 8
> > > > > > >>>>> bytes. This has the additional effect of aligning the file
> footer defined in File.fbs.
> > > > > > >>>>>
> > > > > > >>>>> This would be a backwards incompatible protocol change, so
> > > > > > >>>>> older
> > > > > > >> Arrow
> > > > > > >>>>> libraries would not be able to read these new messages.
> > > > Maintaining
> > > > > > >>>>> forward compatibility (reading data produced by older
> > > > > > >>>>> libraries)
> > > > > > >> would
> > > > > > >>>>> be possible as we can reason that a value other than the
> > > > continuation
> > > > > > >>>>> value was produced by an older library (and then validate
> > > > > > >>>>> the Flatbuffer message of course). Arrow implementations
> > > > > > >>>>> could offer
> > > > a
> > > > > > >>>>> backward compatibility mode for the sake of old readers if
> > > > > > >>>>> they
> > > > > > >> desire
> > > > > > >>>>> (this may also assist with testing).
> > > > > > >>>>>
> > > > > > >>>>> Additionally with this vote, we want to formally approve
> > > > > > >>>>> the
> > > > change
> > > > > > >> to
> > > > > > >>>>> the Arrow "file" format to always write the (new 8-byte)
> > > > > > >> end-of-stream
> > > > > > >>>>> marker, which enables code that processes Arrow streams to
> > > > > > >>>>> safely
> > > > > > >> read
> > > > > > >>>>> the file's internal messages as though they were a normal
> stream.
> > > > > > >>>>>
> > > > > > >>>>> The PR making these changes to the IPC documentation is
> > > > > > >>>>> here
> > > > > > >>>>>
> > > > > > >>>>> https://nam06.safelinks.protection.outlook.com/?url=https%
> > > > > > >>>>> 3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fpull%2F4951&amp;data
> > > > > > >>>>> =02%7C01%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff
> > > > > > >>>>> 5c9008d730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C
> > > > > > >>>>> 0%7C637031638512163816&amp;sdata=WF9uQ1d7GzHohv31%2BW3tl3I
> > > > > > >>>>> vp9Uo9h6VYVoXu52umTE%3D&amp;reserved=0
> > > > > > >>>>>
> > > > > > >>>>> Please vote to accept these changes. This vote will be
> > > > > > >>>>> open for
> > > > at
> > > > > > >>>>> least 72 hours
> > > > > > >>>>>
> > > > > > >>>>> [ ] +1 Adopt these Arrow protocol changes [ ] +0 [ ] -1 I
> > > > > > >>>>> disagree because...
> > > > > > >>>>>
> > > > > > >>>>> Here is my vote: +1
> > > > > > >>>>>
> > > > > > >>>>> Thanks,
> > > > > > >>>>> Wes
> > > > > > >>>>>
> > > > > > >>>>> [1]:
> > > > > > >>
> > > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli
> > > > sts.apache.org%2Fthread.html%2F8440be572c49b7b2ffb76b63e6d935ada9efd
> > > > 9c1c2021369b6d27786%40%253Cdev.arrow.apache.org%253E&amp;data=02%7C0
> > > > 1%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%
> > > > 7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637031638512173773&amp;
> > > > sdata=4y7ProY0ZDIqXAWYah6NS7TRZHGoYfZ6zMipdLV5ntk%3D&amp;reserved=0
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>>
> > > >
> >
>

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Wes McKinney <we...@gmail.com>.
I see, thank you for catching this nuance.

I agree that using {0xFFFFFFFF, 0x00000000} for EOS will resolve the
issue while allowing implementations to be backwards compatible (i.e.
handling the 4-byte EOS from older payloads).

I'm not sure that we need to have a vote about this, what do others think?

On Sat, Sep 7, 2019 at 12:47 AM Ji Liu <ni...@aliyun.com.invalid> wrote:
>
> Hi all,
>
> During the java code review[1], seems there is a problem with the current implementations(C++/Java etc) when reaching EOS, since the new format EOS is 8 bytes and the reader only reads 4 bytes when reach the end of stream, and the additional 4 bytes will not be read which cause problems for following up readings.
>
> There are some optional suggestions[2] as below, we should reach consistent and fix this problem before 0.15 release.
> i. For the new format, an 8-byte EOS token should look like {0xFFFFFFFF, 0x00000000}, so we read the continuation token first, and then know to read the next 4 bytes, which are then 0 to signal EOS.ii. Reader just remember the state, so if it reads the continuation token from the beginning, then read all 8 bytes at the end.
>
> Thanks,
> Ji Liu
>
> [1] https://github.com/apache/arrow/pull/5229
> [2] https://github.com/apache/arrow/pull/5229#discussion_r321715682
>
>
>
>
> ------------------------------------------------------------------
> From:Eric Erhardt <Er...@microsoft.com>
> Send Time:2019年9月5日(星期四) 07:16
> To:dev@arrow.apache.org <de...@arrow.apache.org>; Ji Liu <ni...@aliyun.com>
> Cc:emkornfield <em...@gmail.com>; Paul Taylor <pt...@apache.org>
> Subject:RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)
>
> The C# PR is up.
>
> https://github.com/apache/arrow/pull/5280
>
> Eric
>
> -----Original Message-----
> From: Eric Erhardt <Er...@microsoft.com.INVALID>
> Sent: Wednesday, September 4, 2019 10:12 AM
> To: dev@arrow.apache.org; Ji Liu <ni...@aliyun.com>
> Cc: emkornfield <em...@gmail.com>; Paul Taylor <pt...@apache.org>
> Subject: RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)
>
> I'm working on a PR for the C# bindings. I hope to have it up in the next day or two. Integration tests for C# would be a great addition at some point - it's been on my backlog. For now I plan on manually testing it.
>
> -----Original Message-----
> From: Wes McKinney <we...@gmail.com>
> Sent: Tuesday, September 3, 2019 10:17 PM
> To: Ji Liu <ni...@aliyun.com>
> Cc: emkornfield <em...@gmail.com>; dev <de...@arrow.apache.org>; Paul Taylor <pt...@apache.org>
> Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)
>
> hi folks,
>
> We now have patches up for Java, JS, and Go. How are we doing on the code reviews for getting these in?
>
> Since C# implements the binary protocol, the C# developers might want to look at this before the 0.15.0 release also. Absent integration tests it's difficult to verify the C# library, though
>
> Thanks
>
> On Thu, Aug 29, 2019 at 8:13 AM Ji Liu <ni...@aliyun.com> wrote:
> >
> > Here is the Java implementation
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> > ub.com%2Fapache%2Farrow%2Fpull%2F5229&amp;data=02%7C01%7CEric.Erhardt%
> > 40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f141af9
> > 1ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=b87u5x8lLvfdnU5
> > 6LrGzYR8H0Jh8FfwY2cVjbOsY9hY%3D&amp;reserved=0
> >
> > cc @Wes McKinney @emkornfield
> >
> > Thanks,
> > Ji Liu
> >
> > ------------------------------------------------------------------
> > From:Ji Liu <ni...@aliyun.com.INVALID> Send Time:2019年8月28日(星期三)
> > 17:34 To:emkornfield <em...@gmail.com>; dev
> > <de...@arrow.apache.org> Cc:Paul Taylor <pt...@apache.org>
> > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> > 8-byte Flatbuffer alignment requirements (2nd vote)
> >
> > I could take the Java implementation and will take a close watch on this issue in the next few days.
> >
> > Thanks,
> > Ji Liu
> >
> >
> > ------------------------------------------------------------------
> > From:Micah Kornfield <em...@gmail.com> Send Time:2019年8月28日(星期三)
> > 17:14 To:dev <de...@arrow.apache.org> Cc:Paul Taylor
> > <pt...@apache.org>
> > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> > 8-byte Flatbuffer alignment requirements (2nd vote)
> >
> > I should have integration tests with 0.14.1 generated binaries in the
> > next few days.  I think the one remaining unassigned piece of work in
> > the Java implementation, i can take that up next if no one else gets to it.
> >
> > On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > > Here's the C++ changes
> > >
> > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
> > > thub.com%2Fapache%2Farrow%2Fpull%2F5211&amp;data=02%7C01%7CEric.Erha
> > > rdt%40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f
> > > 141af91ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=zWaHS8X
> > > YIQA85xcFG%2FMrOcSfrI8xZtyuHRoaDH%2FIP2g%3D&amp;reserved=0
> > >
> > > I'm going to create a integration branch where we can merge each
> > > patch before merging to master
> > >
> > > On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <we...@gmail.com> wrote:
> > > >
> > > > It isn't implemented in C++ yet but I will try to get a patch up
> > > > for that soon (today maybe). I think we should create a branch
> > > > where we can stack the patches that implement this for each language.
> > > >
> > > > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor
> > > > <pt...@gmail.com>
> > > wrote:
> > > > >
> > > > > I'll do the JS updates. Is it safe to validate against the Arrow
> > > > > C++ integration tests?
> > > > >
> > > > >
> > > > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > > > > I created
> > > > > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2
> > > > > > F%2Fissues.apache.org%2Fjira%2Fbrowse%2FARROW-6313&amp;data=02
> > > > > > %7C01%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff5c9008d
> > > > > > 730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6370316
> > > > > > 38512163816&amp;sdata=L57rZWFPdeuRtxFTkL%2F4g9RNI8lXFkRDXQadmj
> > > > > > NiLxI%3D&amp;reserved=0 as a
> > > tracking
> > > > > > issue with sub-issues on the development work.  So far no-one
> > > > > > has
> > > claimed
> > > > > > Java and Javascript tasks.
> > > > > >
> > > > > > Would it make sense to have a separate dev branch for this work?
> > > > > >
> > > > > > Thanks,
> > > > > > Micah
> > > > > >
> > > > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney
> > > > > > <we...@gmail.com>
> > > wrote:
> > > > > >
> > > > > >> The vote carries with 4 binding +1 votes and 1 non-binding +1
> > > > > >>
> > > > > >> I'll merge the specification patch later today and we can
> > > > > >> begin working on implementations so we can get this done for
> > > > > >> 0.15.0
> > > > > >>
> > > > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler
> > > > > >> <cu...@gmail.com>
> > > wrote:
> > > > > >>> +1 (non-binding)
> > > > > >>>
> > > > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou
> > > > > >>> <so...@pitrou.net>
> > > > > >> wrote:
> > > > > >>>> Sorry, had forgotten to send my vote on this.
> > > > > >>>>
> > > > > >>>> +1 from me.
> > > > > >>>>
> > > > > >>>> Regards
> > > > > >>>>
> > > > > >>>> Antoine.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On Wed, 14 Aug 2019 17:42:33 -0500 Wes McKinney
> > > > > >>>> <we...@gmail.com> wrote:
> > > > > >>>>> hi all,
> > > > > >>>>>
> > > > > >>>>> As we've been discussing [1], there is a need to introduce
> > > > > >>>>> 4
> > > bytes of
> > > > > >>>>> padding into the preamble of the "encapsulated IPC message"
> > > format to
> > > > > >>>>> ensure that the Flatbuffers metadata payload begins on an
> > > > > >>>>> 8-byte aligned memory offset. The alternative to this
> > > > > >>>>> would be for Arrow implementations where alignment is
> > > > > >>>>> important (e.g. C or C++) to
> > > copy
> > > > > >>>>> the metadata (which is not always small) into memory when
> > > > > >>>>> it is unaligned.
> > > > > >>>>>
> > > > > >>>>> Micah has proposed to address this by adding a 4-byte
> > > > > >>>>> "continuation" value at the beginning of the payload
> > > > > >>>>> having the value 0xFFFFFFFF. The reason to do it this way
> > > > > >>>>> is that old clients will see an invalid length (what is
> > > > > >>>>> currently the first 4 bytes of the message -- a 32-bit
> > > > > >>>>> little endian signed integer indicating the metadata
> > > > > >>>>> length) rather than potentially crashing on a valid
> > > > > >>>>> length. We also propose to expand the "end of stream"
> > > > > >>>>> marker used in the stream and file format from 4 to 8
> > > > > >>>>> bytes. This has the additional effect of aligning the file footer defined in File.fbs.
> > > > > >>>>>
> > > > > >>>>> This would be a backwards incompatible protocol change, so
> > > > > >>>>> older
> > > > > >> Arrow
> > > > > >>>>> libraries would not be able to read these new messages.
> > > Maintaining
> > > > > >>>>> forward compatibility (reading data produced by older
> > > > > >>>>> libraries)
> > > > > >> would
> > > > > >>>>> be possible as we can reason that a value other than the
> > > continuation
> > > > > >>>>> value was produced by an older library (and then validate
> > > > > >>>>> the Flatbuffer message of course). Arrow implementations
> > > > > >>>>> could offer
> > > a
> > > > > >>>>> backward compatibility mode for the sake of old readers if
> > > > > >>>>> they
> > > > > >> desire
> > > > > >>>>> (this may also assist with testing).
> > > > > >>>>>
> > > > > >>>>> Additionally with this vote, we want to formally approve
> > > > > >>>>> the
> > > change
> > > > > >> to
> > > > > >>>>> the Arrow "file" format to always write the (new 8-byte)
> > > > > >> end-of-stream
> > > > > >>>>> marker, which enables code that processes Arrow streams to
> > > > > >>>>> safely
> > > > > >> read
> > > > > >>>>> the file's internal messages as though they were a normal stream.
> > > > > >>>>>
> > > > > >>>>> The PR making these changes to the IPC documentation is
> > > > > >>>>> here
> > > > > >>>>>
> > > > > >>>>> https://nam06.safelinks.protection.outlook.com/?url=https%
> > > > > >>>>> 3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fpull%2F4951&amp;data
> > > > > >>>>> =02%7C01%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff
> > > > > >>>>> 5c9008d730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C
> > > > > >>>>> 0%7C637031638512163816&amp;sdata=WF9uQ1d7GzHohv31%2BW3tl3I
> > > > > >>>>> vp9Uo9h6VYVoXu52umTE%3D&amp;reserved=0
> > > > > >>>>>
> > > > > >>>>> Please vote to accept these changes. This vote will be
> > > > > >>>>> open for
> > > at
> > > > > >>>>> least 72 hours
> > > > > >>>>>
> > > > > >>>>> [ ] +1 Adopt these Arrow protocol changes [ ] +0 [ ] -1 I
> > > > > >>>>> disagree because...
> > > > > >>>>>
> > > > > >>>>> Here is my vote: +1
> > > > > >>>>>
> > > > > >>>>> Thanks,
> > > > > >>>>> Wes
> > > > > >>>>>
> > > > > >>>>> [1]:
> > > > > >>
> > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli
> > > sts.apache.org%2Fthread.html%2F8440be572c49b7b2ffb76b63e6d935ada9efd
> > > 9c1c2021369b6d27786%40%253Cdev.arrow.apache.org%253E&amp;data=02%7C0
> > > 1%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%
> > > 7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637031638512173773&amp;
> > > sdata=4y7ProY0ZDIqXAWYah6NS7TRZHGoYfZ6zMipdLV5ntk%3D&amp;reserved=0
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > >
>

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Ji Liu <ni...@aliyun.com.INVALID>.
Hi all,

During the java code review[1], seems there is a problem with the current implementations(C++/Java etc) when reaching EOS, since the new format EOS is 8 bytes and the reader only reads 4 bytes when reach the end of stream, and the additional 4 bytes will not be read which cause problems for following up readings.

There are some optional suggestions[2] as below, we should reach consistent and fix this problem before 0.15 release.
i. For the new format, an 8-byte EOS token should look like {0xFFFFFFFF, 0x00000000}, so we read the continuation token first, and then know to read the next 4 bytes, which are then 0 to signal EOS.ii. Reader just remember the state, so if it reads the continuation token from the beginning, then read all 8 bytes at the end.

Thanks,
Ji Liu

[1] https://github.com/apache/arrow/pull/5229
[2] https://github.com/apache/arrow/pull/5229#discussion_r321715682




------------------------------------------------------------------
From:Eric Erhardt <Er...@microsoft.com>
Send Time:2019年9月5日(星期四) 07:16
To:dev@arrow.apache.org <de...@arrow.apache.org>; Ji Liu <ni...@aliyun.com>
Cc:emkornfield <em...@gmail.com>; Paul Taylor <pt...@apache.org>
Subject:RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

The C# PR is up.

https://github.com/apache/arrow/pull/5280

Eric

-----Original Message-----
From: Eric Erhardt <Er...@microsoft.com.INVALID> 
Sent: Wednesday, September 4, 2019 10:12 AM
To: dev@arrow.apache.org; Ji Liu <ni...@aliyun.com>
Cc: emkornfield <em...@gmail.com>; Paul Taylor <pt...@apache.org>
Subject: RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

I'm working on a PR for the C# bindings. I hope to have it up in the next day or two. Integration tests for C# would be a great addition at some point - it's been on my backlog. For now I plan on manually testing it.

-----Original Message-----
From: Wes McKinney <we...@gmail.com>
Sent: Tuesday, September 3, 2019 10:17 PM
To: Ji Liu <ni...@aliyun.com>
Cc: emkornfield <em...@gmail.com>; dev <de...@arrow.apache.org>; Paul Taylor <pt...@apache.org>
Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

hi folks,

We now have patches up for Java, JS, and Go. How are we doing on the code reviews for getting these in?

Since C# implements the binary protocol, the C# developers might want to look at this before the 0.15.0 release also. Absent integration tests it's difficult to verify the C# library, though

Thanks

On Thu, Aug 29, 2019 at 8:13 AM Ji Liu <ni...@aliyun.com> wrote:
>
> Here is the Java implementation
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> ub.com%2Fapache%2Farrow%2Fpull%2F5229&amp;data=02%7C01%7CEric.Erhardt%
> 40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f141af9
> 1ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=b87u5x8lLvfdnU5
> 6LrGzYR8H0Jh8FfwY2cVjbOsY9hY%3D&amp;reserved=0
>
> cc @Wes McKinney @emkornfield
>
> Thanks,
> Ji Liu
>
> ------------------------------------------------------------------
> From:Ji Liu <ni...@aliyun.com.INVALID> Send Time:2019年8月28日(星期三)
> 17:34 To:emkornfield <em...@gmail.com>; dev 
> <de...@arrow.apache.org> Cc:Paul Taylor <pt...@apache.org>
> Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 
> 8-byte Flatbuffer alignment requirements (2nd vote)
>
> I could take the Java implementation and will take a close watch on this issue in the next few days.
>
> Thanks,
> Ji Liu
>
>
> ------------------------------------------------------------------
> From:Micah Kornfield <em...@gmail.com> Send Time:2019年8月28日(星期三)
> 17:14 To:dev <de...@arrow.apache.org> Cc:Paul Taylor 
> <pt...@apache.org>
> Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 
> 8-byte Flatbuffer alignment requirements (2nd vote)
>
> I should have integration tests with 0.14.1 generated binaries in the 
> next few days.  I think the one remaining unassigned piece of work in 
> the Java implementation, i can take that up next if no one else gets to it.
>
> On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney <we...@gmail.com> wrote:
>
> > Here's the C++ changes
> >
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
> > thub.com%2Fapache%2Farrow%2Fpull%2F5211&amp;data=02%7C01%7CEric.Erha
> > rdt%40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f
> > 141af91ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=zWaHS8X
> > YIQA85xcFG%2FMrOcSfrI8xZtyuHRoaDH%2FIP2g%3D&amp;reserved=0
> >
> > I'm going to create a integration branch where we can merge each 
> > patch before merging to master
> >
> > On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > It isn't implemented in C++ yet but I will try to get a patch up 
> > > for that soon (today maybe). I think we should create a branch 
> > > where we can stack the patches that implement this for each language.
> > >
> > > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor 
> > > <pt...@gmail.com>
> > wrote:
> > > >
> > > > I'll do the JS updates. Is it safe to validate against the Arrow
> > > > C++ integration tests?
> > > >
> > > >
> > > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > > > I created
> > > > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2
> > > > > F%2Fissues.apache.org%2Fjira%2Fbrowse%2FARROW-6313&amp;data=02
> > > > > %7C01%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff5c9008d
> > > > > 730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6370316
> > > > > 38512163816&amp;sdata=L57rZWFPdeuRtxFTkL%2F4g9RNI8lXFkRDXQadmj
> > > > > NiLxI%3D&amp;reserved=0 as a
> > tracking
> > > > > issue with sub-issues on the development work.  So far no-one 
> > > > > has
> > claimed
> > > > > Java and Javascript tasks.
> > > > >
> > > > > Would it make sense to have a separate dev branch for this work?
> > > > >
> > > > > Thanks,
> > > > > Micah
> > > > >
> > > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney 
> > > > > <we...@gmail.com>
> > wrote:
> > > > >
> > > > >> The vote carries with 4 binding +1 votes and 1 non-binding +1
> > > > >>
> > > > >> I'll merge the specification patch later today and we can 
> > > > >> begin working on implementations so we can get this done for
> > > > >> 0.15.0
> > > > >>
> > > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler 
> > > > >> <cu...@gmail.com>
> > wrote:
> > > > >>> +1 (non-binding)
> > > > >>>
> > > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou 
> > > > >>> <so...@pitrou.net>
> > > > >> wrote:
> > > > >>>> Sorry, had forgotten to send my vote on this.
> > > > >>>>
> > > > >>>> +1 from me.
> > > > >>>>
> > > > >>>> Regards
> > > > >>>>
> > > > >>>> Antoine.
> > > > >>>>
> > > > >>>>
> > > > >>>> On Wed, 14 Aug 2019 17:42:33 -0500 Wes McKinney 
> > > > >>>> <we...@gmail.com> wrote:
> > > > >>>>> hi all,
> > > > >>>>>
> > > > >>>>> As we've been discussing [1], there is a need to introduce
> > > > >>>>> 4
> > bytes of
> > > > >>>>> padding into the preamble of the "encapsulated IPC message"
> > format to
> > > > >>>>> ensure that the Flatbuffers metadata payload begins on an 
> > > > >>>>> 8-byte aligned memory offset. The alternative to this 
> > > > >>>>> would be for Arrow implementations where alignment is 
> > > > >>>>> important (e.g. C or C++) to
> > copy
> > > > >>>>> the metadata (which is not always small) into memory when 
> > > > >>>>> it is unaligned.
> > > > >>>>>
> > > > >>>>> Micah has proposed to address this by adding a 4-byte 
> > > > >>>>> "continuation" value at the beginning of the payload 
> > > > >>>>> having the value 0xFFFFFFFF. The reason to do it this way 
> > > > >>>>> is that old clients will see an invalid length (what is 
> > > > >>>>> currently the first 4 bytes of the message -- a 32-bit 
> > > > >>>>> little endian signed integer indicating the metadata
> > > > >>>>> length) rather than potentially crashing on a valid 
> > > > >>>>> length. We also propose to expand the "end of stream"
> > > > >>>>> marker used in the stream and file format from 4 to 8 
> > > > >>>>> bytes. This has the additional effect of aligning the file footer defined in File.fbs.
> > > > >>>>>
> > > > >>>>> This would be a backwards incompatible protocol change, so 
> > > > >>>>> older
> > > > >> Arrow
> > > > >>>>> libraries would not be able to read these new messages.
> > Maintaining
> > > > >>>>> forward compatibility (reading data produced by older
> > > > >>>>> libraries)
> > > > >> would
> > > > >>>>> be possible as we can reason that a value other than the
> > continuation
> > > > >>>>> value was produced by an older library (and then validate 
> > > > >>>>> the Flatbuffer message of course). Arrow implementations 
> > > > >>>>> could offer
> > a
> > > > >>>>> backward compatibility mode for the sake of old readers if 
> > > > >>>>> they
> > > > >> desire
> > > > >>>>> (this may also assist with testing).
> > > > >>>>>
> > > > >>>>> Additionally with this vote, we want to formally approve 
> > > > >>>>> the
> > change
> > > > >> to
> > > > >>>>> the Arrow "file" format to always write the (new 8-byte)
> > > > >> end-of-stream
> > > > >>>>> marker, which enables code that processes Arrow streams to 
> > > > >>>>> safely
> > > > >> read
> > > > >>>>> the file's internal messages as though they were a normal stream.
> > > > >>>>>
> > > > >>>>> The PR making these changes to the IPC documentation is 
> > > > >>>>> here
> > > > >>>>>
> > > > >>>>> https://nam06.safelinks.protection.outlook.com/?url=https%
> > > > >>>>> 3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fpull%2F4951&amp;data
> > > > >>>>> =02%7C01%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff
> > > > >>>>> 5c9008d730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C
> > > > >>>>> 0%7C637031638512163816&amp;sdata=WF9uQ1d7GzHohv31%2BW3tl3I
> > > > >>>>> vp9Uo9h6VYVoXu52umTE%3D&amp;reserved=0
> > > > >>>>>
> > > > >>>>> Please vote to accept these changes. This vote will be 
> > > > >>>>> open for
> > at
> > > > >>>>> least 72 hours
> > > > >>>>>
> > > > >>>>> [ ] +1 Adopt these Arrow protocol changes [ ] +0 [ ] -1 I 
> > > > >>>>> disagree because...
> > > > >>>>>
> > > > >>>>> Here is my vote: +1
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> Wes
> > > > >>>>>
> > > > >>>>> [1]:
> > > > >>
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli
> > sts.apache.org%2Fthread.html%2F8440be572c49b7b2ffb76b63e6d935ada9efd
> > 9c1c2021369b6d27786%40%253Cdev.arrow.apache.org%253E&amp;data=02%7C0
> > 1%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%
> > 7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637031638512173773&amp;
> > sdata=4y7ProY0ZDIqXAWYah6NS7TRZHGoYfZ6zMipdLV5ntk%3D&amp;reserved=0
> > > > >>>>
> > > > >>>>
> > > > >>>>
> >


RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Eric Erhardt <Er...@microsoft.com.INVALID>.
The C# PR is up.

https://github.com/apache/arrow/pull/5280

Eric

-----Original Message-----
From: Eric Erhardt <Er...@microsoft.com.INVALID> 
Sent: Wednesday, September 4, 2019 10:12 AM
To: dev@arrow.apache.org; Ji Liu <ni...@aliyun.com>
Cc: emkornfield <em...@gmail.com>; Paul Taylor <pt...@apache.org>
Subject: RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

I'm working on a PR for the C# bindings. I hope to have it up in the next day or two. Integration tests for C# would be a great addition at some point - it's been on my backlog. For now I plan on manually testing it.

-----Original Message-----
From: Wes McKinney <we...@gmail.com>
Sent: Tuesday, September 3, 2019 10:17 PM
To: Ji Liu <ni...@aliyun.com>
Cc: emkornfield <em...@gmail.com>; dev <de...@arrow.apache.org>; Paul Taylor <pt...@apache.org>
Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

hi folks,

We now have patches up for Java, JS, and Go. How are we doing on the code reviews for getting these in?

Since C# implements the binary protocol, the C# developers might want to look at this before the 0.15.0 release also. Absent integration tests it's difficult to verify the C# library, though

Thanks

On Thu, Aug 29, 2019 at 8:13 AM Ji Liu <ni...@aliyun.com> wrote:
>
> Here is the Java implementation
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> ub.com%2Fapache%2Farrow%2Fpull%2F5229&amp;data=02%7C01%7CEric.Erhardt%
> 40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f141af9
> 1ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=b87u5x8lLvfdnU5
> 6LrGzYR8H0Jh8FfwY2cVjbOsY9hY%3D&amp;reserved=0
>
> cc @Wes McKinney @emkornfield
>
> Thanks,
> Ji Liu
>
> ------------------------------------------------------------------
> From:Ji Liu <ni...@aliyun.com.INVALID> Send Time:2019年8月28日(星期三)
> 17:34 To:emkornfield <em...@gmail.com>; dev 
> <de...@arrow.apache.org> Cc:Paul Taylor <pt...@apache.org>
> Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 
> 8-byte Flatbuffer alignment requirements (2nd vote)
>
> I could take the Java implementation and will take a close watch on this issue in the next few days.
>
> Thanks,
> Ji Liu
>
>
> ------------------------------------------------------------------
> From:Micah Kornfield <em...@gmail.com> Send Time:2019年8月28日(星期三)
> 17:14 To:dev <de...@arrow.apache.org> Cc:Paul Taylor 
> <pt...@apache.org>
> Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 
> 8-byte Flatbuffer alignment requirements (2nd vote)
>
> I should have integration tests with 0.14.1 generated binaries in the 
> next few days.  I think the one remaining unassigned piece of work in 
> the Java implementation, i can take that up next if no one else gets to it.
>
> On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney <we...@gmail.com> wrote:
>
> > Here's the C++ changes
> >
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
> > thub.com%2Fapache%2Farrow%2Fpull%2F5211&amp;data=02%7C01%7CEric.Erha
> > rdt%40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f
> > 141af91ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=zWaHS8X
> > YIQA85xcFG%2FMrOcSfrI8xZtyuHRoaDH%2FIP2g%3D&amp;reserved=0
> >
> > I'm going to create a integration branch where we can merge each 
> > patch before merging to master
> >
> > On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > It isn't implemented in C++ yet but I will try to get a patch up 
> > > for that soon (today maybe). I think we should create a branch 
> > > where we can stack the patches that implement this for each language.
> > >
> > > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor 
> > > <pt...@gmail.com>
> > wrote:
> > > >
> > > > I'll do the JS updates. Is it safe to validate against the Arrow
> > > > C++ integration tests?
> > > >
> > > >
> > > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > > > I created
> > > > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2
> > > > > F%2Fissues.apache.org%2Fjira%2Fbrowse%2FARROW-6313&amp;data=02
> > > > > %7C01%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff5c9008d
> > > > > 730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6370316
> > > > > 38512163816&amp;sdata=L57rZWFPdeuRtxFTkL%2F4g9RNI8lXFkRDXQadmj
> > > > > NiLxI%3D&amp;reserved=0 as a
> > tracking
> > > > > issue with sub-issues on the development work.  So far no-one 
> > > > > has
> > claimed
> > > > > Java and Javascript tasks.
> > > > >
> > > > > Would it make sense to have a separate dev branch for this work?
> > > > >
> > > > > Thanks,
> > > > > Micah
> > > > >
> > > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney 
> > > > > <we...@gmail.com>
> > wrote:
> > > > >
> > > > >> The vote carries with 4 binding +1 votes and 1 non-binding +1
> > > > >>
> > > > >> I'll merge the specification patch later today and we can 
> > > > >> begin working on implementations so we can get this done for
> > > > >> 0.15.0
> > > > >>
> > > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler 
> > > > >> <cu...@gmail.com>
> > wrote:
> > > > >>> +1 (non-binding)
> > > > >>>
> > > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou 
> > > > >>> <so...@pitrou.net>
> > > > >> wrote:
> > > > >>>> Sorry, had forgotten to send my vote on this.
> > > > >>>>
> > > > >>>> +1 from me.
> > > > >>>>
> > > > >>>> Regards
> > > > >>>>
> > > > >>>> Antoine.
> > > > >>>>
> > > > >>>>
> > > > >>>> On Wed, 14 Aug 2019 17:42:33 -0500 Wes McKinney 
> > > > >>>> <we...@gmail.com> wrote:
> > > > >>>>> hi all,
> > > > >>>>>
> > > > >>>>> As we've been discussing [1], there is a need to introduce
> > > > >>>>> 4
> > bytes of
> > > > >>>>> padding into the preamble of the "encapsulated IPC message"
> > format to
> > > > >>>>> ensure that the Flatbuffers metadata payload begins on an 
> > > > >>>>> 8-byte aligned memory offset. The alternative to this 
> > > > >>>>> would be for Arrow implementations where alignment is 
> > > > >>>>> important (e.g. C or C++) to
> > copy
> > > > >>>>> the metadata (which is not always small) into memory when 
> > > > >>>>> it is unaligned.
> > > > >>>>>
> > > > >>>>> Micah has proposed to address this by adding a 4-byte 
> > > > >>>>> "continuation" value at the beginning of the payload 
> > > > >>>>> having the value 0xFFFFFFFF. The reason to do it this way 
> > > > >>>>> is that old clients will see an invalid length (what is 
> > > > >>>>> currently the first 4 bytes of the message -- a 32-bit 
> > > > >>>>> little endian signed integer indicating the metadata
> > > > >>>>> length) rather than potentially crashing on a valid 
> > > > >>>>> length. We also propose to expand the "end of stream"
> > > > >>>>> marker used in the stream and file format from 4 to 8 
> > > > >>>>> bytes. This has the additional effect of aligning the file footer defined in File.fbs.
> > > > >>>>>
> > > > >>>>> This would be a backwards incompatible protocol change, so 
> > > > >>>>> older
> > > > >> Arrow
> > > > >>>>> libraries would not be able to read these new messages.
> > Maintaining
> > > > >>>>> forward compatibility (reading data produced by older
> > > > >>>>> libraries)
> > > > >> would
> > > > >>>>> be possible as we can reason that a value other than the
> > continuation
> > > > >>>>> value was produced by an older library (and then validate 
> > > > >>>>> the Flatbuffer message of course). Arrow implementations 
> > > > >>>>> could offer
> > a
> > > > >>>>> backward compatibility mode for the sake of old readers if 
> > > > >>>>> they
> > > > >> desire
> > > > >>>>> (this may also assist with testing).
> > > > >>>>>
> > > > >>>>> Additionally with this vote, we want to formally approve 
> > > > >>>>> the
> > change
> > > > >> to
> > > > >>>>> the Arrow "file" format to always write the (new 8-byte)
> > > > >> end-of-stream
> > > > >>>>> marker, which enables code that processes Arrow streams to 
> > > > >>>>> safely
> > > > >> read
> > > > >>>>> the file's internal messages as though they were a normal stream.
> > > > >>>>>
> > > > >>>>> The PR making these changes to the IPC documentation is 
> > > > >>>>> here
> > > > >>>>>
> > > > >>>>> https://nam06.safelinks.protection.outlook.com/?url=https%
> > > > >>>>> 3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fpull%2F4951&amp;data
> > > > >>>>> =02%7C01%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff
> > > > >>>>> 5c9008d730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C
> > > > >>>>> 0%7C637031638512163816&amp;sdata=WF9uQ1d7GzHohv31%2BW3tl3I
> > > > >>>>> vp9Uo9h6VYVoXu52umTE%3D&amp;reserved=0
> > > > >>>>>
> > > > >>>>> Please vote to accept these changes. This vote will be 
> > > > >>>>> open for
> > at
> > > > >>>>> least 72 hours
> > > > >>>>>
> > > > >>>>> [ ] +1 Adopt these Arrow protocol changes [ ] +0 [ ] -1 I 
> > > > >>>>> disagree because...
> > > > >>>>>
> > > > >>>>> Here is my vote: +1
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> Wes
> > > > >>>>>
> > > > >>>>> [1]:
> > > > >>
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli
> > sts.apache.org%2Fthread.html%2F8440be572c49b7b2ffb76b63e6d935ada9efd
> > 9c1c2021369b6d27786%40%253Cdev.arrow.apache.org%253E&amp;data=02%7C0
> > 1%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%
> > 7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637031638512173773&amp;
> > sdata=4y7ProY0ZDIqXAWYah6NS7TRZHGoYfZ6zMipdLV5ntk%3D&amp;reserved=0
> > > > >>>>
> > > > >>>>
> > > > >>>>
> >

RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Eric Erhardt <Er...@microsoft.com.INVALID>.
I'm working on a PR for the C# bindings. I hope to have it up in the next day or two. Integration tests for C# would be a great addition at some point - it's been on my backlog. For now I plan on manually testing it.

-----Original Message-----
From: Wes McKinney <we...@gmail.com> 
Sent: Tuesday, September 3, 2019 10:17 PM
To: Ji Liu <ni...@aliyun.com>
Cc: emkornfield <em...@gmail.com>; dev <de...@arrow.apache.org>; Paul Taylor <pt...@apache.org>
Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

hi folks,

We now have patches up for Java, JS, and Go. How are we doing on the code reviews for getting these in?

Since C# implements the binary protocol, the C# developers might want to look at this before the 0.15.0 release also. Absent integration tests it's difficult to verify the C# library, though

Thanks

On Thu, Aug 29, 2019 at 8:13 AM Ji Liu <ni...@aliyun.com> wrote:
>
> Here is the Java implementation
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> ub.com%2Fapache%2Farrow%2Fpull%2F5229&amp;data=02%7C01%7CEric.Erhardt%
> 40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f141af9
> 1ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=b87u5x8lLvfdnU5
> 6LrGzYR8H0Jh8FfwY2cVjbOsY9hY%3D&amp;reserved=0
>
> cc @Wes McKinney @emkornfield
>
> Thanks,
> Ji Liu
>
> ------------------------------------------------------------------
> From:Ji Liu <ni...@aliyun.com.INVALID> Send Time:2019年8月28日(星期三) 
> 17:34 To:emkornfield <em...@gmail.com>; dev 
> <de...@arrow.apache.org> Cc:Paul Taylor <pt...@apache.org>
> Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 
> 8-byte Flatbuffer alignment requirements (2nd vote)
>
> I could take the Java implementation and will take a close watch on this issue in the next few days.
>
> Thanks,
> Ji Liu
>
>
> ------------------------------------------------------------------
> From:Micah Kornfield <em...@gmail.com> Send Time:2019年8月28日(星期三) 
> 17:14 To:dev <de...@arrow.apache.org> Cc:Paul Taylor 
> <pt...@apache.org>
> Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 
> 8-byte Flatbuffer alignment requirements (2nd vote)
>
> I should have integration tests with 0.14.1 generated binaries in the 
> next few days.  I think the one remaining unassigned piece of work in 
> the Java implementation, i can take that up next if no one else gets to it.
>
> On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney <we...@gmail.com> wrote:
>
> > Here's the C++ changes
> >
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
> > thub.com%2Fapache%2Farrow%2Fpull%2F5211&amp;data=02%7C01%7CEric.Erha
> > rdt%40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f
> > 141af91ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=zWaHS8X
> > YIQA85xcFG%2FMrOcSfrI8xZtyuHRoaDH%2FIP2g%3D&amp;reserved=0
> >
> > I'm going to create a integration branch where we can merge each 
> > patch before merging to master
> >
> > On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > It isn't implemented in C++ yet but I will try to get a patch up 
> > > for that soon (today maybe). I think we should create a branch 
> > > where we can stack the patches that implement this for each language.
> > >
> > > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor 
> > > <pt...@gmail.com>
> > wrote:
> > > >
> > > > I'll do the JS updates. Is it safe to validate against the Arrow 
> > > > C++ integration tests?
> > > >
> > > >
> > > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > > > I created 
> > > > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2
> > > > > F%2Fissues.apache.org%2Fjira%2Fbrowse%2FARROW-6313&amp;data=02
> > > > > %7C01%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff5c9008d
> > > > > 730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6370316
> > > > > 38512163816&amp;sdata=L57rZWFPdeuRtxFTkL%2F4g9RNI8lXFkRDXQadmj
> > > > > NiLxI%3D&amp;reserved=0 as a
> > tracking
> > > > > issue with sub-issues on the development work.  So far no-one 
> > > > > has
> > claimed
> > > > > Java and Javascript tasks.
> > > > >
> > > > > Would it make sense to have a separate dev branch for this work?
> > > > >
> > > > > Thanks,
> > > > > Micah
> > > > >
> > > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney 
> > > > > <we...@gmail.com>
> > wrote:
> > > > >
> > > > >> The vote carries with 4 binding +1 votes and 1 non-binding +1
> > > > >>
> > > > >> I'll merge the specification patch later today and we can 
> > > > >> begin working on implementations so we can get this done for 
> > > > >> 0.15.0
> > > > >>
> > > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler 
> > > > >> <cu...@gmail.com>
> > wrote:
> > > > >>> +1 (non-binding)
> > > > >>>
> > > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou 
> > > > >>> <so...@pitrou.net>
> > > > >> wrote:
> > > > >>>> Sorry, had forgotten to send my vote on this.
> > > > >>>>
> > > > >>>> +1 from me.
> > > > >>>>
> > > > >>>> Regards
> > > > >>>>
> > > > >>>> Antoine.
> > > > >>>>
> > > > >>>>
> > > > >>>> On Wed, 14 Aug 2019 17:42:33 -0500 Wes McKinney 
> > > > >>>> <we...@gmail.com> wrote:
> > > > >>>>> hi all,
> > > > >>>>>
> > > > >>>>> As we've been discussing [1], there is a need to introduce 
> > > > >>>>> 4
> > bytes of
> > > > >>>>> padding into the preamble of the "encapsulated IPC message"
> > format to
> > > > >>>>> ensure that the Flatbuffers metadata payload begins on an 
> > > > >>>>> 8-byte aligned memory offset. The alternative to this 
> > > > >>>>> would be for Arrow implementations where alignment is 
> > > > >>>>> important (e.g. C or C++) to
> > copy
> > > > >>>>> the metadata (which is not always small) into memory when 
> > > > >>>>> it is unaligned.
> > > > >>>>>
> > > > >>>>> Micah has proposed to address this by adding a 4-byte 
> > > > >>>>> "continuation" value at the beginning of the payload 
> > > > >>>>> having the value 0xFFFFFFFF. The reason to do it this way 
> > > > >>>>> is that old clients will see an invalid length (what is 
> > > > >>>>> currently the first 4 bytes of the message -- a 32-bit 
> > > > >>>>> little endian signed integer indicating the metadata 
> > > > >>>>> length) rather than potentially crashing on a valid 
> > > > >>>>> length. We also propose to expand the "end of stream" 
> > > > >>>>> marker used in the stream and file format from 4 to 8 
> > > > >>>>> bytes. This has the additional effect of aligning the file footer defined in File.fbs.
> > > > >>>>>
> > > > >>>>> This would be a backwards incompatible protocol change, so 
> > > > >>>>> older
> > > > >> Arrow
> > > > >>>>> libraries would not be able to read these new messages.
> > Maintaining
> > > > >>>>> forward compatibility (reading data produced by older 
> > > > >>>>> libraries)
> > > > >> would
> > > > >>>>> be possible as we can reason that a value other than the
> > continuation
> > > > >>>>> value was produced by an older library (and then validate 
> > > > >>>>> the Flatbuffer message of course). Arrow implementations 
> > > > >>>>> could offer
> > a
> > > > >>>>> backward compatibility mode for the sake of old readers if 
> > > > >>>>> they
> > > > >> desire
> > > > >>>>> (this may also assist with testing).
> > > > >>>>>
> > > > >>>>> Additionally with this vote, we want to formally approve 
> > > > >>>>> the
> > change
> > > > >> to
> > > > >>>>> the Arrow "file" format to always write the (new 8-byte)
> > > > >> end-of-stream
> > > > >>>>> marker, which enables code that processes Arrow streams to 
> > > > >>>>> safely
> > > > >> read
> > > > >>>>> the file's internal messages as though they were a normal stream.
> > > > >>>>>
> > > > >>>>> The PR making these changes to the IPC documentation is 
> > > > >>>>> here
> > > > >>>>>
> > > > >>>>> https://nam06.safelinks.protection.outlook.com/?url=https%
> > > > >>>>> 3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fpull%2F4951&amp;data
> > > > >>>>> =02%7C01%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff
> > > > >>>>> 5c9008d730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C
> > > > >>>>> 0%7C637031638512163816&amp;sdata=WF9uQ1d7GzHohv31%2BW3tl3I
> > > > >>>>> vp9Uo9h6VYVoXu52umTE%3D&amp;reserved=0
> > > > >>>>>
> > > > >>>>> Please vote to accept these changes. This vote will be 
> > > > >>>>> open for
> > at
> > > > >>>>> least 72 hours
> > > > >>>>>
> > > > >>>>> [ ] +1 Adopt these Arrow protocol changes [ ] +0 [ ] -1 I 
> > > > >>>>> disagree because...
> > > > >>>>>
> > > > >>>>> Here is my vote: +1
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> Wes
> > > > >>>>>
> > > > >>>>> [1]:
> > > > >>
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli
> > sts.apache.org%2Fthread.html%2F8440be572c49b7b2ffb76b63e6d935ada9efd
> > 9c1c2021369b6d27786%40%253Cdev.arrow.apache.org%253E&amp;data=02%7C0
> > 1%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%
> > 7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637031638512173773&amp;
> > sdata=4y7ProY0ZDIqXAWYah6NS7TRZHGoYfZ6zMipdLV5ntk%3D&amp;reserved=0
> > > > >>>>
> > > > >>>>
> > > > >>>>
> >

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Wes McKinney <we...@gmail.com>.
hi folks,

We now have patches up for Java, JS, and Go. How are we doing on the
code reviews for getting these in?

Since C# implements the binary protocol, the C# developers might want
to look at this before the 0.15.0 release also. Absent integration
tests it's difficult to verify the C# library, though

Thanks

On Thu, Aug 29, 2019 at 8:13 AM Ji Liu <ni...@aliyun.com> wrote:
>
> Here is the Java implementation
> https://github.com/apache/arrow/pull/5229
>
> cc @Wes McKinney @emkornfield
>
> Thanks,
> Ji Liu
>
> ------------------------------------------------------------------
> From:Ji Liu <ni...@aliyun.com.INVALID>
> Send Time:2019年8月28日(星期三) 17:34
> To:emkornfield <em...@gmail.com>; dev <de...@arrow.apache.org>
> Cc:Paul Taylor <pt...@apache.org>
> Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)
>
> I could take the Java implementation and will take a close watch on this issue in the next few days.
>
> Thanks,
> Ji Liu
>
>
> ------------------------------------------------------------------
> From:Micah Kornfield <em...@gmail.com>
> Send Time:2019年8月28日(星期三) 17:14
> To:dev <de...@arrow.apache.org>
> Cc:Paul Taylor <pt...@apache.org>
> Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)
>
> I should have integration tests with 0.14.1 generated binaries in the next
> few days.  I think the one remaining unassigned piece of work in the Java
> implementation, i can take that up next if no one else gets to it.
>
> On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney <we...@gmail.com> wrote:
>
> > Here's the C++ changes
> >
> > https://github.com/apache/arrow/pull/5211
> >
> > I'm going to create a integration branch where we can merge each patch
> > before merging to master
> >
> > On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > It isn't implemented in C++ yet but I will try to get a patch up for
> > > that soon (today maybe). I think we should create a branch where we
> > > can stack the patches that implement this for each language.
> > >
> > > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor <pt...@gmail.com>
> > wrote:
> > > >
> > > > I'll do the JS updates. Is it safe to validate against the Arrow C++
> > > > integration tests?
> > > >
> > > >
> > > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > > > I created https://issues.apache.org/jira/browse/ARROW-6313 as a
> > tracking
> > > > > issue with sub-issues on the development work.  So far no-one has
> > claimed
> > > > > Java and Javascript tasks.
> > > > >
> > > > > Would it make sense to have a separate dev branch for this work?
> > > > >
> > > > > Thanks,
> > > > > Micah
> > > > >
> > > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney <we...@gmail.com>
> > wrote:
> > > > >
> > > > >> The vote carries with 4 binding +1 votes and 1 non-binding +1
> > > > >>
> > > > >> I'll merge the specification patch later today and we can begin
> > > > >> working on implementations so we can get this done for 0.15.0
> > > > >>
> > > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler <cu...@gmail.com>
> > wrote:
> > > > >>> +1 (non-binding)
> > > > >>>
> > > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou <so...@pitrou.net>
> > > > >> wrote:
> > > > >>>> Sorry, had forgotten to send my vote on this.
> > > > >>>>
> > > > >>>> +1 from me.
> > > > >>>>
> > > > >>>> Regards
> > > > >>>>
> > > > >>>> Antoine.
> > > > >>>>
> > > > >>>>
> > > > >>>> On Wed, 14 Aug 2019 17:42:33 -0500
> > > > >>>> Wes McKinney <we...@gmail.com> wrote:
> > > > >>>>> hi all,
> > > > >>>>>
> > > > >>>>> As we've been discussing [1], there is a need to introduce 4
> > bytes of
> > > > >>>>> padding into the preamble of the "encapsulated IPC message"
> > format to
> > > > >>>>> ensure that the Flatbuffers metadata payload begins on an 8-byte
> > > > >>>>> aligned memory offset. The alternative to this would be for Arrow
> > > > >>>>> implementations where alignment is important (e.g. C or C++) to
> > copy
> > > > >>>>> the metadata (which is not always small) into memory when it is
> > > > >>>>> unaligned.
> > > > >>>>>
> > > > >>>>> Micah has proposed to address this by adding a
> > > > >>>>> 4-byte "continuation" value at the beginning of the payload
> > > > >>>>> having the value 0xFFFFFFFF. The reason to do it this way is that
> > > > >>>>> old clients will see an invalid length (what is currently the
> > > > >>>>> first 4 bytes of the message -- a 32-bit little endian signed
> > > > >>>>> integer indicating the metadata length) rather than potentially
> > > > >>>>> crashing on a valid length. We also propose to expand the "end of
> > > > >>>>> stream" marker used in the stream and file format from 4 to 8
> > > > >>>>> bytes. This has the additional effect of aligning the file footer
> > > > >>>>> defined in File.fbs.
> > > > >>>>>
> > > > >>>>> This would be a backwards incompatible protocol change, so older
> > > > >> Arrow
> > > > >>>>> libraries would not be able to read these new messages.
> > Maintaining
> > > > >>>>> forward compatibility (reading data produced by older libraries)
> > > > >> would
> > > > >>>>> be possible as we can reason that a value other than the
> > continuation
> > > > >>>>> value was produced by an older library (and then validate the
> > > > >>>>> Flatbuffer message of course). Arrow implementations could offer
> > a
> > > > >>>>> backward compatibility mode for the sake of old readers if they
> > > > >> desire
> > > > >>>>> (this may also assist with testing).
> > > > >>>>>
> > > > >>>>> Additionally with this vote, we want to formally approve the
> > change
> > > > >> to
> > > > >>>>> the Arrow "file" format to always write the (new 8-byte)
> > > > >> end-of-stream
> > > > >>>>> marker, which enables code that processes Arrow streams to safely
> > > > >> read
> > > > >>>>> the file's internal messages as though they were a normal stream.
> > > > >>>>>
> > > > >>>>> The PR making these changes to the IPC documentation is here
> > > > >>>>>
> > > > >>>>> https://github.com/apache/arrow/pull/4951
> > > > >>>>>
> > > > >>>>> Please vote to accept these changes. This vote will be open for
> > at
> > > > >>>>> least 72 hours
> > > > >>>>>
> > > > >>>>> [ ] +1 Adopt these Arrow protocol changes
> > > > >>>>> [ ] +0
> > > > >>>>> [ ] -1 I disagree because...
> > > > >>>>>
> > > > >>>>> Here is my vote: +1
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> Wes
> > > > >>>>>
> > > > >>>>> [1]:
> > > > >>
> > https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
> > > > >>>>
> > > > >>>>
> > > > >>>>
> >

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Ji Liu <ni...@aliyun.com.INVALID>.
Here is the Java implementation
https://github.com/apache/arrow/pull/5229

cc @Wes McKinney @emkornfield

Thanks,
Ji Liu


------------------------------------------------------------------
From:Ji Liu <ni...@aliyun.com.INVALID>
Send Time:2019年8月28日(星期三) 17:34
To:emkornfield <em...@gmail.com>; dev <de...@arrow.apache.org>
Cc:Paul Taylor <pt...@apache.org>
Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

I could take the Java implementation and will take a close watch on this issue in the next few days.

Thanks,
Ji Liu


------------------------------------------------------------------
From:Micah Kornfield <em...@gmail.com>
Send Time:2019年8月28日(星期三) 17:14
To:dev <de...@arrow.apache.org>
Cc:Paul Taylor <pt...@apache.org>
Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

I should have integration tests with 0.14.1 generated binaries in the next
few days.  I think the one remaining unassigned piece of work in the Java
implementation, i can take that up next if no one else gets to it.

On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney <we...@gmail.com> wrote:

> Here's the C++ changes
>
> https://github.com/apache/arrow/pull/5211
>
> I'm going to create a integration branch where we can merge each patch
> before merging to master
>
> On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > It isn't implemented in C++ yet but I will try to get a patch up for
> > that soon (today maybe). I think we should create a branch where we
> > can stack the patches that implement this for each language.
> >
> > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor <pt...@gmail.com>
> wrote:
> > >
> > > I'll do the JS updates. Is it safe to validate against the Arrow C++
> > > integration tests?
> > >
> > >
> > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > > I created https://issues.apache.org/jira/browse/ARROW-6313 as a
> tracking
> > > > issue with sub-issues on the development work.  So far no-one has
> claimed
> > > > Java and Javascript tasks.
> > > >
> > > > Would it make sense to have a separate dev branch for this work?
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney <we...@gmail.com>
> wrote:
> > > >
> > > >> The vote carries with 4 binding +1 votes and 1 non-binding +1
> > > >>
> > > >> I'll merge the specification patch later today and we can begin
> > > >> working on implementations so we can get this done for 0.15.0
> > > >>
> > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler <cu...@gmail.com>
> wrote:
> > > >>> +1 (non-binding)
> > > >>>
> > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou <so...@pitrou.net>
> > > >> wrote:
> > > >>>> Sorry, had forgotten to send my vote on this.
> > > >>>>
> > > >>>> +1 from me.
> > > >>>>
> > > >>>> Regards
> > > >>>>
> > > >>>> Antoine.
> > > >>>>
> > > >>>>
> > > >>>> On Wed, 14 Aug 2019 17:42:33 -0500
> > > >>>> Wes McKinney <we...@gmail.com> wrote:
> > > >>>>> hi all,
> > > >>>>>
> > > >>>>> As we've been discussing [1], there is a need to introduce 4
> bytes of
> > > >>>>> padding into the preamble of the "encapsulated IPC message"
> format to
> > > >>>>> ensure that the Flatbuffers metadata payload begins on an 8-byte
> > > >>>>> aligned memory offset. The alternative to this would be for Arrow
> > > >>>>> implementations where alignment is important (e.g. C or C++) to
> copy
> > > >>>>> the metadata (which is not always small) into memory when it is
> > > >>>>> unaligned.
> > > >>>>>
> > > >>>>> Micah has proposed to address this by adding a
> > > >>>>> 4-byte "continuation" value at the beginning of the payload
> > > >>>>> having the value 0xFFFFFFFF. The reason to do it this way is that
> > > >>>>> old clients will see an invalid length (what is currently the
> > > >>>>> first 4 bytes of the message -- a 32-bit little endian signed
> > > >>>>> integer indicating the metadata length) rather than potentially
> > > >>>>> crashing on a valid length. We also propose to expand the "end of
> > > >>>>> stream" marker used in the stream and file format from 4 to 8
> > > >>>>> bytes. This has the additional effect of aligning the file footer
> > > >>>>> defined in File.fbs.
> > > >>>>>
> > > >>>>> This would be a backwards incompatible protocol change, so older
> > > >> Arrow
> > > >>>>> libraries would not be able to read these new messages.
> Maintaining
> > > >>>>> forward compatibility (reading data produced by older libraries)
> > > >> would
> > > >>>>> be possible as we can reason that a value other than the
> continuation
> > > >>>>> value was produced by an older library (and then validate the
> > > >>>>> Flatbuffer message of course). Arrow implementations could offer
> a
> > > >>>>> backward compatibility mode for the sake of old readers if they
> > > >> desire
> > > >>>>> (this may also assist with testing).
> > > >>>>>
> > > >>>>> Additionally with this vote, we want to formally approve the
> change
> > > >> to
> > > >>>>> the Arrow "file" format to always write the (new 8-byte)
> > > >> end-of-stream
> > > >>>>> marker, which enables code that processes Arrow streams to safely
> > > >> read
> > > >>>>> the file's internal messages as though they were a normal stream.
> > > >>>>>
> > > >>>>> The PR making these changes to the IPC documentation is here
> > > >>>>>
> > > >>>>> https://github.com/apache/arrow/pull/4951
> > > >>>>>
> > > >>>>> Please vote to accept these changes. This vote will be open for
> at
> > > >>>>> least 72 hours
> > > >>>>>
> > > >>>>> [ ] +1 Adopt these Arrow protocol changes
> > > >>>>> [ ] +0
> > > >>>>> [ ] -1 I disagree because...
> > > >>>>>
> > > >>>>> Here is my vote: +1
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Wes
> > > >>>>>
> > > >>>>> [1]:
> > > >>
> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
> > > >>>>
> > > >>>>
> > > >>>>
>

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Ji Liu <ni...@aliyun.com.INVALID>.
I could take the Java implementation and will take a close watch on this issue in the next few days.

Thanks,
Ji Liu


------------------------------------------------------------------
From:Micah Kornfield <em...@gmail.com>
Send Time:2019年8月28日(星期三) 17:14
To:dev <de...@arrow.apache.org>
Cc:Paul Taylor <pt...@apache.org>
Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

I should have integration tests with 0.14.1 generated binaries in the next
few days.  I think the one remaining unassigned piece of work in the Java
implementation, i can take that up next if no one else gets to it.

On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney <we...@gmail.com> wrote:

> Here's the C++ changes
>
> https://github.com/apache/arrow/pull/5211
>
> I'm going to create a integration branch where we can merge each patch
> before merging to master
>
> On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > It isn't implemented in C++ yet but I will try to get a patch up for
> > that soon (today maybe). I think we should create a branch where we
> > can stack the patches that implement this for each language.
> >
> > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor <pt...@gmail.com>
> wrote:
> > >
> > > I'll do the JS updates. Is it safe to validate against the Arrow C++
> > > integration tests?
> > >
> > >
> > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > > I created https://issues.apache.org/jira/browse/ARROW-6313 as a
> tracking
> > > > issue with sub-issues on the development work.  So far no-one has
> claimed
> > > > Java and Javascript tasks.
> > > >
> > > > Would it make sense to have a separate dev branch for this work?
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney <we...@gmail.com>
> wrote:
> > > >
> > > >> The vote carries with 4 binding +1 votes and 1 non-binding +1
> > > >>
> > > >> I'll merge the specification patch later today and we can begin
> > > >> working on implementations so we can get this done for 0.15.0
> > > >>
> > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler <cu...@gmail.com>
> wrote:
> > > >>> +1 (non-binding)
> > > >>>
> > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou <so...@pitrou.net>
> > > >> wrote:
> > > >>>> Sorry, had forgotten to send my vote on this.
> > > >>>>
> > > >>>> +1 from me.
> > > >>>>
> > > >>>> Regards
> > > >>>>
> > > >>>> Antoine.
> > > >>>>
> > > >>>>
> > > >>>> On Wed, 14 Aug 2019 17:42:33 -0500
> > > >>>> Wes McKinney <we...@gmail.com> wrote:
> > > >>>>> hi all,
> > > >>>>>
> > > >>>>> As we've been discussing [1], there is a need to introduce 4
> bytes of
> > > >>>>> padding into the preamble of the "encapsulated IPC message"
> format to
> > > >>>>> ensure that the Flatbuffers metadata payload begins on an 8-byte
> > > >>>>> aligned memory offset. The alternative to this would be for Arrow
> > > >>>>> implementations where alignment is important (e.g. C or C++) to
> copy
> > > >>>>> the metadata (which is not always small) into memory when it is
> > > >>>>> unaligned.
> > > >>>>>
> > > >>>>> Micah has proposed to address this by adding a
> > > >>>>> 4-byte "continuation" value at the beginning of the payload
> > > >>>>> having the value 0xFFFFFFFF. The reason to do it this way is that
> > > >>>>> old clients will see an invalid length (what is currently the
> > > >>>>> first 4 bytes of the message -- a 32-bit little endian signed
> > > >>>>> integer indicating the metadata length) rather than potentially
> > > >>>>> crashing on a valid length. We also propose to expand the "end of
> > > >>>>> stream" marker used in the stream and file format from 4 to 8
> > > >>>>> bytes. This has the additional effect of aligning the file footer
> > > >>>>> defined in File.fbs.
> > > >>>>>
> > > >>>>> This would be a backwards incompatible protocol change, so older
> > > >> Arrow
> > > >>>>> libraries would not be able to read these new messages.
> Maintaining
> > > >>>>> forward compatibility (reading data produced by older libraries)
> > > >> would
> > > >>>>> be possible as we can reason that a value other than the
> continuation
> > > >>>>> value was produced by an older library (and then validate the
> > > >>>>> Flatbuffer message of course). Arrow implementations could offer
> a
> > > >>>>> backward compatibility mode for the sake of old readers if they
> > > >> desire
> > > >>>>> (this may also assist with testing).
> > > >>>>>
> > > >>>>> Additionally with this vote, we want to formally approve the
> change
> > > >> to
> > > >>>>> the Arrow "file" format to always write the (new 8-byte)
> > > >> end-of-stream
> > > >>>>> marker, which enables code that processes Arrow streams to safely
> > > >> read
> > > >>>>> the file's internal messages as though they were a normal stream.
> > > >>>>>
> > > >>>>> The PR making these changes to the IPC documentation is here
> > > >>>>>
> > > >>>>> https://github.com/apache/arrow/pull/4951
> > > >>>>>
> > > >>>>> Please vote to accept these changes. This vote will be open for
> at
> > > >>>>> least 72 hours
> > > >>>>>
> > > >>>>> [ ] +1 Adopt these Arrow protocol changes
> > > >>>>> [ ] +0
> > > >>>>> [ ] -1 I disagree because...
> > > >>>>>
> > > >>>>> Here is my vote: +1
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Wes
> > > >>>>>
> > > >>>>> [1]:
> > > >>
> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
> > > >>>>
> > > >>>>
> > > >>>>
>

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Micah Kornfield <em...@gmail.com>.
I should have integration tests with 0.14.1 generated binaries in the next
few days.  I think the one remaining unassigned piece of work in the Java
implementation, i can take that up next if no one else gets to it.

On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney <we...@gmail.com> wrote:

> Here's the C++ changes
>
> https://github.com/apache/arrow/pull/5211
>
> I'm going to create a integration branch where we can merge each patch
> before merging to master
>
> On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > It isn't implemented in C++ yet but I will try to get a patch up for
> > that soon (today maybe). I think we should create a branch where we
> > can stack the patches that implement this for each language.
> >
> > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor <pt...@gmail.com>
> wrote:
> > >
> > > I'll do the JS updates. Is it safe to validate against the Arrow C++
> > > integration tests?
> > >
> > >
> > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > > I created https://issues.apache.org/jira/browse/ARROW-6313 as a
> tracking
> > > > issue with sub-issues on the development work.  So far no-one has
> claimed
> > > > Java and Javascript tasks.
> > > >
> > > > Would it make sense to have a separate dev branch for this work?
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney <we...@gmail.com>
> wrote:
> > > >
> > > >> The vote carries with 4 binding +1 votes and 1 non-binding +1
> > > >>
> > > >> I'll merge the specification patch later today and we can begin
> > > >> working on implementations so we can get this done for 0.15.0
> > > >>
> > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler <cu...@gmail.com>
> wrote:
> > > >>> +1 (non-binding)
> > > >>>
> > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou <so...@pitrou.net>
> > > >> wrote:
> > > >>>> Sorry, had forgotten to send my vote on this.
> > > >>>>
> > > >>>> +1 from me.
> > > >>>>
> > > >>>> Regards
> > > >>>>
> > > >>>> Antoine.
> > > >>>>
> > > >>>>
> > > >>>> On Wed, 14 Aug 2019 17:42:33 -0500
> > > >>>> Wes McKinney <we...@gmail.com> wrote:
> > > >>>>> hi all,
> > > >>>>>
> > > >>>>> As we've been discussing [1], there is a need to introduce 4
> bytes of
> > > >>>>> padding into the preamble of the "encapsulated IPC message"
> format to
> > > >>>>> ensure that the Flatbuffers metadata payload begins on an 8-byte
> > > >>>>> aligned memory offset. The alternative to this would be for Arrow
> > > >>>>> implementations where alignment is important (e.g. C or C++) to
> copy
> > > >>>>> the metadata (which is not always small) into memory when it is
> > > >>>>> unaligned.
> > > >>>>>
> > > >>>>> Micah has proposed to address this by adding a
> > > >>>>> 4-byte "continuation" value at the beginning of the payload
> > > >>>>> having the value 0xFFFFFFFF. The reason to do it this way is that
> > > >>>>> old clients will see an invalid length (what is currently the
> > > >>>>> first 4 bytes of the message -- a 32-bit little endian signed
> > > >>>>> integer indicating the metadata length) rather than potentially
> > > >>>>> crashing on a valid length. We also propose to expand the "end of
> > > >>>>> stream" marker used in the stream and file format from 4 to 8
> > > >>>>> bytes. This has the additional effect of aligning the file footer
> > > >>>>> defined in File.fbs.
> > > >>>>>
> > > >>>>> This would be a backwards incompatible protocol change, so older
> > > >> Arrow
> > > >>>>> libraries would not be able to read these new messages.
> Maintaining
> > > >>>>> forward compatibility (reading data produced by older libraries)
> > > >> would
> > > >>>>> be possible as we can reason that a value other than the
> continuation
> > > >>>>> value was produced by an older library (and then validate the
> > > >>>>> Flatbuffer message of course). Arrow implementations could offer
> a
> > > >>>>> backward compatibility mode for the sake of old readers if they
> > > >> desire
> > > >>>>> (this may also assist with testing).
> > > >>>>>
> > > >>>>> Additionally with this vote, we want to formally approve the
> change
> > > >> to
> > > >>>>> the Arrow "file" format to always write the (new 8-byte)
> > > >> end-of-stream
> > > >>>>> marker, which enables code that processes Arrow streams to safely
> > > >> read
> > > >>>>> the file's internal messages as though they were a normal stream.
> > > >>>>>
> > > >>>>> The PR making these changes to the IPC documentation is here
> > > >>>>>
> > > >>>>> https://github.com/apache/arrow/pull/4951
> > > >>>>>
> > > >>>>> Please vote to accept these changes. This vote will be open for
> at
> > > >>>>> least 72 hours
> > > >>>>>
> > > >>>>> [ ] +1 Adopt these Arrow protocol changes
> > > >>>>> [ ] +0
> > > >>>>> [ ] -1 I disagree because...
> > > >>>>>
> > > >>>>> Here is my vote: +1
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Wes
> > > >>>>>
> > > >>>>> [1]:
> > > >>
> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
> > > >>>>
> > > >>>>
> > > >>>>
>

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Wes McKinney <we...@gmail.com>.
Here's the C++ changes

https://github.com/apache/arrow/pull/5211

I'm going to create a integration branch where we can merge each patch
before merging to master

On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <we...@gmail.com> wrote:
>
> It isn't implemented in C++ yet but I will try to get a patch up for
> that soon (today maybe). I think we should create a branch where we
> can stack the patches that implement this for each language.
>
> On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor <pt...@gmail.com> wrote:
> >
> > I'll do the JS updates. Is it safe to validate against the Arrow C++
> > integration tests?
> >
> >
> > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > I created https://issues.apache.org/jira/browse/ARROW-6313 as a tracking
> > > issue with sub-issues on the development work.  So far no-one has claimed
> > > Java and Javascript tasks.
> > >
> > > Would it make sense to have a separate dev branch for this work?
> > >
> > > Thanks,
> > > Micah
> > >
> > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney <we...@gmail.com> wrote:
> > >
> > >> The vote carries with 4 binding +1 votes and 1 non-binding +1
> > >>
> > >> I'll merge the specification patch later today and we can begin
> > >> working on implementations so we can get this done for 0.15.0
> > >>
> > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler <cu...@gmail.com> wrote:
> > >>> +1 (non-binding)
> > >>>
> > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou <so...@pitrou.net>
> > >> wrote:
> > >>>> Sorry, had forgotten to send my vote on this.
> > >>>>
> > >>>> +1 from me.
> > >>>>
> > >>>> Regards
> > >>>>
> > >>>> Antoine.
> > >>>>
> > >>>>
> > >>>> On Wed, 14 Aug 2019 17:42:33 -0500
> > >>>> Wes McKinney <we...@gmail.com> wrote:
> > >>>>> hi all,
> > >>>>>
> > >>>>> As we've been discussing [1], there is a need to introduce 4 bytes of
> > >>>>> padding into the preamble of the "encapsulated IPC message" format to
> > >>>>> ensure that the Flatbuffers metadata payload begins on an 8-byte
> > >>>>> aligned memory offset. The alternative to this would be for Arrow
> > >>>>> implementations where alignment is important (e.g. C or C++) to copy
> > >>>>> the metadata (which is not always small) into memory when it is
> > >>>>> unaligned.
> > >>>>>
> > >>>>> Micah has proposed to address this by adding a
> > >>>>> 4-byte "continuation" value at the beginning of the payload
> > >>>>> having the value 0xFFFFFFFF. The reason to do it this way is that
> > >>>>> old clients will see an invalid length (what is currently the
> > >>>>> first 4 bytes of the message -- a 32-bit little endian signed
> > >>>>> integer indicating the metadata length) rather than potentially
> > >>>>> crashing on a valid length. We also propose to expand the "end of
> > >>>>> stream" marker used in the stream and file format from 4 to 8
> > >>>>> bytes. This has the additional effect of aligning the file footer
> > >>>>> defined in File.fbs.
> > >>>>>
> > >>>>> This would be a backwards incompatible protocol change, so older
> > >> Arrow
> > >>>>> libraries would not be able to read these new messages. Maintaining
> > >>>>> forward compatibility (reading data produced by older libraries)
> > >> would
> > >>>>> be possible as we can reason that a value other than the continuation
> > >>>>> value was produced by an older library (and then validate the
> > >>>>> Flatbuffer message of course). Arrow implementations could offer a
> > >>>>> backward compatibility mode for the sake of old readers if they
> > >> desire
> > >>>>> (this may also assist with testing).
> > >>>>>
> > >>>>> Additionally with this vote, we want to formally approve the change
> > >> to
> > >>>>> the Arrow "file" format to always write the (new 8-byte)
> > >> end-of-stream
> > >>>>> marker, which enables code that processes Arrow streams to safely
> > >> read
> > >>>>> the file's internal messages as though they were a normal stream.
> > >>>>>
> > >>>>> The PR making these changes to the IPC documentation is here
> > >>>>>
> > >>>>> https://github.com/apache/arrow/pull/4951
> > >>>>>
> > >>>>> Please vote to accept these changes. This vote will be open for at
> > >>>>> least 72 hours
> > >>>>>
> > >>>>> [ ] +1 Adopt these Arrow protocol changes
> > >>>>> [ ] +0
> > >>>>> [ ] -1 I disagree because...
> > >>>>>
> > >>>>> Here is my vote: +1
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Wes
> > >>>>>
> > >>>>> [1]:
> > >> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
> > >>>>
> > >>>>
> > >>>>

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Wes McKinney <we...@gmail.com>.
It isn't implemented in C++ yet but I will try to get a patch up for
that soon (today maybe). I think we should create a branch where we
can stack the patches that implement this for each language.

On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor <pt...@gmail.com> wrote:
>
> I'll do the JS updates. Is it safe to validate against the Arrow C++
> integration tests?
>
>
> On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > I created https://issues.apache.org/jira/browse/ARROW-6313 as a tracking
> > issue with sub-issues on the development work.  So far no-one has claimed
> > Java and Javascript tasks.
> >
> > Would it make sense to have a separate dev branch for this work?
> >
> > Thanks,
> > Micah
> >
> > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney <we...@gmail.com> wrote:
> >
> >> The vote carries with 4 binding +1 votes and 1 non-binding +1
> >>
> >> I'll merge the specification patch later today and we can begin
> >> working on implementations so we can get this done for 0.15.0
> >>
> >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler <cu...@gmail.com> wrote:
> >>> +1 (non-binding)
> >>>
> >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou <so...@pitrou.net>
> >> wrote:
> >>>> Sorry, had forgotten to send my vote on this.
> >>>>
> >>>> +1 from me.
> >>>>
> >>>> Regards
> >>>>
> >>>> Antoine.
> >>>>
> >>>>
> >>>> On Wed, 14 Aug 2019 17:42:33 -0500
> >>>> Wes McKinney <we...@gmail.com> wrote:
> >>>>> hi all,
> >>>>>
> >>>>> As we've been discussing [1], there is a need to introduce 4 bytes of
> >>>>> padding into the preamble of the "encapsulated IPC message" format to
> >>>>> ensure that the Flatbuffers metadata payload begins on an 8-byte
> >>>>> aligned memory offset. The alternative to this would be for Arrow
> >>>>> implementations where alignment is important (e.g. C or C++) to copy
> >>>>> the metadata (which is not always small) into memory when it is
> >>>>> unaligned.
> >>>>>
> >>>>> Micah has proposed to address this by adding a
> >>>>> 4-byte "continuation" value at the beginning of the payload
> >>>>> having the value 0xFFFFFFFF. The reason to do it this way is that
> >>>>> old clients will see an invalid length (what is currently the
> >>>>> first 4 bytes of the message -- a 32-bit little endian signed
> >>>>> integer indicating the metadata length) rather than potentially
> >>>>> crashing on a valid length. We also propose to expand the "end of
> >>>>> stream" marker used in the stream and file format from 4 to 8
> >>>>> bytes. This has the additional effect of aligning the file footer
> >>>>> defined in File.fbs.
> >>>>>
> >>>>> This would be a backwards incompatible protocol change, so older
> >> Arrow
> >>>>> libraries would not be able to read these new messages. Maintaining
> >>>>> forward compatibility (reading data produced by older libraries)
> >> would
> >>>>> be possible as we can reason that a value other than the continuation
> >>>>> value was produced by an older library (and then validate the
> >>>>> Flatbuffer message of course). Arrow implementations could offer a
> >>>>> backward compatibility mode for the sake of old readers if they
> >> desire
> >>>>> (this may also assist with testing).
> >>>>>
> >>>>> Additionally with this vote, we want to formally approve the change
> >> to
> >>>>> the Arrow "file" format to always write the (new 8-byte)
> >> end-of-stream
> >>>>> marker, which enables code that processes Arrow streams to safely
> >> read
> >>>>> the file's internal messages as though they were a normal stream.
> >>>>>
> >>>>> The PR making these changes to the IPC documentation is here
> >>>>>
> >>>>> https://github.com/apache/arrow/pull/4951
> >>>>>
> >>>>> Please vote to accept these changes. This vote will be open for at
> >>>>> least 72 hours
> >>>>>
> >>>>> [ ] +1 Adopt these Arrow protocol changes
> >>>>> [ ] +0
> >>>>> [ ] -1 I disagree because...
> >>>>>
> >>>>> Here is my vote: +1
> >>>>>
> >>>>> Thanks,
> >>>>> Wes
> >>>>>
> >>>>> [1]:
> >> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
> >>>>
> >>>>
> >>>>

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Paul Taylor <pt...@gmail.com>.
I'll do the JS updates. Is it safe to validate against the Arrow C++ 
integration tests?


On 8/22/19 7:28 PM, Micah Kornfield wrote:
> I created https://issues.apache.org/jira/browse/ARROW-6313 as a tracking
> issue with sub-issues on the development work.  So far no-one has claimed
> Java and Javascript tasks.
>
> Would it make sense to have a separate dev branch for this work?
>
> Thanks,
> Micah
>
> On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney <we...@gmail.com> wrote:
>
>> The vote carries with 4 binding +1 votes and 1 non-binding +1
>>
>> I'll merge the specification patch later today and we can begin
>> working on implementations so we can get this done for 0.15.0
>>
>> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler <cu...@gmail.com> wrote:
>>> +1 (non-binding)
>>>
>>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou <so...@pitrou.net>
>> wrote:
>>>> Sorry, had forgotten to send my vote on this.
>>>>
>>>> +1 from me.
>>>>
>>>> Regards
>>>>
>>>> Antoine.
>>>>
>>>>
>>>> On Wed, 14 Aug 2019 17:42:33 -0500
>>>> Wes McKinney <we...@gmail.com> wrote:
>>>>> hi all,
>>>>>
>>>>> As we've been discussing [1], there is a need to introduce 4 bytes of
>>>>> padding into the preamble of the "encapsulated IPC message" format to
>>>>> ensure that the Flatbuffers metadata payload begins on an 8-byte
>>>>> aligned memory offset. The alternative to this would be for Arrow
>>>>> implementations where alignment is important (e.g. C or C++) to copy
>>>>> the metadata (which is not always small) into memory when it is
>>>>> unaligned.
>>>>>
>>>>> Micah has proposed to address this by adding a
>>>>> 4-byte "continuation" value at the beginning of the payload
>>>>> having the value 0xFFFFFFFF. The reason to do it this way is that
>>>>> old clients will see an invalid length (what is currently the
>>>>> first 4 bytes of the message -- a 32-bit little endian signed
>>>>> integer indicating the metadata length) rather than potentially
>>>>> crashing on a valid length. We also propose to expand the "end of
>>>>> stream" marker used in the stream and file format from 4 to 8
>>>>> bytes. This has the additional effect of aligning the file footer
>>>>> defined in File.fbs.
>>>>>
>>>>> This would be a backwards incompatible protocol change, so older
>> Arrow
>>>>> libraries would not be able to read these new messages. Maintaining
>>>>> forward compatibility (reading data produced by older libraries)
>> would
>>>>> be possible as we can reason that a value other than the continuation
>>>>> value was produced by an older library (and then validate the
>>>>> Flatbuffer message of course). Arrow implementations could offer a
>>>>> backward compatibility mode for the sake of old readers if they
>> desire
>>>>> (this may also assist with testing).
>>>>>
>>>>> Additionally with this vote, we want to formally approve the change
>> to
>>>>> the Arrow "file" format to always write the (new 8-byte)
>> end-of-stream
>>>>> marker, which enables code that processes Arrow streams to safely
>> read
>>>>> the file's internal messages as though they were a normal stream.
>>>>>
>>>>> The PR making these changes to the IPC documentation is here
>>>>>
>>>>> https://github.com/apache/arrow/pull/4951
>>>>>
>>>>> Please vote to accept these changes. This vote will be open for at
>>>>> least 72 hours
>>>>>
>>>>> [ ] +1 Adopt these Arrow protocol changes
>>>>> [ ] +0
>>>>> [ ] -1 I disagree because...
>>>>>
>>>>> Here is my vote: +1
>>>>>
>>>>> Thanks,
>>>>> Wes
>>>>>
>>>>> [1]:
>> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
>>>>
>>>>
>>>>

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

Posted by Micah Kornfield <em...@gmail.com>.
I created https://issues.apache.org/jira/browse/ARROW-6313 as a tracking
issue with sub-issues on the development work.  So far no-one has claimed
Java and Javascript tasks.

Would it make sense to have a separate dev branch for this work?

Thanks,
Micah

On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney <we...@gmail.com> wrote:

> The vote carries with 4 binding +1 votes and 1 non-binding +1
>
> I'll merge the specification patch later today and we can begin
> working on implementations so we can get this done for 0.15.0
>
> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler <cu...@gmail.com> wrote:
> >
> > +1 (non-binding)
> >
> > On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou <so...@pitrou.net>
> wrote:
> >
> > >
> > > Sorry, had forgotten to send my vote on this.
> > >
> > > +1 from me.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > On Wed, 14 Aug 2019 17:42:33 -0500
> > > Wes McKinney <we...@gmail.com> wrote:
> > > > hi all,
> > > >
> > > > As we've been discussing [1], there is a need to introduce 4 bytes of
> > > > padding into the preamble of the "encapsulated IPC message" format to
> > > > ensure that the Flatbuffers metadata payload begins on an 8-byte
> > > > aligned memory offset. The alternative to this would be for Arrow
> > > > implementations where alignment is important (e.g. C or C++) to copy
> > > > the metadata (which is not always small) into memory when it is
> > > > unaligned.
> > > >
> > > > Micah has proposed to address this by adding a
> > > > 4-byte "continuation" value at the beginning of the payload
> > > > having the value 0xFFFFFFFF. The reason to do it this way is that
> > > > old clients will see an invalid length (what is currently the
> > > > first 4 bytes of the message -- a 32-bit little endian signed
> > > > integer indicating the metadata length) rather than potentially
> > > > crashing on a valid length. We also propose to expand the "end of
> > > > stream" marker used in the stream and file format from 4 to 8
> > > > bytes. This has the additional effect of aligning the file footer
> > > > defined in File.fbs.
> > > >
> > > > This would be a backwards incompatible protocol change, so older
> Arrow
> > > > libraries would not be able to read these new messages. Maintaining
> > > > forward compatibility (reading data produced by older libraries)
> would
> > > > be possible as we can reason that a value other than the continuation
> > > > value was produced by an older library (and then validate the
> > > > Flatbuffer message of course). Arrow implementations could offer a
> > > > backward compatibility mode for the sake of old readers if they
> desire
> > > > (this may also assist with testing).
> > > >
> > > > Additionally with this vote, we want to formally approve the change
> to
> > > > the Arrow "file" format to always write the (new 8-byte)
> end-of-stream
> > > > marker, which enables code that processes Arrow streams to safely
> read
> > > > the file's internal messages as though they were a normal stream.
> > > >
> > > > The PR making these changes to the IPC documentation is here
> > > >
> > > > https://github.com/apache/arrow/pull/4951
> > > >
> > > > Please vote to accept these changes. This vote will be open for at
> > > > least 72 hours
> > > >
> > > > [ ] +1 Adopt these Arrow protocol changes
> > > > [ ] +0
> > > > [ ] -1 I disagree because...
> > > >
> > > > Here is my vote: +1
> > > >
> > > > Thanks,
> > > > Wes
> > > >
> > > > [1]:
> > >
> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E
> > > >
> > >
> > >
> > >
> > >
>