Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2019/04/25 20:33:55 UTC

[VOTE] Add 64-bit offset list, binary, string (utf8) types to the Arrow columnar format

In a recent mailing list discussion [1] Micah Kornfield has proposed
to add new list and variable-size binary and Unicode types to the
Arrow columnar format with 64-bit signed integer offsets, to be used
in addition to the existing 32-bit offset varieties. These will be
implemented as new types in the Type union in Schema.fbs (the
particular names can be debated in the PR that implements them):

LargeList
LargeBinary
LargeString [UTF8]

While very large contiguous columns are not a principal use case for
the columnar format, it has been observed empirically that there are
applications that use the format to represent datasets where
realizations of data can sometimes exceed the 2^31 - 1 "capacity" of a
column and cannot be easily (or at all) split into smaller chunks.
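
To make the 2^31 - 1 limit concrete, here is a minimal sketch in plain
Python (no Arrow library involved) of how an offsets buffer for a
variable-size binary or string column is built, and when its values stop
fitting in 32-bit signed integers. The helper names build_offsets and
required_offset_width are hypothetical and exist only for this
illustration; they are not part of the proposal or of any Arrow API. For
binary and string columns the offsets count bytes in the data buffer; for
list columns they count child elements, but the arithmetic is the same.

```python
from itertools import accumulate

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit offset can hold


def build_offsets(value_lengths):
    """Return the offsets for values with the given lengths.

    Entry i is where value i starts in the data buffer; the final entry
    is the total length, so the result has len(value_lengths) + 1 entries.
    """
    return list(accumulate(value_lengths, initial=0))


def required_offset_width(value_lengths):
    """Return 32 if every offset fits in a signed 32-bit integer, else 64."""
    # The last offset equals the total length, so it is the largest one.
    total = sum(value_lengths)
    return 32 if total <= INT32_MAX else 64


small = [3, 0, 5]            # three short values, one of them empty
big = [2**30, 2**30, 1]      # values totalling 2^31 + 1 bytes

print(build_offsets(small))          # [0, 3, 3, 8]
print(required_offset_width(small))  # 32
print(required_offset_width(big))    # 64
```

With 32-bit offsets the second column would have to be split into chunks;
the proposed 64-bit offset types are meant for cases where that is not
practical.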

Please vote whether to accept the changes. The vote will be open for at
least 72 hours.

[ ] +1 Accept the additions to the columnar format
[ ] +0
[ ] -1 Do not accept the changes because...

Thanks,
Wes

[1]: https://lists.apache.org/thread.html/8088eca21b53906315e2bbc35eb2d246acf10025b5457eccc7a0e8a3@%3Cdev.arrow.apache.org%3E

[RESULT] [VOTE] Add 64-bit offset list, binary, string (utf8) types to the Arrow columnar format

Posted by Wes McKinney <we...@gmail.com>.
The vote carries with 3 binding +1 votes and 2 non-binding +1 votes.

On Fri, Apr 26, 2019 at 10:05 AM Brian Bowman <Br...@sas.com> wrote:
>
> Can non-Arrow PMC members/committers vote?
>
> If so, +1
>
> -Brian

Re: [VOTE] Add 64-bit offset list, binary, string (utf8) types to the Arrow columnar format

Posted by Brian Bowman <Br...@sas.com>.
Can non-Arrow PMC members/committers vote?  

If so, +1 

-Brian


Re: [VOTE] Add 64-bit offset list, binary, string (utf8) types to the Arrow columnar format

Posted by Antoine Pitrou <an...@python.org>.
+1 (binding)

Regards

Antoine.



Re: [VOTE] Add 64-bit offset list, binary, string (utf8) types to the Arrow columnar format

Posted by Micah Kornfield <em...@gmail.com>.
+1 (non-binding)

On Thu, Apr 25, 2019 at 1:58 PM Philipp Moritz <pc...@gmail.com> wrote:

> +1 (binding)
>
> On Thu, Apr 25, 2019 at 1:34 PM Wes McKinney <we...@gmail.com> wrote:
>
> > +1 (binding)

Re: [VOTE] Add 64-bit offset list, binary, string (utf8) types to the Arrow columnar format

Posted by Philipp Moritz <pc...@gmail.com>.
+1 (binding)

On Thu, Apr 25, 2019 at 1:34 PM Wes McKinney <we...@gmail.com> wrote:

> +1 (binding)

Re: [VOTE] Add 64-bit offset list, binary, string (utf8) types to the Arrow columnar format

Posted by Wes McKinney <we...@gmail.com>.
+1 (binding)
