Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2019/04/25 20:33:55 UTC
[VOTE] Add 64-bit offset list, binary, string (utf8) types to the
Arrow columnar format
In a recent mailing list discussion [1] Micah Kornfield has proposed
to add new list and variable-size binary and unicode types to the
Arrow columnar format with 64-bit signed integer offsets, to be used
in addition to the existing 32-bit offset varieties. These will be
implemented as new types in the Type union in Schema.fbs (the
particular names can be debated in the PR that implements them):
LargeList
LargeBinary
LargeString [UTF8]
While very large contiguous columns are not a principal use case for
the columnar format, it has been observed empirically that there are
applications that use the format to represent datasets where
realizations of data can sometimes exceed the 2^31 - 1 "capacity" of a
column and cannot be easily (or at all) split into smaller chunks.
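
To make the 2^31 - 1 limit concrete, here is a minimal Python sketch of
how an offsets buffer for a variable-size binary column could be built
and when 32-bit signed offsets stop being enough. The helper names
(build_offsets, offset_width_needed, pack_offsets) are hypothetical and
used only for illustration; they are not part of any Arrow library, and
little-endian packing is an assumption of this sketch.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit offset can hold


def build_offsets(values):
    """Return offsets for a variable-size binary column (hypothetical helper).

    offsets[i] is where values[i] starts in the concatenated data buffer,
    and offsets[-1] is the total byte length, so there are
    len(values) + 1 entries.
    """
    offsets = [0]
    for v in values:
        offsets.append(offsets[-1] + len(v))
    return offsets


def offset_width_needed(lengths):
    """Return 32 if the summed byte lengths fit in int32 offsets, else 64."""
    return 32 if sum(lengths) <= INT32_MAX else 64


def pack_offsets(offsets, width):
    """Serialize offsets as little-endian signed integers of the given width."""
    code = "i" if width == 32 else "q"
    return struct.pack("<%d%s" % (len(offsets), code), *offsets)


values = [b"arrow", b"", b"columnar"]
offsets = build_offsets(values)
print(offsets)                          # [0, 5, 5, 13]
print(len(pack_offsets(offsets, 32)))   # 16 bytes with 32-bit offsets
print(len(pack_offsets(offsets, 64)))   # 32 bytes with 64-bit offsets

# Two 1 GiB values already total 2^31 bytes, one past the int32 limit,
# so the final offset would not fit in a 32-bit signed integer.
print(offset_width_needed([2**30, 2**30]))  # 64
```

The trade-off this illustrates is that 64-bit offsets double the size of
the offsets buffer, which is one reason to keep them as additional types
alongside the 32-bit varieties instead of replacing them.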
Please vote whether to accept the changes. The vote will be open for at
least 72 hours.
[ ] +1 Accept the additions to the columnar format
[ ] +0
[ ] -1 Do not accept the changes because...
Thanks,
Wes
[1]: https://lists.apache.org/thread.html/8088eca21b53906315e2bbc35eb2d246acf10025b5457eccc7a0e8a3@%3Cdev.arrow.apache.org%3E
[RESULT] [VOTE] Add 64-bit offset list, binary, string (utf8) types
to the Arrow columnar format
Posted by Wes McKinney <we...@gmail.com>.
The vote carries with 3 binding +1 votes and 2 non-binding +1 votes.
On Fri, Apr 26, 2019 at 10:05 AM Brian Bowman <Br...@sas.com> wrote:
>
> Can non-Arrow PMC members/committers vote?
>
> If so, +1
>
> -Brian
Re: [VOTE] Add 64-bit offset list, binary, string (utf8) types to the
Arrow columnar format
Posted by Brian Bowman <Br...@sas.com>.
Can non-Arrow PMC members/committers vote?
If so, +1
-Brian
Re: [VOTE] Add 64-bit offset list, binary, string (utf8) types to the
Arrow columnar format
Posted by Antoine Pitrou <an...@python.org>.
+1 (binding)
Regards
Antoine.
Re: [VOTE] Add 64-bit offset list, binary, string (utf8) types to the
Arrow columnar format
Posted by Micah Kornfield <em...@gmail.com>.
+1 (non-binding)
On Thu, Apr 25, 2019 at 1:58 PM Philipp Moritz <pc...@gmail.com> wrote:
> +1 (binding)
>
> On Thu, Apr 25, 2019 at 1:34 PM Wes McKinney <we...@gmail.com> wrote:
>
> > +1 (binding)
Re: [VOTE] Add 64-bit offset list, binary, string (utf8) types to the
Arrow columnar format
Posted by Philipp Moritz <pc...@gmail.com>.
+1 (binding)
On Thu, Apr 25, 2019 at 1:34 PM Wes McKinney <we...@gmail.com> wrote:
> +1 (binding)
Re: [VOTE] Add 64-bit offset list, binary, string (utf8) types to the
Arrow columnar format
Posted by Wes McKinney <we...@gmail.com>.
+1 (binding)