Posted to dev@arrow.apache.org by Micah Kornfield <em...@gmail.com> on 2019/11/21 04:41:57 UTC

[VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

Hello,
As discussed in [1], I've proposed a PR [2] that makes the following
clarifications and changes:

1.  It is not required that all dictionary batches occur at the beginning
of the IPC stream format (if the first record batch has an all-null
dictionary-encoded column, that column's dictionary might not be sent
until later in the stream).

2.  A second dictionary batch for the same ID that is not a "delta batch"
in an IPC stream indicates the dictionary should be replaced.

3.  The file format can only contain one non-delta dictionary batch and
multiple delta dictionary batches per dictionary ID. Dictionary
replacement is not supported in the file format.

4.  Add an enum to dictionary metadata for possible future changes in the
format in which dictionary batches can be sent (the most likely would be an
array Map<Int, Value>). An enum is needed as a placeholder to allow for
forward compatibility past the 1.0.0 release.
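
For illustration only (this is not code from the PR): items 1-3 above could be
read as the reader behaviour sketched below. It is a minimal pure-Python model,
not any Arrow implementation; every class and function name is hypothetical,
and treating a delta batch for an unknown ID as an error is an assumption of
this sketch rather than something stated in the proposal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DictionaryBatch:  # hypothetical stand-in for an IPC dictionary message
    dict_id: int
    values: List[str]
    is_delta: bool = False


@dataclass
class RecordBatch:  # hypothetical: dictionary ID -> indices, None marks a null
    indices: Dict[int, List[Optional[int]]]


class StreamDictionaryState:
    """Tracks dictionaries while messages are consumed in stream order."""

    def __init__(self) -> None:
        self.dictionaries: Dict[int, List[str]] = {}

    def on_dictionary_batch(self, batch: DictionaryBatch) -> None:
        if batch.is_delta:
            # Assumption of this sketch: a delta needs an existing dictionary.
            if batch.dict_id not in self.dictionaries:
                raise ValueError(f"delta for unknown dictionary {batch.dict_id}")
            self.dictionaries[batch.dict_id].extend(batch.values)
        else:
            # Item 2: a non-delta batch for a known ID replaces the dictionary.
            self.dictionaries[batch.dict_id] = list(batch.values)

    def decode(self, batch: RecordBatch) -> Dict[int, List[Optional[str]]]:
        decoded: Dict[int, List[Optional[str]]] = {}
        for dict_id, idxs in batch.indices.items():
            if all(i is None for i in idxs):
                # Item 1: an all-null column may precede its dictionary.
                decoded[dict_id] = [None] * len(idxs)
                continue
            if dict_id not in self.dictionaries:
                raise ValueError(f"dictionary {dict_id} has not been sent yet")
            values = self.dictionaries[dict_id]
            decoded[dict_id] = [None if i is None else values[i] for i in idxs]
        return decoded


def validate_file_dictionaries(batches: List[DictionaryBatch]) -> None:
    """Item 3: the file format allows one non-delta batch per dictionary ID."""
    seen = set()
    for batch in batches:
        if batch.is_delta:
            continue
        if batch.dict_id in seen:
            raise ValueError(f"dictionary {batch.dict_id} replaced in a file")
        seen.add(batch.dict_id)


state = StreamDictionaryState()
# The all-null column decodes before any dictionary batch has arrived.
first = state.decode(RecordBatch({0: [None, None]}))
state.on_dictionary_batch(DictionaryBatch(0, ["a", "b"]))
state.on_dictionary_batch(DictionaryBatch(0, ["c"], is_delta=True))
second = state.decode(RecordBatch({0: [0, 2, None]}))
# A second non-delta batch for ID 0 replaces the dictionary.
state.on_dictionary_batch(DictionaryBatch(0, ["x", "y"]))
third = state.decode(RecordBatch({0: [1, 0]}))
```

The same sequence of dictionary batches would be rejected by
validate_file_dictionaries, since it contains two non-delta batches for ID 0.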

If accepted, there will be work in all implementations to make sure that
they cover the edge cases highlighted, and additional integration testing
will be needed.

Please vote whether to accept these additions. The vote will be open for at
least 72 hours.

[ ] +1 Accept these changes to the specification
[ ] +0
[ ] -1 Do not accept the changes because...

Thanks,
Micah


[1]
https://lists.apache.org/thread.html/d0f137e9db0abfcfde2ef879ca517a710f620e5be4dd749923d22c37@%3Cdev.arrow.apache.org%3E
[2] https://github.com/apache/arrow/pull/5585

Re: [Result] [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

Posted by Ji Liu <ni...@aliyun.com.INVALID>.
Thanks Micah, I'll take on the Java-side implementation.

Thanks,
Ji Liu


------------------------------------------------------------------
From:Micah Kornfield <em...@gmail.com>
Send Time: Monday, December 2, 2019, 09:25
To:dev <de...@arrow.apache.org>
Subject:Re: [Result] [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

I've merged the PR and created ARROW-7283
<https://issues.apache.org/jira/browse/ARROW-7283> [1] to track
implementation for languages currently in the integration test.


[1] https://issues.apache.org/jira/browse/ARROW-7283

On Wed, Nov 27, 2019 at 1:03 AM Micah Kornfield <em...@gmail.com>
wrote:

> The vote carries with 3 binding +1 votes, 1 non-binding +1 vote, and
> 1 non-binding +0.5 vote.
>
> To follow up, I will:
> 1.  Open JIRAs for work items in the reference implementations (C++/Java).
> 2.  Merge the pull request containing the specification changes.
>
> Thanks,
> Micah
>
> On Tue, Nov 26, 2019 at 12:50 AM Sutou Kouhei <ko...@clear-code.com> wrote:
>
>> +1 (binding)

Re: [Result] [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

Posted by Micah Kornfield <em...@gmail.com>.
I've merged the PR and created ARROW-7283
<https://issues.apache.org/jira/browse/ARROW-7283> [1] to track
implementation for languages currently in the integration test.


[1] https://issues.apache.org/jira/browse/ARROW-7283

On Wed, Nov 27, 2019 at 1:03 AM Micah Kornfield <em...@gmail.com>
wrote:

> The vote carries with 3 binding +1 votes, 1 non-binding +1 vote, and
> 1 non-binding +0.5 vote.
>
> To follow up, I will:
> 1.  Open JIRAs for work items in the reference implementations (C++/Java).
> 2.  Merge the pull request containing the specification changes.
>
> Thanks,
> Micah
>
> On Tue, Nov 26, 2019 at 12:50 AM Sutou Kouhei <ko...@clear-code.com> wrote:
>
>> +1 (binding)

[Result] [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

Posted by Micah Kornfield <em...@gmail.com>.
The vote carries with 3 binding +1 votes, 1 non-binding +1 vote, and
1 non-binding +0.5 vote.

To follow up, I will:
1.  Open JIRAs for work items in the reference implementations (C++/Java).
2.  Merge the pull request containing the specification changes.

Thanks,
Micah

On Tue, Nov 26, 2019 at 12:50 AM Sutou Kouhei <ko...@clear-code.com> wrote:

> +1 (binding)

Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

Posted by Sutou Kouhei <ko...@clear-code.com>.
+1 (binding)


Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

Posted by Ji Liu <ni...@aliyun.com.INVALID>.
To clarify, we have already implemented item #1 ("It is not required that all dictionary batches occur at the beginning") in a previous PR [1].

So I hope this proposal will be accepted, and I would like to take on the follow-up work on the Java side if possible.

Thanks,
Ji Liu


[1] https://github.com/apache/arrow/pull/4960


------------------------------------------------------------------
From:Ji Liu <ni...@aliyun.com.INVALID>
Send Time: Tuesday, November 26, 2019, 14:04
To:dev <de...@arrow.apache.org>; Micah Kornfield <em...@gmail.com>
Cc:Wes McKinney <we...@gmail.com>
Subject:Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

+1 (non-binding)

Thanks
Ji Liu


------------------------------------------------------------------
From:Fan Liya <li...@gmail.com>
Send Time: Tuesday, November 26, 2019, 14:01
To:dev <de...@arrow.apache.org>; Micah Kornfield <em...@gmail.com>
Cc:Wes McKinney <we...@gmail.com>
Subject:Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

I am sorry I did not follow the thread closely (will follow up later).
However, the proposal above looks good to me.
So I am +0.5 for this.

Best,
Liya Fan

On Tue, Nov 26, 2019 at 1:12 PM Micah Kornfield <em...@gmail.com>
wrote:

> Could other members of the community chime in on this?  In particular
> getting views from other language maintainers would be good.
>
> Thanks,
> Micah
>
> On Thu, Nov 21, 2019 at 12:23 PM Micah Kornfield <em...@gmail.com>
> wrote:
>
> > Forgot to say, my vote is +1 (binding).
> >
> > On Thu, Nov 21, 2019 at 12:09 PM Wes McKinney <we...@gmail.com>
> wrote:
> >
> >> +1 (binding). Thanks Micah


Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

Posted by Ji Liu <ni...@aliyun.com.INVALID>.
+1 (non-binding)

Thanks
Ji Liu


------------------------------------------------------------------
From:Fan Liya <li...@gmail.com>
Send Time: Tuesday, November 26, 2019, 14:01
To:dev <de...@arrow.apache.org>; Micah Kornfield <em...@gmail.com>
Cc:Wes McKinney <we...@gmail.com>
Subject:Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

I am sorry I did not follow the thread closely (will follow up later).
However, the proposal above looks good to me.
So I am +0.5 for this.

Best,
Liya Fan

On Tue, Nov 26, 2019 at 1:12 PM Micah Kornfield <em...@gmail.com>
wrote:

> Could other members of the community chime in on this?  In particular
> getting views from other language maintainers would be good.
>
> Thanks,
> Micah
>
> On Thu, Nov 21, 2019 at 12:23 PM Micah Kornfield <em...@gmail.com>
> wrote:
>
> > Forgot to say, my vote is +1 (binding).
> >
> > On Thu, Nov 21, 2019 at 12:09 PM Wes McKinney <we...@gmail.com>
> wrote:
> >
> >> +1 (binding). Thanks Micah

Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

Posted by Fan Liya <li...@gmail.com>.
I am sorry I did not follow the thread closely (will follow up later).
However, the proposal above looks good to me.
So I am +0.5 for this.

Best,
Liya Fan

On Tue, Nov 26, 2019 at 1:12 PM Micah Kornfield <em...@gmail.com>
wrote:

> Could other members of the community chime in on this?  In particular
> getting views from other language maintainers would be good.
>
> Thanks,
> Micah
>
> On Thu, Nov 21, 2019 at 12:23 PM Micah Kornfield <em...@gmail.com>
> wrote:
>
> > Forgot to say, my vote is +1 (binding).
> >
> > On Thu, Nov 21, 2019 at 12:09 PM Wes McKinney <we...@gmail.com>
> wrote:
> >
> >> +1 (binding). Thanks Micah

Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

Posted by Micah Kornfield <em...@gmail.com>.
Could other members of the community chime in on this?  In particular
getting views from other language maintainers would be good.

Thanks,
Micah

On Thu, Nov 21, 2019 at 12:23 PM Micah Kornfield <em...@gmail.com>
wrote:

> Forgot to say, my vote is +1 (binding).
>
> On Thu, Nov 21, 2019 at 12:09 PM Wes McKinney <we...@gmail.com> wrote:
>
>> +1 (binding). Thanks Micah

Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

Posted by Micah Kornfield <em...@gmail.com>.
Forgot to say, my vote is +1 (binding).

On Thu, Nov 21, 2019 at 12:09 PM Wes McKinney <we...@gmail.com> wrote:

> +1 (binding). Thanks Micah

Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

Posted by Wes McKinney <we...@gmail.com>.
+1 (binding). Thanks Micah
