Posted to dev@arrow.apache.org by Micah Kornfield <em...@gmail.com> on 2019/11/27 06:03:34 UTC

[Result] [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

The vote carries with 3 binding +1 votes, 1 non-binding +1 vote, and
1 non-binding +0.5 vote.

To follow up, I will:
1.  Open JIRAs for work items in the reference implementations (C++/Java).
2.  Merge the pull request containing the specification changes.

Thanks,
Micah

On Tue, Nov 26, 2019 at 12:50 AM Sutou Kouhei <ko...@clear-code.com> wrote:

> +1 (binding)
>
> In <CA...@mail.gmail.com>
>   "[VOTE] Clarifications and forward compatibility changes for Dictionary
> Encoding (second iteration)" on Wed, 20 Nov 2019 20:41:57 -0800,
>   Micah Kornfield <em...@gmail.com> wrote:
>
> > Hello,
> > As discussed on [1], I've proposed changes in a PR [2] that clarify the
> > following:
> >
> > 1.  It is not required that all dictionary batches occur at the beginning
> > of the IPC stream format (if the first record batch has an all-null
> > dictionary-encoded column, the null column's dictionary might not be sent
> > until later in the stream).
> >
> > 2.  A second dictionary batch for the same ID that is not a "delta batch"
> > in an IPC stream indicates the dictionary should be replaced.
> >
> > 3.  The file format can contain only 1 "NON-delta"
> > dictionary batch and multiple "delta" dictionary batches. Dictionary
> > replacement is not supported in the file format.
> >
> > 4.  Add an enum to dictionary metadata for possible future changes to the
> > format in which dictionary batches can be sent (the most likely would be
> > an array Map<Int, Value>).  An enum is needed as a placeholder to allow
> > for forward compatibility past the 1.0.0 release.
> >
> > If accepted, there will be work in all implementations to make sure that
> > they cover the edge cases highlighted, and additional integration testing
> > will be needed.
> >
> > Please vote whether to accept these additions. The vote will be open for at
> > least 72 hours.
> >
> > [ ] +1 Accept these changes to the specification
> > [ ] +0
> > [ ] -1 Do not accept the changes because...
> >
> > Thanks,
> > Micah
> >
> >
> > [1] https://lists.apache.org/thread.html/d0f137e9db0abfcfde2ef879ca517a710f620e5be4dd749923d22c37@%3Cdev.arrow.apache.org%3E
> > [2] https://github.com/apache/arrow/pull/5585
>
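
The stream and file rules in items 1 to 3 of the proposal quoted above can be illustrated with a small toy model. This is only a sketch in plain Python, not the Arrow API: `DictionaryBatch`, `DictionaryState`, `apply`, and `decode` are hypothetical names invented for illustration, and treating a delta batch that arrives before any dictionary as an error is an assumption of this sketch that the proposal text does not address.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DictionaryBatch:
    """Hypothetical stand-in for a dictionary batch message (not the Arrow API)."""
    dict_id: int
    values: List[str]
    is_delta: bool = False


@dataclass
class DictionaryState:
    """Tracks the current dictionary per ID while messages are read in order."""
    file_format: bool = False  # True models the file format, False the stream format
    dictionaries: Dict[int, List[str]] = field(default_factory=dict)

    def apply(self, batch: DictionaryBatch) -> None:
        current = self.dictionaries.get(batch.dict_id)
        if batch.is_delta:
            if current is None:
                # Assumption of this sketch: a delta needs an existing dictionary.
                raise ValueError("delta batch received before any dictionary")
            # Delta batch: append the new values to the existing dictionary.
            current.extend(batch.values)
        elif current is None:
            # First dictionary for this ID. In a stream it may arrive after
            # record batches whose column was entirely null (item 1).
            self.dictionaries[batch.dict_id] = list(batch.values)
        elif self.file_format:
            # Item 3: the file format allows only one non-delta batch per ID.
            raise ValueError("dictionary replacement is not supported in the file format")
        else:
            # Item 2: in a stream, a second non-delta batch replaces the dictionary.
            self.dictionaries[batch.dict_id] = list(batch.values)

    def decode(self, dict_id: int, indices: List[Optional[int]]) -> List[Optional[str]]:
        # Null indices never touch the dictionary, so an all-null column
        # decodes even when no dictionary has been received yet.
        values = self.dictionaries.get(dict_id, [])
        return [None if i is None else values[i] for i in indices]


stream = DictionaryState()
print(stream.decode(0, [None, None]))                    # [None, None]
stream.apply(DictionaryBatch(0, ["a", "b"]))             # first dictionary arrives late
stream.apply(DictionaryBatch(0, ["c"], is_delta=True))   # delta appends
print(stream.decode(0, [0, 2]))                          # ['a', 'c']
stream.apply(DictionaryBatch(0, ["x"]))                  # non-delta replaces
print(stream.decode(0, [0]))                             # ['x']
```

Constructing `DictionaryState(file_format=True)` and applying a second non-delta batch for the same ID raises `ValueError` in this model, mirroring the statement that replacement is not supported in the file format.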

Re: [Result] [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

Posted by Ji Liu <ni...@aliyun.com.INVALID>.
Thanks Micah, I'll take the Java side implementation.

Thanks,
Ji Liu


------------------------------------------------------------------
From: Micah Kornfield <em...@gmail.com>
Send Time: Monday, December 2, 2019, 09:25
To: dev <de...@arrow.apache.org>
Subject: Re: [Result] [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

I've merged the PR and created ARROW-7283 [1] to track
implementation for languages currently in the integration test.


[1] https://issues.apache.org/jira/browse/ARROW-7283


Re: [Result] [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

Posted by Micah Kornfield <em...@gmail.com>.
I've merged the PR and created ARROW-7283 [1] to track
implementation for languages currently in the integration test.


[1] https://issues.apache.org/jira/browse/ARROW-7283
