Posted to issues@arrow.apache.org by "Ji Liu (Jira)" <ji...@apache.org> on 2019/08/21 14:15:00 UTC

[jira] [Updated] (ARROW-6308) [Java] Support write interleaved dictionaries and batches in IPC stream

     [ https://issues.apache.org/jira/browse/ARROW-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ji Liu updated ARROW-6308:
--------------------------
    Description: 
As described in the streaming format spec ([http://arrow.apache.org/docs/format/IPC.html#streaming-format]) and discussed in the following threads, dictionary batches and record batches may be interleaved, as long as a record batch does not reference a dictionary that has not been sent yet.

[https://github.com/apache/arrow/pull/4960]

[https://github.com/apache/arrow/pull/5146]
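The following minimal sketch shows the ordering rule as a Java check over a message sequence. It assumes a simplified, hypothetical model of the stream. The `Message`, `DictionaryBatch` and `RecordBatch` types below are illustrative only and are not the Arrow Java classes. Each record batch is reduced to the set of dictionary ids it actually references: a column whose indices are all null references none. The Schema message is omitted because it carries no dictionary data.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of an IPC stream; NOT the Arrow Java API. */
public class InterleavingCheck {

  /** A stream message: either a dictionary batch or a record batch. */
  interface Message {}

  /** A dictionary batch carrying the dictionary with the given id. */
  record DictionaryBatch(long id) implements Message {}

  /** A record batch, reduced to the dictionary ids it actually references. */
  record RecordBatch(Set<Long> referencedIds) implements Message {}

  /** Returns true if no record batch references a dictionary that has not been sent yet. */
  static boolean isValidOrder(List<Message> messages) {
    Set<Long> seen = new HashSet<>();
    for (Message m : messages) {
      if (m instanceof DictionaryBatch d) {
        seen.add(d.id());
      } else if (m instanceof RecordBatch r && !seen.containsAll(r.referencedIds())) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Case i: an all-null batch, then the dictionary, then a batch that uses it.
    List<Message> caseOne = List.of(
        new RecordBatch(Set.of()),
        new DictionaryBatch(0),
        new RecordBatch(Set.of(0L)));
    // Case ii: S1 uses dictionary 1, S2 uses dictionary 2 (all-null in the first batch).
    List<Message> caseTwo = List.of(
        new DictionaryBatch(1),
        new RecordBatch(Set.of(1L)),
        new DictionaryBatch(2),
        new RecordBatch(Set.of(1L, 2L)));
    // Invalid: a batch references dictionary 0 before it is sent.
    List<Message> bad = List.of(
        new RecordBatch(Set.of(0L)),
        new DictionaryBatch(0));
    System.out.println(isValidOrder(caseOne)); // true
    System.out.println(isValidOrder(caseTwo)); // true
    System.out.println(isValidOrder(bad));     // false
  }
}
```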

Since ARROW-6040, streams in which dictionaries and batches are interleaved can be parsed, but there is still no way to write data in this format.

The following cases should be supported:

i. A record batch with one dictionary-encoded column S:
 # Schema
 # RecordBatch: S=[null, null, null, null]
 # DictionaryBatch: ['abc', 'efg']
 # RecordBatch: S=[0, 1, 0, 1]

ii. A record batch with two dictionary-encoded columns, S1 and S2:
 # Schema
 # DictionaryBatch S1: ['ab', 'cd']
 # RecordBatch: S1=[0, 1, 0, 1], S2=[null, null, null, null]
 # DictionaryBatch S2: ['cc', 'dd']
 # RecordBatch: S1=[0, 1, 0, 1], S2=[0, 1, 0, 1]
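One way a writer could produce both sequences is to defer each dictionary until the first record batch that contains a non-null index for that column. The sketch below assumes that strategy. It is not necessarily how this issue will be resolved. `LazyDictionaryWriter` and its methods are hypothetical, and the sketch logs message names as strings instead of writing real IPC messages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch (NOT the Arrow Java API): defers each dictionary until first use. */
public class LazyDictionaryWriter {
  private final Map<String, List<String>> dictionaries; // column -> dictionary values
  private final Set<String> sent = new HashSet<>();      // columns whose dictionary was written
  private final List<String> out = new ArrayList<>();    // textual log of emitted messages

  LazyDictionaryWriter(Map<String, List<String>> dictionaries) {
    this.dictionaries = dictionaries;
    out.add("Schema");
  }

  /** Writes one batch; each column holds dictionary indices, with null meaning a null slot. */
  void writeBatch(Map<String, List<Integer>> columns) {
    for (Map.Entry<String, List<Integer>> e : columns.entrySet()) {
      String col = e.getKey();
      boolean usesDictionary = e.getValue().stream().anyMatch(i -> i != null);
      // Emit the dictionary only once, right before the first batch that needs it.
      if (usesDictionary && sent.add(col)) {
        out.add("DictionaryBatch " + col + ": " + dictionaries.get(col));
      }
    }
    out.add("RecordBatch " + columns);
  }

  List<String> messages() { return out; }

  public static void main(String[] args) {
    // Case ii from the issue: S2 is all-null in the first batch.
    Map<String, List<String>> dicts = new LinkedHashMap<>();
    dicts.put("S1", List.of("ab", "cd"));
    dicts.put("S2", List.of("cc", "dd"));
    LazyDictionaryWriter w = new LazyDictionaryWriter(dicts);

    Map<String, List<Integer>> first = new LinkedHashMap<>();
    first.put("S1", List.of(0, 1, 0, 1));
    first.put("S2", Arrays.asList(null, null, null, null)); // List.of rejects nulls
    w.writeBatch(first);

    Map<String, List<Integer>> second = new LinkedHashMap<>();
    second.put("S1", List.of(0, 1, 0, 1));
    second.put("S2", List.of(0, 1, 0, 1));
    w.writeBatch(second);

    w.messages().forEach(System.out::println);
  }
}
```

Running `main` prints the case ii ordering: Schema, DictionaryBatch S1, the first RecordBatch, DictionaryBatch S2, then the second RecordBatch. Feeding it case i gives Schema, the all-null RecordBatch, DictionaryBatch S, then the second RecordBatch. Deferring dictionaries this way lets early batches be written before every dictionary is known. Dictionary replacement and deltas are out of scope for this sketch.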

This issue tracks the problem. The work should be done after a discussion on the mailing list.

  was:
As described in the streaming format spec ([http://arrow.apache.org/docs/format/IPC.html#streaming-format]) and discussed in the following threads, dictionary batches and record batches may be interleaved, as long as a record batch does not reference a dictionary that has not been sent yet.

[https://github.com/apache/arrow/pull/4960]

[https://github.com/apache/arrow/pull/5146]

Since ARROW-6040, streams in which dictionaries and batches are interleaved can be parsed, but there is still no way to write data in this format.

This issue tracks the problem. The work should be done after a discussion on the mailing list.


> [Java] Support write interleaved dictionaries and batches in IPC stream
> -----------------------------------------------------------------------
>
>                 Key: ARROW-6308
>                 URL: https://issues.apache.org/jira/browse/ARROW-6308
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: Ji Liu
>            Assignee: Ji Liu
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.2#803003)