Posted to dev@arrow.apache.org by "Julien Le Dem (JIRA)" <ji...@apache.org> on 2016/08/12 21:59:20 UTC
[jira] [Updated] (ARROW-255) Finalize Dictionary representation
[ https://issues.apache.org/jira/browse/ARROW-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Le Dem updated ARROW-255:
--------------------------------
Description:
format/Message.fbs mentions DictionaryBatches with an id but does not specify where they are referenced.
We should add a {{dictionary: long}} in Field that references the dictionary id:
Field: https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L86
Dictionary id: https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L165
We need a spec in format/Layout.md that describes the dictionary layout.
When dictionary encoded, the value vector is an array of signed int32 (for consistency with ).
The dictionary vector is a vector of the value type, indexed by each value's id in the dictionary.
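
To make the proposal concrete, here is a rough Python sketch of how the pieces could fit together. It is not Arrow code: the Field class, dictionary_encode, and decode are hypothetical names made up for illustration, the dictionary id 7 is arbitrary, and nulls (validity bitmaps) are ignored. It only assumes what the description states: a Field carries a dictionary id, the value vector holds signed int32 indices, and the dictionary vector holds the distinct values at the position given by their id.

```python
from array import array
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    # Hypothetical stand-in for the Field table in Message.fbs.
    name: str
    # Proposed attribute: id of the DictionaryBatch this field refers to.
    dictionary: Optional[int] = None


def dictionary_encode(values):
    """Return (indices, dictionary) for a sequence of hashable values."""
    dictionary = []   # dictionary vector: distinct values, in first-seen order
    positions = {}    # value -> its id (position) in the dictionary vector
    indices = array("i")  # signed C int, 32 bits on common platforms
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return indices, dictionary


def decode(field, indices, dictionaries):
    """Resolve indices through the dictionary that the field references."""
    if field.dictionary is None:
        return list(indices)  # field is not dictionary encoded
    dictionary = dictionaries[field.dictionary]
    return [dictionary[i] for i in indices]


indices, dictionary = dictionary_encode(["a", "b", "a", "c", "b"])
field = Field(name="col", dictionary=7)
decoded = decode(field, indices, {7: dictionary})
```

A reader holding only the Field and the index vector needs the id to find the right dictionary, which is why the reference has to live in the schema.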
was:
format/Message.fbs mentions DictionaryBatches with an id but does not specify where they are referenced.
We should add a {{dictionary: long}} in Field that references the dictionary id:
Field: https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L86
Dictionary id: https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L165
We need a spec in format/Layout.md that describes the dictionary layout.
When dictionary encoded, the value vector is an array of unsigned int32.
The dictionary vector is a vector of the value type, indexed by each value's id in the dictionary.
> Finalize Dictionary representation
> ----------------------------------
>
> Key: ARROW-255
> URL: https://issues.apache.org/jira/browse/ARROW-255
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Format
> Reporter: Julien Le Dem
>
> format/Message.fbs mentions DictionaryBatches with an id but does not specify where they are referenced.
> We should add a {{dictionary: long}} in Field that references the dictionary id:
> Field: https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L86
> Dictionary id: https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L165
> We need a spec in format/Layout.md that describes the dictionary layout.
> When dictionary encoded, the value vector is an array of signed int32 (for consistency with ).
> The dictionary vector is a vector of the value type, indexed by each value's id in the dictionary.