Posted to issues@arrow.apache.org by "Liya Fan (JIRA)" <ji...@apache.org> on 2019/07/05 10:50:00 UTC
[jira] [Created] (ARROW-5862) [Java] Provide dictionary builder
Liya Fan created ARROW-5862:
-------------------------------
Summary: [Java] Provide dictionary builder
Key: ARROW-5862
URL: https://issues.apache.org/jira/browse/ARROW-5862
Project: Apache Arrow
Issue Type: New Feature
Components: Java
Reporter: Liya Fan
Assignee: Liya Fan
The dictionary builder serves the following scenario, which is frequently encountered in practice when dictionary encoding is involved: the dictionary values are not known a priori, so they must be determined dynamically as new data arrive.
In particular, when a new value arrives, it is checked against the dictionary. If it is already present, it is ignored; otherwise, it is added to the dictionary.
Once all values have been evaluated, the dictionary is complete, and encoding can start.
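To make the two phases (build, then encode) concrete, here is a minimal sketch in plain Java rather than against Arrow vectors. The class {{DictionaryEncodingSketch}} and its methods {{buildDictionary}} and {{encode}} are hypothetical illustrations, not part of the Arrow Java API; the sketch assumes values have well-behaved {{equals}}/{{hashCode}} and assigns dictionary indices in first-seen order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Arrow Java API.
final class DictionaryEncodingSketch {

    // Phase 1: scan values as they arrive and keep each distinct value once.
    // LinkedHashMap preserves first-seen order, so indices are dense: 0..n-1.
    static <T> Map<T, Integer> buildDictionary(Iterable<T> values) {
        Map<T, Integer> dictionary = new LinkedHashMap<>();
        for (T value : values) {
            // size() is evaluated before the insertion, so a new value gets the next index;
            // putIfAbsent leaves existing entries untouched, so repeated values are ignored.
            dictionary.putIfAbsent(value, dictionary.size());
        }
        return dictionary;
    }

    // Phase 2: once the dictionary is complete, replace each value by its index.
    static <T> int[] encode(List<T> values, Map<T, Integer> dictionary) {
        int[] indices = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            Integer index = dictionary.get(values.get(i));
            if (index == null) {
                throw new IllegalArgumentException("value not in dictionary: " + values.get(i));
            }
            indices[i] = index;
        }
        return indices;
    }
}
```

For the input {{7, 3, 7, 9, 3}}, {{buildDictionary}} yields the entries {{7->0, 3->1, 9->2}}, and {{encode}} then produces {{0, 1, 0, 2, 1}}.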
Code using a dictionary builder would look like this:
{{DictionaryBuilder<IntVector> dictionaryBuilder = ...}}
{{dictionaryBuilder.startBuild();}}
{{...}}
{{dictionaryBuilder.addValue(newValue);}}
{{...}}
{{dictionaryBuilder.endBuild();}}
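The snippet only names the lifecycle calls. One possible shape for such a builder is sketched below, assuming that {{addValue}} is only legal between {{startBuild}} and {{endBuild}}, and that {{endBuild}} returns the finished dictionary. {{SketchDictionaryBuilder}} is a hypothetical name, and a plain Java {{List}} stands in for an Arrow vector such as {{IntVector}}.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical stand-in for the proposed builder; a Java List replaces the Arrow vector.
final class SketchDictionaryBuilder<T> {
    private enum State { IDLE, BUILDING, DONE }

    // LinkedHashSet both deduplicates and remembers first-seen order.
    private final LinkedHashSet<T> dictionary = new LinkedHashSet<>();
    private State state = State.IDLE;

    void startBuild() {
        if (state != State.IDLE) {
            throw new IllegalStateException("build already started");
        }
        state = State.BUILDING;
    }

    // Returns true if the value was new and has been added; duplicates are ignored.
    boolean addValue(T value) {
        if (state != State.BUILDING) {
            throw new IllegalStateException("addValue called outside startBuild/endBuild");
        }
        return dictionary.add(value);
    }

    // Freezes the dictionary; after this call it can be used for encoding.
    List<T> endBuild() {
        if (state != State.BUILDING) {
            throw new IllegalStateException("endBuild called without startBuild");
        }
        state = State.DONE;
        return Collections.unmodifiableList(new ArrayList<>(dictionary));
    }
}
```

Having {{addValue}} report whether the value was new is a design choice of this sketch; it lets a caller count distinct values without querying the dictionary separately.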
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)