Posted to issues@arrow.apache.org by "Liya Fan (JIRA)" <ji...@apache.org> on 2019/07/05 10:50:00 UTC

[jira] [Created] (ARROW-5862) [Java] Provide dictionary builder

Liya Fan created ARROW-5862:
-------------------------------

             Summary: [Java] Provide dictionary builder
                 Key: ARROW-5862
                 URL: https://issues.apache.org/jira/browse/ARROW-5862
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Java
            Reporter: Liya Fan
            Assignee: Liya Fan


The dictionary builder serves the following scenario, which is frequently encountered in practice when dictionary encoding is involved: the dictionary values are not known a priori, so they are determined dynamically as new data arrive.

In particular, when a new value arrives, it is checked against the dictionary. If it is already present, it is simply ignored; otherwise, it is added to the dictionary.
 
When all values have been evaluated, the dictionary can be considered complete, and encoding can start.

Code using a dictionary builder should look like this:

{{DictionaryBuilder<IntVector> dictionaryBuilder = ...}}
{{dictionaryBuilder.startBuild();}}
{{...}}
{{dictionaryBuilder.addValue(newValue);}}
{{...}}
{{dictionaryBuilder.endBuild();}}
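
To make the intended behavior concrete, here is a minimal sketch of the start/add/end life cycle in plain Java. It is not the proposed Arrow API: the class name SimpleDictionaryBuilder, the getDictionary() method, and the choice to return an index from addValue() are all assumptions made for illustration, and the sketch stores values in ordinary collections rather than in Arrow vectors.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration only; not part of Arrow.
 * Collects distinct values in first-seen order between startBuild() and endBuild().
 */
class SimpleDictionaryBuilder<T> {
    // Maps each distinct value to its position in the dictionary.
    private final Map<T, Integer> indexByValue = new HashMap<>();
    // Dictionary values in insertion order.
    private final List<T> values = new ArrayList<>();
    private boolean building = false;

    /** Resets any previous state and begins accepting values. */
    void startBuild() {
        indexByValue.clear();
        values.clear();
        building = true;
    }

    /**
     * Adds the value if it is not yet in the dictionary; otherwise leaves
     * the dictionary unchanged. Returns the value's dictionary index
     * (an assumption of this sketch, not stated in the issue).
     */
    int addValue(T value) {
        if (!building) {
            throw new IllegalStateException("call startBuild() first");
        }
        Integer existing = indexByValue.get(value);
        if (existing != null) {
            return existing; // already present: nothing to add
        }
        int index = values.size();
        indexByValue.put(value, index);
        values.add(value);
        return index;
    }

    /** Marks the dictionary as complete; encoding can start afterward. */
    void endBuild() {
        building = false;
    }

    /** Returns a read-only view of the completed dictionary. */
    List<T> getDictionary() {
        if (building) {
            throw new IllegalStateException("call endBuild() first");
        }
        return Collections.unmodifiableList(values);
    }
}
```

A hash map keeps the membership test at expected constant time per incoming value, which matters when values arrive continually; the separate list preserves a stable index for each value.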



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)