Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2018/08/29 21:17:00 UTC

[jira] [Comment Edited] (ARROW-3144) [C++] Better solution for cases where dictionaries are unknown at schema reconstruction time, or for delta dictionaries

    [ https://issues.apache.org/jira/browse/ARROW-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596865#comment-16596865 ] 

Wes McKinney edited comment on ARROW-3144 at 8/29/18 9:16 PM:
--------------------------------------------------------------

The context for this issue is the design of an Arrow-native RPC system (i.e. ARROW-249). A "get info" request may return the schema without the dictionaries (which could be large), and the dictionaries would be sent later when the dataset is actually requested. Without an improved solution at the C++ API level, we would be unable to deserialize the schema IPC message without the corresponding dictionary batches.


was (Author: wesmckinn):
The context for this issue is the design of an Arrow-native RPC system. A "get info" request may return the schema without the dictionaries (which could be large), and the dictionaries would be sent later when the dataset is actually requested. Without some improved solution at the C++ API level, we would be unable to deserialize the schema IPC message without the corresponding dictionary batches

> [C++] Better solution for cases where dictionaries are unknown at schema reconstruction time, or for delta dictionaries
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-3144
>                 URL: https://issues.apache.org/jira/browse/ARROW-3144
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 0.12.0
>
>
> There are a couple of inter-related issues:
> * Cases where a system might send the schema without the dictionaries, and the user wishes to reason about the schema and its types without knowing the dictionary values
> * Dictionaries that are changing, e.g. using delta dictionary messages
> {{arrow::DictionaryType}} has no "linkage" to any external object. I propose adding a "LinkedDictionaryType" or something similar (purely a C++ construct) that would functionally be a subclass of {{DictionaryType}}. It would allow a type to be created that obtains its dictionary later through some kind of "Dictionary provider" interface. There is something similar in Java already. This would allow a dictionary to evolve via delta dictionaries, or to be retrieved later, e.g. through an RPC or IPC layer.
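
To make the proposal above more concrete, here is a minimal, standalone sketch of what a "dictionary provider" linkage might look like. It deliberately does not use the Arrow C++ API: every name in it (Dictionary, DictionaryProvider, InMemoryDictionaryProvider, LinkedDictionaryType, DemoDeferredDictionary) is a hypothetical illustration, dictionaries are modeled as plain string vectors, and a delta is assumed to simply append values. It is not the design that was eventually adopted for this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for dictionary values (real Arrow dictionaries are arrays).
using Dictionary = std::vector<std::string>;

// Hypothetical provider interface: resolves a dictionary id to its current values.
class DictionaryProvider {
 public:
  virtual ~DictionaryProvider() = default;
  // Returns nullptr while the dictionary for `id` has not been received yet.
  virtual std::shared_ptr<const Dictionary> Lookup(int64_t id) const = 0;
};

// Simple in-memory provider that accepts full dictionaries and deltas.
class InMemoryDictionaryProvider : public DictionaryProvider {
 public:
  std::shared_ptr<const Dictionary> Lookup(int64_t id) const override {
    auto it = dictionaries_.find(id);
    if (it == dictionaries_.end()) return nullptr;
    return it->second;
  }

  // Sets or replaces the whole dictionary for `id`.
  void Set(int64_t id, Dictionary values) {
    dictionaries_[id] = std::make_shared<Dictionary>(std::move(values));
  }

  // Applies a delta (assumption: a delta only appends new values).
  void AppendDelta(int64_t id, const Dictionary& delta) {
    Dictionary merged;
    if (auto current = Lookup(id)) merged = *current;
    merged.insert(merged.end(), delta.begin(), delta.end());
    Set(id, std::move(merged));
  }

 private:
  std::unordered_map<int64_t, std::shared_ptr<const Dictionary>> dictionaries_;
};

// Hypothetical type that holds only an id and a provider, not the values.
class LinkedDictionaryType {
 public:
  LinkedDictionaryType(int64_t id, std::shared_ptr<const DictionaryProvider> provider)
      : id_(id), provider_(std::move(provider)) {}

  int64_t id() const { return id_; }
  bool has_dictionary() const { return provider_->Lookup(id_) != nullptr; }
  // Resolved on every call, so later updates to the provider are visible.
  std::shared_ptr<const Dictionary> dictionary() const { return provider_->Lookup(id_); }

 private:
  int64_t id_;
  std::shared_ptr<const DictionaryProvider> provider_;
};

// Walks through "schema first, dictionaries later"; returns the final size.
std::size_t DemoDeferredDictionary() {
  auto provider = std::make_shared<InMemoryDictionaryProvider>();
  LinkedDictionaryType type(/*id=*/0, provider);  // schema known, values unknown
  assert(!type.has_dictionary());

  provider->Set(0, Dictionary{"a", "b"});  // dictionary batch arrives later
  assert(type.dictionary()->size() == 2);

  provider->AppendDelta(0, Dictionary{"c"});  // delta dictionary arrives
  return type.dictionary()->size();
}
```

Because the type resolves its dictionary through the provider on each access, a schema containing such a type can be constructed and inspected before any dictionary batch has arrived, and the same type object sees the values grow as deltas are applied.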


