Posted to dev@parquet.apache.org by "Deepak Majeti (JIRA)" <ji...@apache.org> on 2016/09/11 17:54:21 UTC

[jira] [Assigned] (PARQUET-695) C++: Better default encoding user experience

     [ https://issues.apache.org/jira/browse/PARQUET-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deepak Majeti reassigned PARQUET-695:
-------------------------------------

    Assignee: Deepak Majeti

> C++: Better default encoding user experience
> --------------------------------------------
>
>                 Key: PARQUET-695
>                 URL: https://issues.apache.org/jira/browse/PARQUET-695
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Uwe L. Korn
>            Assignee: Deepak Majeti
>
> Currently the default encoding is PLAIN. Making dictionary encoding the default is probably the best choice, with the user selecting an alternative encoding to use if the dictionary grows too large.
> The interface should be as follows:
>  * The user selects, globally and per column, whether we should attempt to dictionary-encode a column. Whether RLE_DICTIONARY or PLAIN_DICTIONARY is used in the metadata is hidden from the user.
>  * The user specifies a fallback (!= dictionary) encoding that is used if dictionary encoding is not desired for a column or if the dictionary has exceeded its size limit.
> As a recap, the current implementation selects the encoding solely based on the encoding variable. No fallback is implemented for the case where the dictionary grows too large. The only magic at the moment is that the user can supply either PLAIN_DICTIONARY or RLE_DICTIONARY, and the enum written to the metadata is the one suitable for the chosen Parquet version, not necessarily the one supplied by the user.
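
The proposed interface could be sketched roughly as follows. This is only an illustration of the selection logic described in the issue, not the parquet-cpp API: every name here (WriterOptions, FormatVersion, ChooseEncoding, DictionaryEncodingFor, the 1 MiB size limit) is hypothetical. The mapping of format version 1 to PLAIN_DICTIONARY and version 2 to RLE_DICTIONARY is an assumption about what "suitable for the chosen Parquet version" means.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical types for illustration only; not the parquet-cpp API.
enum class Encoding { PLAIN, PLAIN_DICTIONARY, RLE_DICTIONARY, DELTA_BINARY_PACKED };
enum class FormatVersion { V1, V2 };

struct WriterOptions {
  // Global switch: attempt dictionary encoding unless a column overrides it.
  bool dictionary_enabled_default = true;
  // Per-column overrides, keyed by column path.
  std::map<std::string, bool> dictionary_enabled;
  // Fallback encoding; by contract this is never a dictionary encoding.
  Encoding fallback_encoding = Encoding::PLAIN;
  // Dictionary size limit in bytes (the default value is an assumption).
  std::size_t dictionary_size_limit = 1024 * 1024;
  FormatVersion version = FormatVersion::V1;

  // Per-column setting wins over the global default.
  bool UseDictionary(const std::string& column) const {
    auto it = dictionary_enabled.find(column);
    return it != dictionary_enabled.end() ? it->second : dictionary_enabled_default;
  }
};

// The dictionary enum written to the metadata is derived from the format
// version, so the user never has to pick between the two (assumed mapping).
Encoding DictionaryEncodingFor(FormatVersion version) {
  return version == FormatVersion::V1 ? Encoding::PLAIN_DICTIONARY
                                      : Encoding::RLE_DICTIONARY;
}

// Use the dictionary encoding while it is wanted and the dictionary is within
// its size limit; otherwise use the user-specified fallback.
Encoding ChooseEncoding(const WriterOptions& opts, const std::string& column,
                        std::size_t dictionary_bytes) {
  if (opts.UseDictionary(column) && dictionary_bytes <= opts.dictionary_size_limit) {
    return DictionaryEncodingFor(opts.version);
  }
  return opts.fallback_encoding;
}
```

With default options, a column would be dictionary-encoded until its dictionary exceeds the limit and would then switch to the fallback; setting dictionary_enabled["some_column"] = false would send that column straight to the fallback.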


