Posted to dev@parquet.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2018/09/25 11:26:00 UTC

[jira] [Created] (PARQUET-1423) [C++] Support reading non-dictionary encoded binary Parquet columns directly as DictionaryArray

Wes McKinney created PARQUET-1423:
-------------------------------------

             Summary: [C++] Support reading non-dictionary encoded binary Parquet columns directly as DictionaryArray
                 Key: PARQUET-1423
                 URL: https://issues.apache.org/jira/browse/PARQUET-1423
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-cpp
            Reporter: Wes McKinney
             Fix For: cpp-1.6.0


If the goal is to hash this data into a categorical-type array anyway, it would be better to offer an option to "push down" the hashing into the Parquet read hot path, rather than first fully materializing a dense vector of {{ByteArray}} values, which could use a lot of memory after decompression.
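
As a rough illustration of the idea (not the parquet-cpp or Arrow API), the sketch below hashes each decoded value as it arrives and keeps only the unique values plus one integer index per row. The names {{DictionaryEncoded}} and {{DictionaryPushdownBuilder}} are hypothetical, and a standard-library hash map stands in for whatever hash table a real implementation would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded column; this is NOT
// Arrow's DictionaryArray, only an illustration of its two parts.
struct DictionaryEncoded {
  std::vector<std::string> dictionary;  // unique values, in first-seen order
  std::vector<int32_t> indices;         // one dictionary index per row
};

// Hypothetical builder. A decoder would call Append() for each value it
// produces, so no dense vector of all values is ever materialized.
class DictionaryPushdownBuilder {
 public:
  // `data`/`length` mimic the pointer-plus-length shape of a ByteArray.
  void Append(const char* data, std::size_t length) {
    std::string key(data, length);
    auto it = index_of_.find(key);
    if (it == index_of_.end()) {
      // Indices are 32-bit here, so the dictionary must stay below that limit.
      assert(result_.dictionary.size() <
             static_cast<std::size_t>(std::numeric_limits<int32_t>::max()));
      const int32_t new_index = static_cast<int32_t>(result_.dictionary.size());
      // For simplicity the unique value is stored twice (map key and
      // dictionary entry); a real implementation would avoid the copy.
      result_.dictionary.push_back(key);
      it = index_of_.emplace(std::move(key), new_index).first;
    }
    result_.indices.push_back(it->second);
  }

  // Hands back the encoded column and resets the builder for reuse.
  DictionaryEncoded Finish() {
    DictionaryEncoded out = std::move(result_);
    result_ = DictionaryEncoded{};
    index_of_.clear();
    return out;
  }

 private:
  std::unordered_map<std::string, int32_t> index_of_;
  DictionaryEncoded result_;
};
```

With this shape, the memory held for string data grows with the number of distinct values rather than the number of rows, which is the saving the issue is after when a column has many repeated values.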


