Posted to dev@parquet.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2016/02/01 20:58:40 UTC
[jira] [Assigned] (PARQUET-478) Reassembly algorithms for nested in-memory columnar memory layout
[ https://issues.apache.org/jira/browse/PARQUET-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney reassigned PARQUET-478:
------------------------------------
Assignee: Wes McKinney
> Reassembly algorithms for nested in-memory columnar memory layout
> -----------------------------------------------------------------
>
> Key: PARQUET-478
> URL: https://issues.apache.org/jira/browse/PARQUET-478
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-cpp
> Reporter: Wes McKinney
> Assignee: Wes McKinney
>
> I plan to use parquet-cpp primarily in conjunction with columnar data structures.
> Specifically, interpreting repetition / definition levels requires:
> * Computing null bits / bytes for each logical level of nested tree (group, array, primitive leaf)
> * Computing implied array sizes for each repeated group (according to 1, 2, or 3-level array encoding)
> The results of this reconstruction will simply be C arrays accompanied by the parquet-cpp logical schema, which makes it easy to adapt them to different in-memory columnar memory schemes.
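The array-size computation described above can be sketched for one concrete case. This is only an illustration under stated assumptions, not parquet-cpp code: it assumes a single column with the 3-level list encoding `optional group values (LIST) { repeated group list { optional int32 element; } }`, so the maximum definition level is 3 and the maximum repetition level is 1. The names `ListColumn` and `ReassembleList` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: plain arrays describing one list column.
struct ListColumn {
  std::vector<int32_t> offsets;        // list i spans [offsets[i], offsets[i+1])
  std::vector<uint8_t> list_valid;     // 1 if list i is non-null
  std::vector<uint8_t> element_valid;  // 1 if element slot j is non-null
};

// Assumed level meanings for this schema only:
//   def 0: list is null        def 1: list is present but empty
//   def 2: element is null     def 3: element is present
//   rep 0: first entry of a new list   rep 1: continues the current list
ListColumn ReassembleList(const std::vector<int16_t>& def_levels,
                          const std::vector<int16_t>& rep_levels) {
  assert(def_levels.size() == rep_levels.size());
  ListColumn out;
  for (std::size_t i = 0; i < def_levels.size(); ++i) {
    if (rep_levels[i] == 0) {
      // A new list starts here: record its start offset and its null flag.
      out.offsets.push_back(static_cast<int32_t>(out.element_valid.size()));
      out.list_valid.push_back(def_levels[i] >= 1 ? 1 : 0);
    }
    if (def_levels[i] >= 2) {
      // This entry occupies an element slot, which may itself be null.
      out.element_valid.push_back(def_levels[i] == 3 ? 1 : 0);
    }
  }
  // Closing offset, so that the size of list i is offsets[i+1] - offsets[i].
  out.offsets.push_back(static_cast<int32_t>(out.element_valid.size()));
  return out;
}
```

For example, the four records `[1, null, 3]`, `null`, `[]`, `[4]` correspond to definition levels `{3, 2, 3, 0, 1, 3}` and repetition levels `{0, 1, 1, 0, 0, 0}`; the implied list sizes are then the differences of consecutive offsets. Deeper nesting or the 1- and 2-level encodings would need different level thresholds.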
> As for implementation, it makes sense to start with functional unit tests of the reassembly algorithms, using repetition / definition levels declared in the test suite as C++ vectors. Otherwise it will be too tedious to produce valid Parquet test data files that explore all of the edge cases.
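A test of that style might look like the sketch below, which declares definition levels directly as a C++ vector and checks the null bytes computed for each logical level. It assumes a non-repeated path `optional group a { optional int32 b; }` (maximum definition level 2), where a node is non-null exactly when the definition level reaches the depth at which that node becomes defined. `ValidityAtLevel` and the test function are hypothetical names, not part of parquet-cpp.

```cpp
#include <cstdint>
#include <vector>

// Null bytes for one logical level of a non-repeated path:
// entry i is 1 when def_levels[i] >= defined_at, otherwise 0.
std::vector<uint8_t> ValidityAtLevel(const std::vector<int16_t>& def_levels,
                                     int16_t defined_at) {
  std::vector<uint8_t> valid;
  valid.reserve(def_levels.size());
  for (int16_t d : def_levels) {
    valid.push_back(d >= defined_at ? 1 : 0);
  }
  return valid;
}

// Test-style check with levels declared inline instead of read from a file.
// Rows: {a: {b: 7}}, {a: null}, {a: {b: null}}, {a: {b: 9}}
bool TestOptionalGroupWithOptionalLeaf() {
  const std::vector<int16_t> def_levels = {2, 0, 1, 2};

  const std::vector<uint8_t> expected_group = {1, 0, 1, 1};  // a is defined at level 1
  const std::vector<uint8_t> expected_leaf = {1, 0, 0, 1};   // b is defined at level 2

  return ValidityAtLevel(def_levels, 1) == expected_group &&
         ValidityAtLevel(def_levels, 2) == expected_leaf;
}
```

Each edge case (all nulls, empty input, null group versus null leaf) then becomes a few literal vectors rather than a hand-built Parquet file. Paths that contain repeated fields would also need the repetition levels to decide which slots exist at each level.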
> Several other teams (Spark, Drill, Parquet-Java) are currently working on related efforts along these lines, so we can engage when appropriate to collaborate on algorithms and nuances of this approach to avoid unnecessary code churn / bugs.