Posted to dev@parquet.apache.org by "Uwe L. Korn (JIRA)" <ji...@apache.org> on 2016/12/06 11:53:58 UTC

[jira] [Commented] (PARQUET-792) Skip the storage of repetition level and definition level for all-null column

    [ https://issues.apache.org/jira/browse/PARQUET-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725264#comment-15725264 ] 

Uwe L. Korn commented on PARQUET-792:
-------------------------------------

Repetition and definition levels should not take more than 2-3 bytes for a page if all fields are null (on the same level). 
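As a rough illustration of why the cost stays this small: Parquet encodes levels with its RLE/bit-packing hybrid, so a page where every level is identical collapses into a single run. The sketch below only computes the bytes of one such run (a varint header followed by the repeated value). The helper names {{uleb128}} and {{rle_run}} are made up for this example and are not part of parquet-mr, and page-level framing such as a length prefix is deliberately left out.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def rle_run(value: int, count: int, bit_width: int) -> bytes:
    """Bytes of a single RLE run in the RLE/bit-packing hybrid encoding.

    Hypothetical helper for illustration only; it ignores any framing
    that the page format puts around the encoded levels.
    """
    header = uleb128(count << 1)  # lowest bit 0 marks an RLE run
    # The repeated value is stored once, padded to whole bytes.
    value_bytes = value.to_bytes((bit_width + 7) // 8, "little")
    return header + value_bytes


# Assumed scenario: an optional top-level column (max definition level 1,
# so bit width 1) where every value in the page is null (level 0).
for n_values in (50, 20000):
    encoded = rle_run(value=0, count=n_values, bit_width=1)
    print(n_values, "values ->", len(encoded), "bytes")
```

The run length only affects the size of the varint header, so the encoded size grows logarithmically with the number of values in the page rather than linearly.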

How did you come to the conclusion that the repetition and definition levels take up so much space?

To get more insight into your files, you could use the new {{parquet-cli}} tool to inspect the sizes. See this PR for the new tool:
https://github.com/apache/parquet-mr/pull/384 

> Skip the storage of repetition level and definition level for all-null column
> -----------------------------------------------------------------------------
>
>                 Key: PARQUET-792
>                 URL: https://issues.apache.org/jira/browse/PARQUET-792
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Li
>            Priority: Minor
>
> I have a very sparse protobuf message in my project, with thousands of fields.
> In practice, most of the fields contain only null values within a page.
> But the repetition and definition levels take up a lot of storage space.
> Can Parquet skip storing the repetition and definition levels for such all-null columns to save storage space?


