Posted to dev@parquet.apache.org by "Fernando Pereira (JIRA)" <ji...@apache.org> on 2017/01/26 11:09:25 UTC

[jira] [Created] (PARQUET-845) Efficient storage for several INT_8 and INT_16

Fernando Pereira created PARQUET-845:
----------------------------------------

             Summary: Efficient storage for several INT_8 and INT_16
                 Key: PARQUET-845
                 URL: https://issues.apache.org/jira/browse/PARQUET-845
             Project: Parquet
          Issue Type: Wish
            Reporter: Fernando Pereira
            Priority: Minor


In very large datasets, packing several INT_8 values into a single INT32 field (or a byte array) can make a big difference in storage size.
Parquet already has efficient encoding algorithms for INT32, so when the LogicalType is INT_8 an encoded value might take up only one byte.
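
To illustrate the kind of aggregation meant here, the following is a minimal sketch of packing four signed 8-bit values into one 32-bit integer and back. It is not part of Parquet or of any Parquet API: the function names pack_int8x4 and unpack_int8x4 and the little-endian byte order are assumptions chosen purely for illustration.

```python
import struct


def pack_int8x4(values):
    """Pack exactly four signed 8-bit integers into one signed 32-bit integer.

    Hypothetical helper for illustration only. The little-endian layout is an
    assumption: values[0] ends up in the lowest-order byte.
    """
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    # "<4b" = four signed bytes; struct raises struct.error if a value
    # falls outside the range -128..127.
    raw = struct.pack("<4b", *values)
    # Reinterpret the same four bytes as one signed 32-bit integer.
    return struct.unpack("<i", raw)[0]


def unpack_int8x4(packed):
    """Inverse of pack_int8x4: split a signed 32-bit integer into four int8 values."""
    raw = struct.pack("<i", packed)
    return list(struct.unpack("<4b", raw))


if __name__ == "__main__":
    packed = pack_int8x4([1, 2, 3, 4])
    print(packed)
    print(unpack_int8x4(packed))
```

With a layout like this the application has to do the packing and unpacking itself, and the file schema only records an INT32 column, which is presumably why better type specification is being asked for.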

However, further optimizations could be made by allowing the user to specify the types more precisely.
What about a BYTE_ARRAY logical type, backed by the FIXED_LEN_BYTE_ARRAY type (or eventually INT32)?



