Posted to dev@parquet.apache.org by "Martin Loncaric (Jira)" <ji...@apache.org> on 2022/02/28 21:20:00 UTC

[jira] [Updated] (PARQUET-2132) Support Quantile Compression q_compress column codec

     [ https://issues.apache.org/jira/browse/PARQUET-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Loncaric updated PARQUET-2132:
-------------------------------------
    Description: 
Quantile Compression (https://github.com/mwlon/quantile-compression) is a recent but stable compression algorithm for numerical sequences that averages a 35%+ higher compression ratio than the next-best codec (zstd) at the same compression time. Its decompression speed is fairly fast, close to that of zstd. Adding q_compress as a column codec for all numerical columns could substantially reduce the size of most Parquet files.

q_compress is implemented in Rust, which has good interop with C++ and can run on the JVM via JNI (e.g. https://github.com/pancake-db/pancake-scala-client).

  was:
Quantile Compression (https://github.com/mwlon/quantile-compression) is a recent but stable compression algorithm for numerical sequences that averages a 35%+ higher compression ratio than the next-best codec (zstd) at the same compression time. Its decompression speed is fairly fast, close to that of zstd. Adding q_compress as a column codec for all numerical columns could substantially reduce the size of most parquet files.

q_compress is implemented in Rust, which has good interop with C++ and can run on the JVM via JNI (e.g. https://github.com/pancake-db/pancake-scala-client).


> Support Quantile Compression q_compress column codec
> ----------------------------------------------------
>
>                 Key: PARQUET-2132
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2132
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-cpp, parquet-format, parquet-mr
>            Reporter: Martin Loncaric
>            Priority: Major
>
> Quantile Compression (https://github.com/mwlon/quantile-compression) is a recent but stable compression algorithm for numerical sequences that averages a 35%+ higher compression ratio than the next-best codec (zstd) at the same compression time. Its decompression speed is fairly fast, close to that of zstd. Adding q_compress as a column codec for all numerical columns could substantially reduce the size of most Parquet files.
> q_compress is implemented in Rust, which has good interop with C++ and can run on the JVM via JNI (e.g. https://github.com/pancake-db/pancake-scala-client).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)