Posted to issues@arrow.apache.org by "Martin Radev (JIRA)" <ji...@apache.org> on 2019/08/12 21:28:00 UTC

[jira] [Updated] (ARROW-6216) Allow user to select the ZSTD compression level

     [ https://issues.apache.org/jira/browse/ARROW-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Radev updated ARROW-6216:
--------------------------------
    Description: 
Arrow currently compresses ZSTD data at level 1, the lowest of the compressor's standard (positive) levels. This gives very high compression speed at the cost of compression ratio.

Users should be able to select the compression level, because both speed and ratio are data-specific.

The proposed solution is to expose the knob via an environment variable such as ARROW_ZSTD_COMPRESSION_LEVEL.
Example:
export ARROW_ZSTD_COMPRESSION_LEVEL=10
./my_parquet_app
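
As a rough illustration of how the proposed variable might be read on the C++ side, here is a minimal sketch. The helper name ZstdLevelFromEnv is hypothetical and not an existing Arrow function, and the 1 to 22 bounds are an assumption based on ZSTD's standard level range; real code would more likely query ZSTD_maxCLevel() from the zstd library.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper, not an Arrow API: returns the ZSTD level requested via
// the ARROW_ZSTD_COMPRESSION_LEVEL environment variable, or `fallback` when the
// variable is unset, empty, not an integer, or outside [min_level, max_level].
// The default bounds 1..22 are an assumption for illustration.
int ZstdLevelFromEnv(int fallback = 1, int min_level = 1, int max_level = 22) {
  const char* raw = std::getenv("ARROW_ZSTD_COMPRESSION_LEVEL");
  if (raw == nullptr || *raw == '\0') return fallback;

  errno = 0;
  char* end = nullptr;
  const long parsed = std::strtol(raw, &end, 10);
  // Reject trailing garbage such as "10abc" and values that overflow long.
  if (*end != '\0' || errno == ERANGE) return fallback;
  // Ignore out-of-range requests instead of silently clamping them.
  if (parsed < min_level || parsed > max_level) return fallback;
  return static_cast<int>(parsed);
}
```

Falling back to the current default of 1 on any invalid value keeps existing behaviour unchanged for users who never set the variable.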

Here is a test run with compression levels of 1, 2 and 5:
Level   Time (s)   Size (MB)
1       13.02      181
2       13.10      177
5       19.44      148
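
To make the trade-off in these numbers easier to read, the following sketch expresses each run relative to level 1. The struct and function names are illustrative only; the figures are the ones reported in the table.

```cpp
#include <cstdio>

// One row of the test run reported above.
struct Run {
  int level;
  double seconds;
  double megabytes;
};

// Percentage change of `value` relative to `baseline` (negative means smaller).
double PercentChange(double baseline, double value) {
  return (value - baseline) / baseline * 100.0;
}

// Prints the time and size change of every run relative to level 1.
void PrintTradeoffs() {
  const Run runs[] = {{1, 13.02, 181.0}, {2, 13.10, 177.0}, {5, 19.44, 148.0}};
  const Run& base = runs[0];
  for (const Run& r : runs) {
    std::printf("level %d: time %+.1f%%, size %+.1f%%\n", r.level,
                PercentChange(base.seconds, r.seconds),
                PercentChange(base.megabytes, r.megabytes));
  }
}
```

On this data set, level 2 costs under 1% more time for about 2% less output, while level 5 takes roughly 49% longer and produces output about 18% smaller.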

  was:
The compression level selected in Arrow for ZSTD is 1 which is the minimal compression level for the compressor. This leads to very high compression speed at the sacrifice of compression ratio.

The user should be allowed to select the compression level as both speed and ratio are data specific.

The proposed solution is to expose the knob via an environment variable such as ARROW_ZSTD_COMPRESSION_LEVEL.
Example:
export ARROW_ZSTD_COMPRESSION_LEVEL=10
./my_parquet_app


> Allow user to select the ZSTD compression level
> -----------------------------------------------
>
>                 Key: ARROW-6216
>                 URL: https://issues.apache.org/jira/browse/ARROW-6216
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Martin Radev
>            Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h


