Posted to issues-all@impala.apache.org by "bharath v (JIRA)" <ji...@apache.org> on 2018/10/15 17:46:00 UTC

[jira] [Created] (IMPALA-7708) Switch to faster compression strategy for incremental stats

bharath v created IMPALA-7708:
---------------------------------

             Summary: Switch to faster compression strategy for incremental stats
                 Key: IMPALA-7708
                 URL: https://issues.apache.org/jira/browse/IMPALA-7708
             Project: IMPALA
          Issue Type: Improvement
          Components: Catalog
    Affects Versions: Impala 3.1.0
            Reporter: bharath v
            Assignee: bharath v


Currently we always construct the Deflater with the BEST_COMPRESSION level:
{noformat}
public static byte[] deflateCompress(byte[] input) {
    if (input == null) return null;
    ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
    // TODO: Benchmark other compression levels.
    DeflaterOutputStream stream =
        new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_COMPRESSION));
{noformat}
In some experiments, we noticed that the fastest compression level (BEST_SPEED) is ~8x faster than BEST_COMPRESSION, at a cost of only ~4 percentage points in compression ratio (the compressed output is ~9% larger).
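
As a rough illustration of the proposed change, here is a minimal sketch of a compression helper that takes the Deflater level as a parameter instead of hard-coding it. The class name DeflateSketch, the two-argument deflateCompress signature, and the inflate helper are assumptions made for this example; they are not the actual Impala code, which is only partially quoted above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, for illustration only.
final class DeflateSketch {
  // Compresses 'input' with the given Deflater level, e.g. Deflater.BEST_SPEED.
  static byte[] deflateCompress(byte[] input, int level) {
    if (input == null) return null;
    ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
    Deflater deflater = new Deflater(level);
    try (DeflaterOutputStream stream = new DeflaterOutputStream(bos, deflater)) {
      stream.write(input);
    } catch (IOException e) {
      // Not expected for in-memory streams; rethrow unchecked.
      throw new UncheckedIOException(e);
    } finally {
      // The stream does not end a caller-supplied Deflater, so release it here.
      deflater.end();
    }
    return bos.toByteArray();
  }

  // Inverse of deflateCompress(), used to check that the data round-trips.
  static byte[] inflate(byte[] compressed) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (InflaterInputStream in =
        new InflaterInputStream(new ByteArrayInputStream(compressed))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }
}
```

With a helper like this, switching strategies would be a one-argument change at the call site, for example DeflateSketch.deflateCompress(bytes, Deflater.BEST_SPEED). The output format is the same at every level, so existing decompression code should not need to change.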

Here are some results from a real-world table with 3000 partitions that have incremental stats.

 
|Compression level|Serialization time (seconds)|Output size (MB)|
|Gzip best compression|92|194|
|Gzip fastest compression|11|212|
|Gzip default compression|57|195|
|No compression|5|452|
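
The trade-off in these numbers can be worked out with a few lines of arithmetic. The sketch below uses only the values from the table; the class and method names are hypothetical and chosen for this example.

```java
import java.util.Locale;

// Hypothetical helper for reading the table above; not part of Impala.
final class StatsCompressionTradeoff {
  // How many times faster the candidate is than the baseline.
  static double speedup(double baselineSeconds, double candidateSeconds) {
    return baselineSeconds / candidateSeconds;
  }

  // Compressed size as a percentage of the uncompressed size.
  static double sizeRatioPercent(double compressedMb, double uncompressedMb) {
    return 100.0 * compressedMb / uncompressedMb;
  }

  public static void main(String[] args) {
    double rawMb = 452;                    // "No compression" row
    double bestSec = 92, bestMb = 194;     // "Gzip best compression" row
    double fastSec = 11, fastMb = 212;     // "Gzip fastest compression" row

    double bestRatio = sizeRatioPercent(bestMb, rawMb);
    double fastRatio = sizeRatioPercent(fastMb, rawMb);

    // Prints: speedup: 8.4x
    System.out.println(String.format(Locale.ROOT, "speedup: %.1fx",
        speedup(bestSec, fastSec)));
    // Prints: ratio: 42.9% -> 46.9% (+4.0 points)
    System.out.println(String.format(Locale.ROOT,
        "ratio: %.1f%% -> %.1f%% (+%.1f points)",
        bestRatio, fastRatio, fastRatio - bestRatio));
    // Prints: output growth: 9.3%
    System.out.println(String.format(Locale.ROOT, "output growth: %.1f%%",
        100.0 * (fastMb - bestMb) / bestMb));
  }
}
```

Read this way, the fastest level cuts serialization time from 92 to 11 seconds while the compressed size grows from about 43% to about 47% of the uncompressed 452 MB.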
