Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2023/04/19 20:26:00 UTC

[jira] [Created] (IMPALA-12076) Potential performance improvement using ZSTD's ZSTD_decompressDCtx interface

Joe McDonnell created IMPALA-12076:
--------------------------------------

             Summary: Potential performance improvement using ZSTD's ZSTD_decompressDCtx interface
                 Key: IMPALA-12076
                 URL: https://issues.apache.org/jira/browse/IMPALA-12076
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 4.3.0
            Reporter: Joe McDonnell


ORC-639 notes that ZSTD's simple interface initializes a decompression context on every call to ZSTD_decompress(). When ZSTD_decompress() is called many times, it is better to allocate the context once and pass it to ZSTD_decompressDCtx(), which avoids the repeated initialization.

The ZSTD source makes the same recommendation in its comment on the decompression context:

{noformat}
/*= Decompression context
 *  When decompressing many times,
 *  it is recommended to allocate a context only once,
 *  and re-use it for each successive compression operation.
 *  This will make workload friendlier for system's memory.
 *  Use one context per thread for parallel execution. */
typedef struct ZSTD_DCtx_s ZSTD_DCtx;
{noformat}
We should investigate using ZSTD_decompressDCtx() in ZstandardDecompressor (decompress.h/.cc). The streaming decompression mode already reuses a context, but block decompression should benefit from the same approach. Something similar is possible for compression as well.

 


