Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2023/04/19 20:26:00 UTC
[jira] [Created] (IMPALA-12076) Potential performance improvement using ZSTD's ZSTD_decompressDCtx interface
Joe McDonnell created IMPALA-12076:
--------------------------------------
Summary: Potential performance improvement using ZSTD's ZSTD_decompressDCtx interface
Key: IMPALA-12076
URL: https://issues.apache.org/jira/browse/IMPALA-12076
Project: IMPALA
Issue Type: Improvement
Components: Backend
Affects Versions: Impala 4.3.0
Reporter: Joe McDonnell
ORC-639 notes that ZSTD's simple interface, ZSTD_decompress(), initializes a decompression context on every call. When decompressing many times, it is better to allocate the context once and reuse it through the ZSTD_decompressDCtx() interface, which avoids the repeated initialization.
The ZSTD source says the same in a comment:
{noformat}
/*= Decompression context
* When decompressing many times,
* it is recommended to allocate a context only once,
* and re-use it for each successive compression operation.
* This will make workload friendlier for system's memory.
* Use one context per thread for parallel execution. */
typedef struct ZSTD_DCtx_s ZSTD_DCtx;
{noformat}
We should investigate using ZSTD_decompressDCtx() in ZstandardDecompressor (decompress.h/.cc). The streaming decompression mode already reuses a context, and the same approach should also apply to block decompression. Something similar is possible for compression as well, using a ZSTD_CCtx with ZSTD_compressCCtx().