Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2022/11/05 01:45:00 UTC

[jira] [Commented] (IMPALA-11603) Investigate using cloudflare's zlib library

    [ https://issues.apache.org/jira/browse/IMPALA-11603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629238#comment-17629238 ] 

Joe McDonnell commented on IMPALA-11603:
----------------------------------------

Cloudflare zlib gives a modest but measurable speedup over regular zlib for ORC with deflate compression, reducing TPC-H query times by a little under 5%:
{noformat}
+----------+-------------------+---------+------------+------------+----------------+
| Workload | File Format       | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-------------------+---------+------------+------------+----------------+
| TPCH(42) | orc / def / block | 4.48    | -4.72%     | 3.63       | -4.86%         |
+----------+-------------------+---------+------------+------------+----------------+{noformat}
[https://jenkins.impala.io/job/perf-AB-test/375/artifact/Impala/perf_results/latest/performance_result.txt]

These numbers are end-to-end query times. Getting the exact decompression speedup would require digging into the query profiles.
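
As a rough back-of-the-envelope check before looking at profiles, the end-to-end delta can be related to a decompression-only speedup with an Amdahl's-law style calculation. The sketch below is only illustrative: the function names are made up for this note, "N percent faster" is assumed to mean N percent higher throughput, and the ~44 percent x86 decompression figure from the blog post quoted below is assumed to carry over to this workload, which has not been verified.

```python
def time_reduction(percent_faster: float) -> float:
    """Fraction of time saved when throughput rises by `percent_faster` percent.

    Assumption: "44 percent faster" means 1.44x throughput, so the same work
    takes 1/1.44 of the original time.
    """
    return 1.0 - 1.0 / (1.0 + percent_faster / 100.0)


def implied_fraction(total_reduction: float, component_reduction: float) -> float:
    """Share of total time the component must occupy to explain the total gain.

    Solves total_reduction = fraction * component_reduction for fraction,
    assuming nothing else in the query changed.
    """
    if component_reduction <= 0.0:
        raise ValueError("component_reduction must be positive")
    return total_reduction / component_reduction


if __name__ == "__main__":
    # Assumed inputs: the blog post's x86 decompression claim and the
    # Delta(Avg) from the table above.
    decompress_saving = time_reduction(44.0)
    fraction = implied_fraction(0.0472, decompress_saving)
    print(f"decompression time saved: {decompress_saving:.1%}")
    print(f"implied share of query time in decompression: {fraction:.1%}")
```

Under these assumptions, decompression would account for roughly 15% of query time. That is an estimate, not a measurement; the profiles remain the way to get the real number.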

> Investigate using cloudflare's zlib library
> -------------------------------------------
>
>                 Key: IMPALA-11603
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11603
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 4.2.0
>            Reporter: Joe McDonnell
>            Priority: Major
>
> Amazon recommends the use of Cloudflare's zlib implementation at [https://github.com/cloudflare/zlib].
> In a blog post, they claim fairly large performance gains over the regular zlib implementation:
> [https://aws.amazon.com/blogs/opensource/improving-zlib-cloudflare-and-comparing-performance-with-other-zlib-forks/]
> {noformat}
> On Arm:
>   Compression performance: ~90 percent faster than zlib-madler (original zlib).
>   Decompression performance: ~52 percent faster than zlib-madler.
> On x86:
>   Compression performance: ~113 percent faster than zlib-madler.
>   Decompression performance: ~44 percent faster than zlib-madler.{noformat}
> The blog post is a year and a half old, so the numbers may have changed since then, but it seems worth investigating. Amazon's guidebooks still recommend it.