Posted to jira@kafka.apache.org by "James Yuzawa (Jira)" <ji...@apache.org> on 2020/09/25 18:37:00 UTC

[jira] [Comment Edited] (KAFKA-10470) zstd decompression with small batches is slow and causes excessive GC

    [ https://issues.apache.org/jira/browse/KAFKA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202351#comment-17202351 ] 

James Yuzawa edited comment on KAFKA-10470 at 9/25/20, 6:36 PM:
----------------------------------------------------------------

I also noticed the large amount of allocations and GC activity in my profiling. However, there is an additional issue related to the number of calls Kafka makes to ZstdOutputStream.write(int). Each of these single-byte writes crosses into the JNI for compression. An input buffer could improve this by only crossing into the JNI code once a critical mass of input has accumulated. Option 1: wrap the ZstdOutputStream in a BufferedOutputStream, as is currently done for GZIP. Option 2: alter the library to buffer internally. I have this ticket open with the zstd-jni project: [https://github.com/luben/zstd-jni/issues/141]
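To illustrate why Option 1 helps, here is a minimal, self-contained sketch of the write-coalescing effect. It does not use zstd-jni; a plain counting sink stands in for the JNI boundary, and the class and names below are illustrative, not Kafka's API. The point is that a BufferedOutputStream turns thousands of write(int) calls into a handful of write(byte[], int, int) calls reaching the underlying compressor.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class WriteCoalescingDemo {
    /** Counts how many write() calls reach this stream; it stands in for
     *  the JNI boundary that each uncoalesced write would have to cross. */
    static class CountingSink extends OutputStream {
        int calls = 0;
        @Override public void write(int b) { calls++; }
        @Override public void write(byte[] b, int off, int len) { calls++; }
    }

    public static void main(String[] args) throws IOException {
        final int n = 10_000;

        // Unbuffered: every single-byte write reaches the sink directly.
        CountingSink unbuffered = new CountingSink();
        for (int i = 0; i < n; i++) unbuffered.write(i);

        // Buffered: single-byte writes accumulate in an 8 KB buffer and
        // are forwarded as large array writes only when it fills (or on flush).
        CountingSink sink = new CountingSink();
        OutputStream buffered = new BufferedOutputStream(sink, 8 * 1024);
        for (int i = 0; i < n; i++) buffered.write(i);
        buffered.flush();

        System.out.println("unbuffered calls: " + unbuffered.calls); // 10000
        System.out.println("buffered calls: " + sink.calls);         // 2
    }
}
```

With 10,000 one-byte writes and an 8192-byte buffer, the sink sees one 8192-byte write when the buffer fills and one 1808-byte write on flush, so the cost of crossing the boundary is paid twice instead of 10,000 times.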


was (Author: yuzawa-san):
I also noticed the lack of buffer reuse in my profiling. However, there is an additional issue related to the number of calls Kafka makes to ZstdOutputStream.write(int). Each of these single-byte writes crosses into the JNI for compression. An input buffer could improve this by only crossing into the JNI code once a critical mass of input has accumulated. Option 1: wrap the ZstdOutputStream in a BufferedOutputStream, as is currently done for GZIP. Option 2: the library could be updated. I have this ticket open with the zstd-jni project: [https://github.com/luben/zstd-jni/issues/141]

> zstd decompression with small batches is slow and causes excessive GC
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-10470
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10470
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.5.1
>            Reporter: Robert Wagner
>            Priority: Major
>
> Similar to KAFKA-5150 but for zstd instead of LZ4, it appears that a large decompression buffer (128kb) created by zstd-jni per batch is causing a significant performance bottleneck.
> The next upcoming version of zstd-jni (1.4.5-7) will have a new constructor for ZstdInputStream that allows the client to pass its own buffer.  A similar fix as [PR #2967|https://github.com/apache/kafka/pull/2967] could be used to have the  ZstdConstructor use a BufferSupplier to re-use the decompression buffer.
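The buffer-reuse idea in the description (passing a caller-owned buffer so each batch does not allocate a fresh 128 KB array) can be sketched with a minimal pool. This is a hypothetical illustration, not Kafka's BufferSupplier or the zstd-jni constructor; the class and method names below are invented for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/** Hypothetical sketch of decompression-buffer reuse: callers borrow a
 *  buffer and return it, so repeated batches share one allocation instead
 *  of each creating (and discarding) a new 128 KB array. */
public class BufferPoolSketch {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;

    public BufferPoolSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Reuse a released buffer if one is available, otherwise allocate. */
    public synchronized ByteBuffer get() {
        ByteBuffer b = free.pollFirst();
        return (b != null) ? b : ByteBuffer.allocate(bufferSize);
    }

    /** Return a buffer to the pool for the next batch to reuse. */
    public synchronized void release(ByteBuffer b) {
        b.clear();
        free.addFirst(b);
    }

    public static void main(String[] args) {
        BufferPoolSketch pool = new BufferPoolSketch(128 * 1024);
        ByteBuffer first = pool.get();
        pool.release(first);
        ByteBuffer second = pool.get();
        // The second borrow reuses the same instance: no new allocation, no garbage.
        System.out.println(first == second);
    }
}
```

A constructor-supplied buffer on ZstdInputStream would let the decompression path draw from a pool like this per partition/consumer instead of allocating per batch, which is what eliminates the GC pressure described above.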



--
This message was sent by Atlassian Jira
(v8.3.4#803005)