Posted to jira@kafka.apache.org by "radai rosenblatt (Jira)" <ji...@apache.org> on 2021/04/02 00:13:00 UTC

[jira] [Created] (KAFKA-12605) kafka consumer churns through buffer memory iterating over records

radai rosenblatt created KAFKA-12605:
----------------------------------------

             Summary: kafka consumer churns through buffer memory iterating over records
                 Key: KAFKA-12605
                 URL: https://issues.apache.org/jira/browse/KAFKA-12605
             Project: Kafka
          Issue Type: Improvement
    Affects Versions: 2.7.0
            Reporter: radai rosenblatt


We recently conducted an analysis of memory allocations by the Kafka consumer and found a significant number of buffers that graduate out of the young gen, causing GC load.

These are the buffers used to gunzip record batches in the consumer when polling. Since the same iterator (and its underlying streams and buffers) is likely to live through several poll() cycles - for example when max.poll.records causes a fetched batch to be drained across multiple polls - these buffers graduate out of the young gen and cause issues.

See the attached memory allocation flame graph.

The code causing this is in CompressionType.GZIP (taken from current trunk):
{code:java}
@Override
public InputStream wrapForInput(ByteBuffer buffer, byte messageVersion, BufferSupplier decompressionBufferSupplier) {
    try {
        // Set output buffer (uncompressed) to 16 KB (none by default) and input buffer (compressed) to
        // 8 KB (0.5 KB by default) to ensure reasonable performance in cases where the caller reads a small
        // number of bytes (potentially a single byte)
        return new BufferedInputStream(new GZIPInputStream(new ByteBufferInputStream(buffer), 8 * 1024),
                16 * 1024);
    } catch (Exception e) {
        throw new KafkaException(e);
    }
}
{code}
It allocates two buffers - 8K and 16K - even though a BufferSupplier is available to attempt re-use.

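For reference, BufferSupplier already exposes a simple borrow/return API. A minimal sketch of that cycle (the class name and driver code here are made up for illustration; the sizes just mirror the ones above):
{code:java}
import java.nio.ByteBuffer;

import org.apache.kafka.common.utils.BufferSupplier;

public class BufferSupplierSketch {
    public static void main(String[] args) {
        // The default supplier caches released buffers by capacity,
        // so repeated get()/release() cycles avoid fresh allocations.
        BufferSupplier supplier = BufferSupplier.create();

        ByteBuffer input = supplier.get(8 * 1024);    // compressed-side buffer
        ByteBuffer output = supplier.get(16 * 1024);  // decompressed-side buffer
        try {
            // ... decompress using the borrowed buffers ...
        } finally {
            // returning the buffers lets the next batch re-use them
            supplier.release(input);
            supplier.release(output);
        }
    }
}
{code}
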
I believe it is possible to get both of those buffers from the supplier, and return them when iteration over the record batch is done.

Doing so would require subclassing BufferedInputStream and GZIPInputStream (or its parent class, InflaterInputStream) so that external buffers can be supplied to them; some lifecycle hook would also be needed to return those buffers to the pool once iteration is done. A rough sketch follows.
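
A minimal sketch of what the BufferedInputStream half might look like (the class name is made up; this relies on BufferedInputStream.buf being protected, and assumes the supplier hands out heap buffers, as the default one does - the GZIPInputStream/InflaterInputStream side would need the same treatment for its 8K buffer):
{code:java}
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

import org.apache.kafka.common.utils.BufferSupplier;

// Hypothetical: a BufferedInputStream whose internal buffer is borrowed
// from a BufferSupplier and handed back on close().
public class PooledBufferedInputStream extends BufferedInputStream {
    private final BufferSupplier supplier;
    private ByteBuffer borrowed;

    public PooledBufferedInputStream(InputStream in, BufferSupplier supplier, int size) {
        super(in, 1); // let the parent allocate a 1-byte array, then swap in the pooled one
        this.supplier = supplier;
        this.borrowed = supplier.get(size);
        // buf is protected in BufferedInputStream; nothing has been buffered yet,
        // so it is safe to replace here (assumes a heap buffer backing array)
        this.buf = borrowed.array();
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            if (borrowed != null) {
                supplier.release(borrowed); // the lifecycle hook: hand the buffer back to the pool
                borrowed = null;
            }
        }
    }
}
{code}
The close() override is the lifecycle hook: whatever drives the record-batch iteration would need to reliably close the stream once the batch is exhausted, so the buffers actually make it back to the pool.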