Posted to dev@kafka.apache.org by "Jay Kreps (JIRA)" <ji...@apache.org> on 2015/03/24 21:02:52 UTC

[jira] [Commented] (KAFKA-2045) Memory Management on the consumer

    [ https://issues.apache.org/jira/browse/KAFKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378476#comment-14378476 ] 

Jay Kreps commented on KAFKA-2045:
----------------------------------

There are really two issues:
1. Bounding fetch size while still guaranteeing that you eventually get data from each partition
2. Pooling and reusing byte buffers

I think (1) is really pressing, whereas (2) is just an optimization that may or may not pay off much.

(1) is what leads to the huge memory allocations and sudden out-of-memory errors (OOM) when a consumer falls behind and then suddenly has lots of data to fetch, or when the partition assignment changes.
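To make the failure mode concrete: if the only limit is per partition, the largest possible fetch response grows linearly with the number of assigned partitions. A consumer that is behind on all of them, or that has just been handed many partitions, can receive the full amount in one response. The class and method names below are hypothetical, and the numbers in the usage note are illustrative, not Kafka defaults.

```java
/**
 * Hypothetical illustration of the worst-case size of one fetch response
 * with and without a total bound on the fetch.
 */
final class FetchMemoryEstimate {
    private FetchMemoryEstimate() {}

    /** Upper bound on bytes in one response when only a per-partition limit applies. */
    static long worstCaseResponseBytes(int assignedPartitions, long perPartitionLimitBytes) {
        if (assignedPartitions < 0 || perPartitionLimitBytes < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        // Every assigned partition may return its full per-partition limit at once.
        return assignedPartitions * perPartitionLimitBytes;
    }

    /** The same bound once a total limit on the whole fetch is also enforced. */
    static long boundedResponseBytes(int assignedPartitions, long perPartitionLimitBytes, long totalLimitBytes) {
        return Math.min(worstCaseResponseBytes(assignedPartitions, perPartitionLimitBytes), totalLimitBytes);
    }
}
```

With 1,000 partitions and a 1 MB (1,048,576-byte) per-partition limit, worstCaseResponseBytes gives 1,048,576,000 bytes (about 1 GB) for a single response. A 50 MB total limit caps the same fetch at 52,428,800 bytes, no matter how many partitions are assigned.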

For (1), I think we need to decide between (a) a heuristic in the consumer that fetches from only a subset of topic/partitions at a time, and (b) a new parameter in the fetch request that puts a total bound on the fetch size. I think we discussed this a while back and agreed on (b), but I don't remember for sure. The argument, if I recall correctly, was that (b) is the only way for the server to monitor all the subscribed topics and avoid blocking on an empty topic while other partitions have data.
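Option (b) can be sketched as follows, under stated assumptions. The request carries both a per-partition limit and a total limit, and the broker fills the response partition by partition until the total is used up. Empty partitions are skipped rather than blocking the request. To meet the "eventually get data from each partition" requirement in (1), this sketch rotates the starting partition on each request, so every partition with data is served first within one full rotation. The rotation strategy is an assumption and is not specified in this comment. BoundedFetchPlanner, plan, perPartitionLimit, and totalLimit are hypothetical names, not Kafka's fetch request API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: fill one fetch response under a total byte bound without starving any partition. */
final class BoundedFetchPlanner {
    private final long perPartitionLimit;
    private final long totalLimit;
    private int nextStart = 0; // index of the partition served first on the next request (assumed rotation)

    BoundedFetchPlanner(long perPartitionLimit, long totalLimit) {
        if (perPartitionLimit <= 0 || totalLimit <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.perPartitionLimit = perPartitionLimit;
        this.totalLimit = totalLimit;
    }

    /**
     * @param partitions subscribed partitions, in a stable order
     * @param available  bytes the broker has for each partition (missing or 0 means empty)
     * @return bytes to return per partition; the values never sum to more than totalLimit
     */
    Map<String, Long> plan(List<String> partitions, Map<String, Long> available) {
        Map<String, Long> result = new LinkedHashMap<>();
        int n = partitions.size();
        if (n == 0) {
            return result;
        }
        long remaining = totalLimit;
        int start = nextStart % n; // the partition list may have shrunk since the last call
        for (int i = 0; i < n && remaining > 0; i++) {
            String tp = partitions.get((start + i) % n);
            long bytes = Math.min(available.getOrDefault(tp, 0L), Math.min(perPartitionLimit, remaining));
            if (bytes > 0) { // empty partitions are skipped, never waited on
                result.put(tp, bytes);
                remaining -= bytes;
            }
        }
        nextStart = (start + 1) % n; // a different partition goes first next time
        return result;
    }
}
```

With a 60-byte per-partition limit, a 100-byte total limit, and partitions a, b, c each holding 100 bytes, successive calls return {a=60, b=40}, then {b=60, c=40}, then {c=60, a=40}. The response never exceeds the total limit, and no partition is left out for more than two requests in a row.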

Bounding the allocations should help performance a lot too.

If we do this bounding, then I think reuse will be a lot easier too, since each response will use at most that many bytes. You could potentially even statically allocate the byte buffer for each partition and reuse it.
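Static allocation per partition might look like the minimal sketch below. It assumes that every partition's share of a response is already bounded by a known maximum, as in the fetch-size discussion in this comment. PartitionBufferCache, fill, and maxBytesPerPartition are hypothetical names. This is not the consumer's actual buffer handling, and it does not cover the pooling in (2) or the decompression buffers mentioned in the issue description.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: with a per-partition bound on response size, allocate one
 * buffer of that size per partition once, then reuse it for every later response.
 */
final class PartitionBufferCache {
    private final int maxBytesPerPartition;
    private final Map<String, ByteBuffer> buffers = new HashMap<>();

    PartitionBufferCache(int maxBytesPerPartition) {
        if (maxBytesPerPartition <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.maxBytesPerPartition = maxBytesPerPartition;
    }

    /** Copies one partition's response data into that partition's buffer and returns it ready for reading. */
    ByteBuffer fill(String partition, ByteBuffer data) {
        if (data.remaining() > maxBytesPerPartition) {
            throw new IllegalStateException("response exceeds the per-partition bound");
        }
        // Allocated only the first time this partition is seen; reused afterwards.
        ByteBuffer buf = buffers.computeIfAbsent(partition, p -> ByteBuffer.allocate(maxBytesPerPartition));
        buf.clear();   // reset position and limit; no new allocation
        buf.put(data); // copy this response's bytes (advances data's position)
        buf.flip();    // switch to read mode for the caller
        return buf;
    }

    /** Number of buffers allocated so far (one per partition seen). */
    int allocatedBuffers() {
        return buffers.size();
    }
}
```

The trade-off is that the caller must finish reading a partition's buffer before the next response for that partition arrives, because the same memory is overwritten. In exchange, steady-state fetching allocates nothing, and total buffer memory is fixed at (number of partitions) × maxBytesPerPartition.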

> Memory Management on the consumer
> ---------------------------------
>
>                 Key: KAFKA-2045
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2045
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Guozhang Wang
>
> We need to add memory management to the new consumer, as we did for the new producer. This would probably include:
> 1. byte buffer reuse for fetch response partition data.
> 2. byte buffer reuse for on-the-fly decompression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)