Posted to dev@kafka.apache.org by "Jason Gustafson (JIRA)" <ji...@apache.org> on 2016/11/23 19:37:59 UTC

[jira] [Commented] (KAFKA-4435) Improve storage overhead of group metadata

    [ https://issues.apache.org/jira/browse/KAFKA-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691169#comment-15691169 ] 

Jason Gustafson commented on KAFKA-4435:
----------------------------------------

cc [~onurkaraman]

> Improve storage overhead of group metadata
> ------------------------------------------
>
>                 Key: KAFKA-4435
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4435
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: Jason Gustafson
>
> The GroupMetadataManager serializes the full subscriptions and assignments of all consumer group members for each generation as a single message. This is a problem for large consumer groups with a large number of topics since each member's subscription/assignment is serialized separately. So if you have n consumers each subscribing to the same m topics, then the serialized message will contain m*n subscribed topics. At a certain size, you end up exceeding the max message size.
> Some ideas for getting around this have been 1) turning on compression and 2) adding regex support to the protocol. Both of these help, but maybe we should question whether the subscriptions/assignments need to be written at all. The reason to include this information in the log is basically to prevent a rebalance on coordinator failover. After failover, the new coordinator can consume the log and determine the full state of every group. The consumers in the group simply send heartbeats to the new coordinator once it is found.
> In fact, preventing the rebalance is not really the main issue: it's ensuring that the last generation can commit its offsets. If nothing were written to the log, then the group would be recreated from scratch after failover and existing members would not be able to commit offsets (since their generation would no longer be valid). But the subscription/assignment is opaque to the coordinator and is not actually used when committing offsets. All it really needs is the generation and the list of memberIds.
> Supposing then that we removed the subscriptions/assignments from the group, but retained the generation/memberId information, one loose end is servicing the DescribeGroups request. After failover, we would no longer have the subscription/assignment information we need to answer that request. One option would be to trigger a rebalance after failover in order to repopulate it. The previous generation would still be able to commit offsets before rejoining the group. Potentially we could even delay this rebalance until we actually receive a DescribeGroups request.
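
To make the m*n growth in the quoted description concrete, here is a rough back-of-the-envelope sketch. The function names, the 2-byte length prefix per topic name, and the example numbers are assumptions for illustration only; they are not taken from the GroupMetadataManager code.

```python
def serialized_topic_entries(num_consumers: int, num_topics: int) -> int:
    """Topic names written in one group metadata message when every member's
    subscription is serialized separately (n consumers * m topics)."""
    return num_consumers * num_topics


def estimated_subscription_bytes(num_consumers: int, num_topics: int,
                                 avg_topic_name_bytes: int) -> int:
    """Rough lower bound on the subscription portion of the message.

    Assumption: each topic name is stored as a length-prefixed string with a
    2-byte length. Assignments and other fields are ignored here.
    """
    length_prefix_bytes = 2  # assumed encoding overhead per topic name
    per_entry = length_prefix_bytes + avg_topic_name_bytes
    return serialized_topic_entries(num_consumers, num_topics) * per_entry


# Hypothetical group: 500 consumers, each subscribed to the same 1000 topics,
# with topic names averaging 30 bytes.
entries = serialized_topic_entries(500, 1000)
size = estimated_subscription_bytes(500, 1000, 30)
```

With these made-up numbers the subscriptions alone come to 500,000 topic entries and about 16 MB before compression, which illustrates how a single message can outgrow a broker's configured maximum message size.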

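The proposal in the quoted description, keeping only the generation and memberIds in the log, could be sketched roughly as follows. This is an illustrative model, not the coordinator's actual code: `SlimGroupState`, `can_commit_offsets`, and `describe_requires_rebalance` are hypothetical names, and real offset commit handling involves more checks than shown here.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass
class SlimGroupState:
    """Hypothetical per-group record kept in the log: no subscriptions or
    assignments, only what is needed to validate offset commits."""
    group_id: str
    generation: int
    member_ids: FrozenSet[str]
    # Opaque subscription/assignment bytes per member. None models the state
    # of a new coordinator after failover, where this was never persisted.
    member_metadata: Optional[Dict[str, bytes]] = None


def can_commit_offsets(state: SlimGroupState, generation: int,
                       member_id: str) -> bool:
    """A commit is accepted only from a known member of the current generation.
    The subscription/assignment is not consulted at all."""
    return generation == state.generation and member_id in state.member_ids


def describe_requires_rebalance(state: SlimGroupState) -> bool:
    """True when a DescribeGroups request cannot be answered from memory and a
    rebalance would be needed to repopulate the member metadata."""
    return state.member_metadata is None


# State as it might look right after a failover (example values are made up).
restored = SlimGroupState("orders", 7, frozenset({"consumer-1", "consumer-2"}))
```

In this model the last generation can still commit after failover, while the rebalance that repopulates subscriptions and assignments can be deferred until a DescribeGroups request actually arrives.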


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)