Posted to jira@kafka.apache.org by "John Gray (Jira)" <ji...@apache.org> on 2021/09/08 13:24:00 UTC

[jira] [Comment Edited] (KAFKA-10643) Static membership - repetitive PreparingRebalance with updating metadata for member reason

    [ https://issues.apache.org/jira/browse/KAFKA-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411926#comment-17411926 ] 

John Gray edited comment on KAFKA-10643 at 9/8/21, 1:23 PM:
------------------------------------------------------------

We were having this same issue with our new static consumers once their changelog topics got large enough: the group would never stabilize because of these looping metadata updates. We ended up stabilizing our groups by increasing max.poll.interval.ms and metadata.max.age.ms in our Streams apps to longer than we expected our restore consumers to take to restore our large stores; 30 minutes ended up working for us. I am not sure whether a metadata update is expected to trigger a rebalance for a static consumer group with lots of restoring threads, but it certainly sent our groups with large state into a frenzy. It has been a while, so you may have moved on from this, but I would be curious to see whether these configs help your group, [~maatdeamon].
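
For reference, a minimal sketch of the configuration change described above, assuming a plain Properties-based Kafka Streams setup (the class name, application id, and bootstrap address are placeholders rather than values from this ticket; the 30-minute figure is the one reported to work here):

    import java.time.Duration;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class RestoreFriendlyTimeouts {
        public static Properties streamsProps() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-stream-app"); // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder

            // Longer than the worst-case time the restore consumer needs to rebuild
            // the largest state store, so neither a missed poll nor a metadata refresh
            // fires mid-restore.
            int thirtyMinutesMs = (int) Duration.ofMinutes(30).toMillis();
            props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG), thirtyMinutesMs);
            props.put(StreamsConfig.consumerPrefix(ConsumerConfig.METADATA_MAX_AGE_CONFIG), thirtyMinutesMs);
            return props;
        }
    }

Note that raising metadata.max.age.ms this far also means the clients pick up topic and partition changes more slowly, so it is a trade-off rather than a free win.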



> Static membership - repetitive PreparingRebalance with updating metadata for member reason
> ------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-10643
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10643
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.0
>            Reporter: Eran Levy
>            Priority: Major
>         Attachments: broker-4-11.csv, client-4-11.csv, client-d-9-11-11-2020.csv
>
>
> Kafka Streams 2.6.0, brokers version 2.6.0. The Kafka nodes are healthy and the Kafka Streams app is healthy.
> Configured with static membership.
> Every 10 minutes (I assume because of topic.metadata.refresh.interval.ms), I see the following group coordinator log for different stream consumers:
> INFO [GroupCoordinator 2]: Preparing to rebalance group **--**-stream in state PreparingRebalance with old generation 12244 (__consumer_offsets-45) (reason: Updating metadata for member ****-stream-11-1-013edd56-ed93-4370-b07c-1c29fbe72c9a) (kafka.coordinator.group.GroupCoordinator)
> and right after that, the following log:
> INFO [GroupCoordinator 2]: Assignment received from leader for group **-**-stream for generation 12246 (kafka.coordinator.group.GroupCoordinator)
>  
> I looked a bit at the Kafka code and I'm not sure I understand why this is happening - does this line describe the situation that happens here regarding the "reason:"? [https://github.com/apache/kafka/blob/7ca299b8c0f2f3256c40b694078e422350c20d19/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L311]
> I also don't see it happening often in the other Kafka Streams applications that we have.
> The only suspicious thing I see is that around every hour, different pods of that Kafka Streams application throw this exception:
> {"timestamp":"2020-10-25T06:44:20.414Z","level":"INFO","thread":"**-**-stream-94561945-4191-4a07-ac1b-07b27e044402-StreamThread-1","logger":"org.apache.kafka.clients.FetchSessionHandler","message":"[Consumer clientId=**-**-stream-94561945-4191-4a07-ac1b-07b27e044402-StreamThread-1-restore-consumer, groupId=null] Error sending fetch request (sessionId=34683236, epoch=2872) to node 3:","context":"default","exception":"org.apache.kafka.common.errors.DisconnectException: null\n"}
> I came across this strange behaviour after I started to investigate a rebalance that got stuck after one of the members left the group - the only thing I found is that, maybe because of these too-frequent PreparingRebalance states, the app might be affected by this bug - KAFKA-9752?
> I don't understand why it happens; it wasn't happening before I applied static membership to that Kafka Streams application (around 2 weeks ago).
> I will be happy if you can help me.
>  
>  
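
For context on the "configured with static membership" part of the report above: in a Kafka Streams application, static membership is enabled by giving each instance a stable, unique group.instance.id on its consumers. A minimal sketch (the helper class and the way the id is passed in are illustrative, not from this ticket):

    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class StaticMembership {
        // The id must stay the same across restarts of a given instance (for example,
        // derived from a pod's stable hostname) so the broker recognizes it as the
        // same static member instead of treating it as a new one.
        public static Properties enable(Properties props, String instanceId) {
            props.put(StreamsConfig.consumerPrefix(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG), instanceId);
            return props;
        }
    }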



--
This message was sent by Atlassian Jira
(v8.3.4#803005)