
Posted to jira@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/02/20 00:15:00 UTC

[jira] [Commented] (KAFKA-7909) Ensure timely rebalance completion after pending members rejoin or fail

    [ https://issues.apache.org/jira/browse/KAFKA-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772458#comment-16772458 ] 

ASF GitHub Bot commented on KAFKA-7909:
---------------------------------------

hachikuji commented on pull request #6251: KAFKA-7909: Ensure timely rebalance completion after pending members rejoin or fail
URL: https://github.com/apache/kafka/pull/6251


> Ensure timely rebalance completion after pending members rejoin or fail
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-7909
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7909
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, core
>            Reporter: Arjun Satish
>            Assignee: Arjun Satish
>            Priority: Blocker
>             Fix For: 2.2.0
>
>
> We recently introduced integration tests in Connect. These tests spin up one or more Connect workers along with a Kafka broker and ZooKeeper in a single process and attempt to move records using a Connector. In the [Example Integration Test|https://github.com/apache/kafka/blob/3c73633/connect/runtime/src/test/java/org/apache/kafka/connect/integration/ExampleConnectIntegrationTest.java#L105], we spin up three workers, each hosting a Connector task that consumes records from a Kafka topic. When the connector starts up, it may go through multiple rounds of rebalancing. We have noticed the following two problems in the last few days:
>  # After members join a group, there are no pendingMembers remaining, but the join group method does not complete and does not send these members a signal that they are ready to start consuming from their respective partitions.
>  # Because of quick rebalances, a consumer might have started a group, but Connect then starts a rebalance, after which we create three new instances of the consumer (one from each worker/task). However, the group coordinator seems to have 4 members in the group, which causes the JoinGroup to stall indefinitely.
> Even though this ticket is described in the context of Connect, it may be applicable to general consumers.
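
As a rough illustration of the second problem, the toy model below (Python, not Kafka code; every name in it is hypothetical) shows how a rebalance that waits for all known members can stall when the coordinator still counts a member that never rejoins, and how re-checking the completion condition after that member is removed lets it finish. It is a sketch of the symptom under those assumptions, not a description of the actual coordinator logic or of the fix in the pull request.

```python
# Toy model only: this is NOT Kafka's group coordinator code.
# All names here (ToyGroup, join, expire, can_complete) are hypothetical.


class ToyGroup:
    """Tracks which known members have rejoined during a rebalance."""

    def __init__(self, known_members):
        self.known = set(known_members)  # members the coordinator believes exist
        self.rejoined = set()            # members that have sent a join request

    def join(self, member_id):
        # A brand-new member becomes known when it first joins.
        self.known.add(member_id)
        self.rejoined.add(member_id)

    def expire(self, member_id):
        # Drop a member that never rejoined (assumed here to have failed).
        self.known.discard(member_id)
        self.rejoined.discard(member_id)

    def can_complete(self):
        # The rebalance can finish only when every known member has rejoined.
        return bool(self.known) and self.known == self.rejoined


group = ToyGroup(["stale-consumer"])      # a consumer left over from an earlier round
for member in ("task-0", "task-1", "task-2"):
    group.join(member)                    # three new consumers, one per worker/task

stalled = not group.can_complete()        # 4 known members, only 3 have rejoined
group.expire("stale-consumer")            # remove the member that never rejoined
completed = group.can_complete()          # 3 known members, all 3 have rejoined
```

In this model the join stalls for as long as the stale member is counted, which matches the reported 4-member group; the completion check has to be re-evaluated whenever a pending member rejoins or is removed.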


