You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2020/12/30 17:40:47 UTC

[GitHub] [kafka] hachikuji commented on a change in pull request #9792: KAFKA-10870: handle REBALANCE_IN_PROGRESS error in JoinGroup

hachikuji commented on a change in pull request #9792:
URL: https://github.com/apache/kafka/pull/9792#discussion_r550271098



##########
File path: clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java
##########
@@ -658,6 +658,10 @@ public void handle(JoinGroupResponse joinResponse, RequestFuture<ByteBuffer> fut
                     AbstractCoordinator.this.generation = new Generation(OffsetCommitRequest.DEFAULT_GENERATION_ID, memberId, null);
                 }
                 future.raise(error);
+            } else if (error == Errors.REBALANCE_IN_PROGRESS) {
+                log.info("JoinGroup failed due to the group began another rebalance. Need to re-join the group.");
+                requestRejoin();

Review comment:
       I think the call to `requestRejoin` is not needed. We only reset the `rejoinNeeded` flag after the subsequent SyncGroup request. Or is there a case that this misses?

##########
File path: clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java
##########
@@ -658,6 +658,10 @@ public void handle(JoinGroupResponse joinResponse, RequestFuture<ByteBuffer> fut
                     AbstractCoordinator.this.generation = new Generation(OffsetCommitRequest.DEFAULT_GENERATION_ID, memberId, null);
                 }
                 future.raise(error);
+            } else if (error == Errors.REBALANCE_IN_PROGRESS) {
+                log.info("JoinGroup failed due to the group began another rebalance. Need to re-join the group.");

Review comment:
       I guess it's kind of a confusing error to see. The case on the broker is when the write to the log failed because of a timeout. I wonder if it would be useful to suggest the cause in the message. For example:
   > JoinGroup failed with a REBALANCE_IN_PROGRESS error, which could indicate a replication timeout on the broker. Will retry.
   

##########
File path: clients/src/test/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinatorTest.java
##########
@@ -823,6 +823,24 @@ public void testJoinGroupRequestWithGroupInstanceIdNotFound() {
         assertTrue(coordinator.hasUnknownGeneration());
     }
 
+    @Test
+    public void testJoinGroupRequestWithRebalanceInProgress() {
+        setupCoordinator();
+        mockClient.prepareResponse(groupCoordinatorResponse(node, Errors.NONE));
+        coordinator.ensureCoordinatorReady(mockTime.timer(0));
+
+        mockClient.prepareResponse(
+            joinGroupFollowerResponse(defaultGeneration, memberId, JoinGroupRequest.UNKNOWN_MEMBER_ID, Errors.REBALANCE_IN_PROGRESS));
+
+        RequestFuture<ByteBuffer> future = coordinator.sendJoinGroupRequest();
+
+        assertTrue(consumerClient.poll(future, mockTime.timer(REQUEST_TIMEOUT_MS)));
+        assertTrue(future.exception().getClass().isInstance(Errors.REBALANCE_IN_PROGRESS.exception()));
+        assertEquals(Errors.REBALANCE_IN_PROGRESS.message(), future.exception().getMessage());
+        // should request rejoin
+        assertTrue(coordinator.rejoinNeededOrPending());

Review comment:
       Can we verify that the JoinGroup gets retried on the next poll?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org