You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Sophie Blee-Goldman (JIRA)" <ji...@apache.org> on 2019/08/07 21:34:00 UTC

[jira] [Created] (KAFKA-8767) Optimize StickyAssignor for Cooperative mode

Sophie Blee-Goldman created KAFKA-8767:
------------------------------------------

             Summary: Optimize StickyAssignor for Cooperative mode
                 Key: KAFKA-8767
                 URL: https://issues.apache.org/jira/browse/KAFKA-8767
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Sophie Blee-Goldman


In some rare cases, the StickyAssignor will fail to balance an assignment without violating stickiness despite a balanced and sticky assignment being possible. The implications of this for cooperative rebalancing are that an unnecessary additional rebalance will be triggered.

This was seen to happen for example when each consumer is subscribed to some random subset of all topics and all their subscriptions change to a different random subset, as occurs in AbstractStickyAssignorTest#testReassignmentWithRandomSubscriptionsAndChanges.

The initial assignment after the random subscription change obviously involved migrating partitions, so following the cooperative protocol those partitions are removed from the balanced first assignment, and a second rebalance is triggered. In some cases, during the second rebalance the assignor was unable to reach a balanced assignment without migrating a few partitions, even though one must have been possible (since the first assignment was balanced). A third rebalance was needed to reach a stable balanced state.

Under the conditions in the previously mentioned test (between 20-40 consumers, 10-20 topics (with 0-20 partitions) this third rebalance was required roughly 30% of the time. Some initial improvements to the sticky assignment logic reduced this to under 15%, but we should consider closing this gap and optimizing the cooperative sticky assignment

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)