Posted to jira@kafka.apache.org by "Lucas Wang (JIRA)" <ji...@apache.org> on 2018/01/25 00:20:00 UTC
[jira] [Assigned] (KAFKA-6481) Improving performance of the function ControllerChannelManager.addUpdateMetadataRequestForBrokers
[ https://issues.apache.org/jira/browse/KAFKA-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lucas Wang reassigned KAFKA-6481:
---------------------------------
Assignee: Lucas Wang
> Improving performance of the function ControllerChannelManager.addUpdateMetadataRequestForBrokers
> -------------------------------------------------------------------------------------------------
>
> Key: KAFKA-6481
> URL: https://issues.apache.org/jira/browse/KAFKA-6481
> Project: Kafka
> Issue Type: Improvement
> Reporter: Lucas Wang
> Assignee: Lucas Wang
> Priority: Minor
>
> The function ControllerChannelManager.addUpdateMetadataRequestForBrokers should process only the partitions specified in its partitions parameter (the second parameter) and avoid iterating through the set of partitions in TopicDeletionManager.partitionsToBeDeleted.
>
> Here is why the current code can be a problem:
> The number of partitions-to-be-deleted stored in the field TopicDeletionManager.partitionsToBeDeleted can become quite large under certain scenarios. For instance, if a topic a0 has dead replicas, it is marked as ineligible for deletion, and its partitions are retained in TopicDeletionManager.partitionsToBeDeleted for future retries.
> With a large set of partitions in TopicDeletionManager.partitionsToBeDeleted, if some replicas in another topic a1 need to be transitioned to OfflineReplica state, possibly because of a broker going offline, the following call stack occurs on the controller, causing an iteration over the whole partitions-to-be-deleted set for every single affected partition.
> controller.topicDeletionManager.partitionsToBeDeleted.foreach(partition => updateMetadataRequestPartitionInfo(partition, beingDeleted = true))
> ControllerBrokerRequestBatch.addUpdateMetadataRequestForBrokers
> ControllerBrokerRequestBatch.addLeaderAndIsrRequestForBrokers
> inside a for-loop for each partition
> ReplicaStateMachine.doHandleStateChanges
> ReplicaStateMachine.handleStateChanges
> KafkaController.onReplicasBecomeOffline
> KafkaController.onBrokerFailure
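> The cost described above can be illustrated with a small sketch. This is a hypothetical, simplified model (the object and function names below are not Kafka's actual classes), assuming each affected partition triggers one call that re-walks the whole pending-deletion set:

```scala
// Hypothetical model of the quadratic behavior: transitioning n partitions
// with m entries in partitionsToBeDeleted costs n * m scans instead of n.
object UpdateMetadataSketch {
  // Current behavior: handles `partitions` and additionally iterates the
  // entire pending-deletion set; returns how many pending entries it scanned.
  def currentScans(partitions: Set[String], toBeDeleted: Set[String]): Int = {
    var scanned = 0
    partitions.foreach(_ => ())            // the work actually requested
    toBeDeleted.foreach(_ => scanned += 1) // the redundant per-call walk
    scanned
  }

  // Proposed behavior: touch only the partitions passed in (no extra scans).
  def patchedScans(partitions: Set[String]): Int = 0

  def main(args: Array[String]): Unit = {
    val a0Pending = (0 until 10).map(i => s"a0-$i").toSet // stuck deletions
    val a1Offline = (0 until 10).map(i => s"a1-$i")       // broker-2 replicas
    // One call per affected partition, as in the state-change loop above:
    val total = a1Offline.map(p => currentScans(Set(p), a0Pending)).sum
    println(total) // 100 redundant scans per state-change pass
  }
}
```

> With 10 affected a1 partitions and 10 pending a0 deletions, one pass performs 100 redundant scans; the patched variant performs none.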
> How to reproduce the problem:
> 1. Create a cluster with 2 brokers having ids 1 and 2
> 2. Create a topic having 10 partitions and deliberately assign the replicas to non-existing brokers, i.e.
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic a0 --replica-assignment `echo -n 3:4; for i in \`seq 9\`; do echo -n ,3:4; done`
> 3. Delete the topic; since it has dead replicas, it is ineligible for deletion, and all of its partitions are retained in the field TopicDeletionManager.partitionsToBeDeleted.
> 4. Create another topic a1 also having 10 partitions, i.e.
> ./bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic a1 --replica-assignment `echo -n 1:2; for i in \`seq 9\`; do echo -n ,1:2; done`
> 5. Kill the broker 2 and cause the replicas on broker 2 to be transitioned to OfflineReplica state on the controller.
> 6. Verify that the following log message appears over 200 times in the controller.log file, once for each iteration over the a0 partitions:
> "Leader not yet assigned for partition [a0,..]. Skip sending UpdateMetadataRequest."
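> The backtick expressions in steps 2 and 4 just build the comma-separated replica-assignment string passed to kafka-topics.sh; a hypothetical equivalent in code (ReplicaAssignment is an illustrative name, not a Kafka class):

```scala
// Builds a --replica-assignment value: the same replica pair (e.g. "3:4")
// repeated once per partition, comma-separated.
object ReplicaAssignment {
  def assignment(pair: String, partitions: Int): String =
    List.fill(partitions)(pair).mkString(",")

  def main(args: Array[String]): Unit =
    println(assignment("3:4", 10)) // 3:4,3:4,3:4,3:4,3:4,3:4,3:4,3:4,3:4,3:4
}
```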
>
> What happened:
> 1. During controlled shutdown, the function KafkaController.doControlledShutdown calls replicaStateMachine.handleStateChanges to transition all the replicas on broker 2 to OfflineReplica state. That in turn generates 100 (10 x 10) entries of the log message above.
> 2. When the broker zNode is gone in ZK, the function KafkaController.onBrokerFailure calls KafkaController.onReplicasBecomeOffline to transition all the replicas on broker 2 to OfflineReplica state, which again generates 100 (10 x 10) entries of the log message above.
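> The expected log volume follows directly from that topology; a quick check (ExpectedLogLines is an illustrative helper, not Kafka code), assuming 10 offline a1 replicas, 10 pending a0 deletions, and the two passes described above:

```scala
// Each offline replica triggers one walk over the pending deletions,
// and the walk happens once per state-change pass.
object ExpectedLogLines {
  def expected(offlineReplicas: Int, pendingDeletions: Int, passes: Int): Int =
    offlineReplicas * pendingDeletions * passes

  def main(args: Array[String]): Unit =
    // controlled-shutdown pass + broker-failure (zNode gone) pass
    println(expected(10, 10, 2)) // 200 "Leader not yet assigned" lines
}
```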
> After applying the patch in this RB, I've verified by repeating the steps above that taking broker 2 offline no longer generates log entries for the a0 partitions.
> I've also verified that topic deletion for topic a1 still works correctly.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)