Posted to jira@kafka.apache.org by "Konstantine Karantasis (Jira)" <ji...@apache.org> on 2021/07/09 03:46:00 UTC

[jira] [Commented] (KAFKA-12493) The controller should handle the consistency between the controllerContext and the partition replicas assignment on zookeeper

    [ https://issues.apache.org/jira/browse/KAFKA-12493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377772#comment-17377772 ] 

Konstantine Karantasis commented on KAFKA-12493:
------------------------------------------------

[~wenbing.shen] [~junrao] is this issue a blocker for 3.0? Code freeze is only a few days away. If not, I'll postpone the issue to the next release.

> The controller should handle the consistency between the controllerContext and the partition replicas assignment on zookeeper
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-12493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12493
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
>            Reporter: Wenbing Shen
>            Assignee: Wenbing Shen
>            Priority: Major
>             Fix For: 3.0.0
>
>
> This issue is related to the following mailing-list thread: [https://lists.apache.org/thread.html/redf5748ec787a9c65fc48597e3d2256ffdd729de14afb873c63e6c5b%40%3Cusers.kafka.apache.org%3E]
>  
> The problem reproduces 100% of the time.
> Problem description:
> In a customer's production environment, code written by colleagues in another department recomputed the replica assignment of existing partitions and wrote it to ZooKeeper. When processing the resulting partition modification event, the controller only considers the newly added partitions: it records their replica assignment, moves the new partitions and replicas through the partition state machine and the replica state machine, and sends LeaderAndIsr and other control requests for them.
> However, the controller does not check whether the assignment of the existing partitions stored in the ZooKeeper znode still matches the replica assignment cached in the controllerContext. This looks harmless at first, but as soon as a broker has to be restarted for some reason (a configuration update, an upgrade, etc.), the affected partitions of the topic stop working in production: the controller cannot elect a new leader, and the original leader does not recognize the replicas currently assigned in ZooKeeper. At our customer's site this interrupted real-time business, and some data was lost.
> The problem can be reproduced reliably as follows:
> Add partitions to, or change the replication factor of, an existing topic using the code below. The code regenerates the replica assignment for all partitions, including the existing ones, and writes it to ZooKeeper. The controller does not handle the changes to the existing partitions correctly, so after the brokers hosting the topic are restarted, the topic can no longer be produced to or consumed from.
>  
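> {code:java}
> import kafka.admin.AdminUtils;
> import kafka.admin.BrokerMetadata;
> import kafka.admin.RackAwareMode;
> import kafka.utils.ZkUtils;
> import org.apache.kafka.common.security.JaasUtils;
> import scala.Option;
> import scala.collection.Seq;
>
> // ZK_LIST, SESSION_TIMEOUT, CONNECTION_TIMEOUT and KafkaTopicVO are defined elsewhere in the reporter's code.
> public void updateKafkaTopic(KafkaTopicVO kafkaTopicVO) {
>     ZkUtils zkUtils = ZkUtils.apply(ZK_LIST, SESSION_TIMEOUT, CONNECTION_TIMEOUT, JaasUtils.isZkSecurityEnabled());
>     try {
>         if (kafkaTopicVO.getPartitionNum() >= 0 && kafkaTopicVO.getReplicationNum() >= 0) {
>             // Get the metadata of the currently registered brokers
>             Seq<BrokerMetadata> brokerMetadata = AdminUtils.getBrokerMetadatas(zkUtils,
>                     RackAwareMode.Enforced$.MODULE$,
>                     Option.apply(null)); // None: consider all brokers
>             // Generate a new assignment from scratch for ALL partitions (0..partitionNum-1),
>             // so the replica lists of existing partitions can change as well
>             scala.collection.Map<Object, Seq<Object>> replicaAssign = AdminUtils.assignReplicasToBrokers(brokerMetadata,
>                     kafkaTopicVO.getPartitionNum(), // Number of partitions
>                     kafkaTopicVO.getReplicationNum(), // Number of replicas per partition
>                     -1,  // fixedStartIndex: pick a random start broker
>                     -1); // startPartitionId: start from partition 0
>             // Overwrite the topic's partition assignment znode (update = true)
>             AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK(zkUtils,
>                     kafkaTopicVO.getTopicNameList().get(0),
>                     replicaAssign,
>                     null, // no topic config
>                     true);
>         }
>     } catch (Exception e) {
>         System.err.println("Adjust partition abnormal: " + e);
>         System.exit(1); // exit with a non-zero status on failure
>     } finally {
>         zkUtils.close();
>     }
> }
> {code}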
>  
>  
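For illustration only: the issue title asks the controller to keep the controllerContext consistent with the partition replica assignment stored in ZooKeeper. The Java sketch below shows the kind of comparison the report says is missing when a partition modification event is processed: besides finding newly added partitions, it also flags existing partitions whose replica list changed. It is not Kafka's controller code (which is written in Scala). The class AssignmentDiff, its methods, and the representation of an assignment as a Map<Integer, List<Integer>> from partition id to ordered replica list are hypothetical assumptions. It needs Java 9+ for Map.of and List.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Kafka: compares a cached replica assignment
// (standing in for the controllerContext) with the assignment read from ZooKeeper.
final class AssignmentDiff {

    // Partitions present in the ZooKeeper assignment but absent from the cache,
    // returned in ascending partition order.
    public static List<Integer> addedPartitions(Map<Integer, List<Integer>> cached,
                                                Map<Integer, List<Integer>> zk) {
        List<Integer> added = new ArrayList<>();
        for (Integer partition : new TreeMap<>(zk).keySet()) {
            if (!cached.containsKey(partition)) {
                added.add(partition);
            }
        }
        return added;
    }

    // Existing partitions whose replica list in ZooKeeper differs from the cached one,
    // in ascending partition order. Partitions missing from ZooKeeper are ignored.
    public static List<Integer> reassignedPartitions(Map<Integer, List<Integer>> cached,
                                                     Map<Integer, List<Integer>> zk) {
        List<Integer> changed = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : new TreeMap<>(cached).entrySet()) {
            List<Integer> current = zk.get(entry.getKey());
            // List.equals is order-sensitive, so a reordered replica list counts as a change.
            if (current != null && !current.equals(entry.getValue())) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Cached view: partitions 0 and 1.
        Map<Integer, List<Integer>> cached = Map.of(0, List.of(1, 2), 1, List.of(2, 3));
        // ZooKeeper view: partition 2 was added AND partition 0 was silently reassigned.
        Map<Integer, List<Integer>> zk = Map.of(0, List.of(3, 1), 1, List.of(2, 3), 2, List.of(1, 2));
        System.out.println("added=" + addedPartitions(cached, zk));           // prints added=[2]
        System.out.println("reassigned=" + reassignedPartitions(cached, zk)); // prints reassigned=[0]
    }
}
```

The comparison is deliberately order-sensitive: in Kafka the first replica in a partition's list is its preferred leader, so a reordered list is a real change. A controller that only acted on the result of addedPartitions would miss partition 0 in the example above, which matches the situation described in this report.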



--
This message was sent by Atlassian Jira
(v8.3.4#803005)