Posted to dev@kafka.apache.org by "Jason Gustafson (Jira)" <ji...@apache.org> on 2022/05/27 20:04:00 UTC

[jira] [Created] (KAFKA-13944) Shutting down broker can be elected as partition leader in KRaft

Jason Gustafson created KAFKA-13944:
---------------------------------------

             Summary: Shutting down broker can be elected as partition leader in KRaft
                 Key: KAFKA-13944
                 URL: https://issues.apache.org/jira/browse/KAFKA-13944
             Project: Kafka
          Issue Type: Bug
            Reporter: Jason Gustafson


When a broker requests shutdown, the controller moves it to the CONTROLLED_SHUTDOWN state. The broker can remain unfenced in this state until the controlled shutdown completes. During a leader election, the controller generally checks only that a candidate broker is unfenced, so it can elect a broker that is in controlled shutdown.

Here are a few snippets from a recent system test in which this occurred:
{code:java}
// broker 2 starts controlled shutdown
[2022-05-26 21:17:26,451] INFO [Controller 3001] Unfenced broker 2 has requested and been granted a controlled shutdown. (org.apache.kafka.controller.BrokerHeartbeatManager)
 
// there is only one replica, so we set leader to -1
[2022-05-26 21:17:26,452] DEBUG [Controller 3001] partition change for _foo-1 with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: 2 -> -1, leaderEpoch: 0 -> 1, partitionEpoch: 0 -> 1 (org.apache.kafka.controller.ReplicationControlManager)

// controlled shutdown cannot complete immediately
[2022-05-26 21:17:26,529] DEBUG [Controller 3001] The request from broker 2 to shut down can not yet be granted because the lowest active offset 177 is not greater than the broker's shutdown offset 244. (org.apache.kafka.controller.BrokerHeartbeatManager)
[2022-05-26 21:17:26,530] DEBUG [Controller 3001] Updated the controlled shutdown offset for broker 2 to 244. (org.apache.kafka.controller.BrokerHeartbeatManager)

// later on we elect leader 2 again
[2022-05-26 21:17:27,703] DEBUG [Controller 3001] partition change for _foo-1 with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: -1 -> 2, leaderEpoch: 1 -> 2, partitionEpoch: 1 -> 2 (org.apache.kafka.controller.ReplicationControlManager)

// now controlled shutdown is stuck because of the newly elected leader
[2022-05-26 21:17:28,531] DEBUG [Controller 3001] Broker 2 is in controlled shutdown state, but can not shut down because more leaders still need to be moved. (org.apache.kafka.controller.BrokerHeartbeatManager)
{code}
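
The gap described above can be illustrated with a minimal, self-contained sketch. It is not the controller's actual code: the class LeaderElectionSketch, the BrokerStatus type, and both eligibility methods are hypothetical names invented for this illustration, and excluding brokers in controlled shutdown is only one possible way to tighten the check.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Kafka code base.
class LeaderElectionSketch {
    static final int NO_LEADER = -1;

    // Assumed per-broker view held by the controller.
    static final class BrokerStatus {
        final boolean fenced;
        final boolean inControlledShutdown;

        BrokerStatus(boolean fenced, boolean inControlledShutdown) {
            this.fenced = fenced;
            this.inControlledShutdown = inControlledShutdown;
        }
    }

    // Check as described in the report: only fencing is considered.
    static boolean eligibleUnfencedOnly(BrokerStatus status) {
        return !status.fenced;
    }

    // Assumed stricter check: also skip brokers in controlled shutdown.
    static boolean eligibleExcludingShutdown(BrokerStatus status) {
        return !status.fenced && !status.inControlledShutdown;
    }

    // Returns the first eligible replica, or NO_LEADER if none qualifies.
    // Brokers with no known status are treated as ineligible.
    static int electLeader(List<Integer> replicas,
                           Map<Integer, BrokerStatus> statuses,
                           boolean excludeShuttingDown) {
        for (int brokerId : replicas) {
            BrokerStatus status = statuses.get(brokerId);
            if (status == null) {
                continue;
            }
            boolean eligible = excludeShuttingDown
                ? eligibleExcludingShutdown(status)
                : eligibleUnfencedOnly(status);
            if (eligible) {
                return brokerId;
            }
        }
        return NO_LEADER;
    }

    public static void main(String[] args) {
        // Mirrors the logs: broker 2 is the only replica, unfenced but shutting down.
        Map<Integer, BrokerStatus> statuses = Map.of(2, new BrokerStatus(false, true));
        List<Integer> replicas = List.of(2);

        System.out.println(electLeader(replicas, statuses, false)); // prints 2
        System.out.println(electLeader(replicas, statuses, true));  // prints -1
    }
}
```

With the unfenced-only check, broker 2 is elected again, which matches the "leader: -1 -> 2" change in the logs. With the stricter check, the partition stays leaderless, so the controlled shutdown would not be blocked by a newly assigned leadership.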


