Posted to dev@kafka.apache.org by "Jason Gustafson (Jira)" <ji...@apache.org> on 2022/10/13 20:34:00 UTC

[jira] [Resolved] (KAFKA-14292) KRaft broker controlled shutdown can be delayed indefinitely

     [ https://issues.apache.org/jira/browse/KAFKA-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gustafson resolved KAFKA-14292.
-------------------------------------
    Fix Version/s: 3.4.0
                   3.3.2
       Resolution: Fixed

> KRaft broker controlled shutdown can be delayed indefinitely
> ------------------------------------------------------------
>
>                 Key: KAFKA-14292
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14292
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Alyssa Huang
>            Priority: Major
>             Fix For: 3.4.0, 3.3.2
>
>
> While rolling a KRaft cluster, we noticed that one of the brokers took an unexpectedly long time to shut down. In the logs, we saw the following:
> {code:java}
> Oct 11, 2022 @ 17:53:38.277	[Controller 1] The request from broker 8 to shut down can not yet be granted because the lowest active offset 2283357 is not greater than the broker's shutdown offset 2283358.	org.apache.kafka.controller.BrokerHeartbeatManager	DEBUG	
> Oct 11, 2022 @ 17:53:38.277	[Controller 1] Updated the controlled shutdown offset for broker 8 to 2283362.	org.apache.kafka.controller.BrokerHeartbeatManager	DEBUG	
> Oct 11, 2022 @ 17:53:40.278	[Controller 1] Updated the controlled shutdown offset for broker 8 to 2283366.	org.apache.kafka.controller.BrokerHeartbeatManager	DEBUG	
> Oct 11, 2022 @ 17:53:40.278	[Controller 1] The request from broker 8 to shut down can not yet be granted because the lowest active offset 2283361 is not greater than the broker's shutdown offset 2283362.	org.apache.kafka.controller.BrokerHeartbeatManager	DEBUG	
> Oct 11, 2022 @ 17:53:42.279	[Controller 1] The request from broker 8 to shut down can not yet be granted because the lowest active offset 2283365 is not greater than the broker's shutdown offset 2283366.	org.apache.kafka.controller.BrokerHeartbeatManager	DEBUG	
> Oct 11, 2022 @ 17:53:42.279	[Controller 1] Updated the controlled shutdown offset for broker 8 to 2283370.	org.apache.kafka.controller.BrokerHeartbeatManager	DEBUG	
> Oct 11, 2022 @ 17:53:44.280	[Controller 1] The request from broker 8 to shut down can not yet be granted because the lowest active offset 2283369 is not greater than the broker's shutdown offset 2283370.	org.apache.kafka.controller.BrokerHeartbeatManager	DEBUG	
> Oct 11, 2022 @ 17:53:44.281	[Controller 1] Updated the controlled shutdown offset for broker 8 to 2283374.	org.apache.kafka.controller.BrokerHeartbeatManager	DEBUG	 {code}
> From what I can tell, it looks like the controller waits until all brokers have caught up to the {{controlledShutdownOffset}} of the broker that is shutting down before allowing it to proceed. Probably the intent is to make sure they have all the leader and ISR state.
> The problem is that the {{controlledShutdownOffset}} seems to be updated after every heartbeat that the controller receives: https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/controller/QuorumController.java#L1996. Unless all other brokers catch up to that offset before the next heartbeat arrives from the shutting-down broker, the target moves again, and the broker can remain in the shutting-down state indefinitely.
> In this case, it took more than 40 minutes before the broker completed shutdown:
> {code:java}
> Oct 11, 2022 @ 18:36:36.105	[Controller 1] The request from broker 8 to shut down has been granted since the lowest active offset 2288510 is now greater than the broker's controlled shutdown offset 2288510.	org.apache.kafka.controller.BrokerHeartbeatManager	INFO	
> Oct 11, 2022 @ 18:40:35.197	[Controller 1] The request from broker 8 to unfence has been granted because it has caught up with the offset of it's register broker record 2288906.	org.apache.kafka.controller.BrokerHeartbeatManager	INFO{code}
> It seems like the bug here is that we keep updating {{controlledShutdownOffset}} even after it has already been set; once set, it should be left unchanged.
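
The moving-target behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Kafka's actual code: the class and method names are hypothetical, the starting offsets are taken from the first log excerpt, and the model assumes that each heartbeat adds 4 metadata records and that the slowest active broker always trails the log end by exactly 1 offset. The grant condition is assumed to be "lowest active offset >= shutdown offset", which matches the granted log line where both values are 2288510.

```java
// Hypothetical, simplified model of the controlled-shutdown check.
// It is NOT the BrokerHeartbeatManager implementation.
final class ControlledShutdownSketch {

    /**
     * Simulates heartbeats from a shutting-down broker.
     *
     * @param bumpOnEveryHeartbeat true models the reported bug (the shutdown
     *                             offset is reset on every heartbeat); false
     *                             models the proposed fix (set once, then kept)
     * @param maxHeartbeats        how many heartbeats to simulate
     * @return the heartbeat number on which shutdown is granted, or -1 if it
     *         is never granted within maxHeartbeats
     */
    static int heartbeatsUntilGranted(boolean bumpOnEveryHeartbeat, int maxHeartbeats) {
        long logEndOffset = 2283358L;        // controller's metadata log end (from the logs)
        long lowestActiveOffset = 2283357L;  // slowest active broker (from the logs)
        long shutdownOffset = logEndOffset;  // set when shutdown is first requested

        for (int heartbeat = 1; heartbeat <= maxHeartbeats; heartbeat++) {
            // Assumed grant condition: every active broker has reached the offset.
            if (lowestActiveOffset >= shutdownOffset) {
                return heartbeat;
            }
            // Assumption: 4 new metadata records are written per heartbeat interval.
            logEndOffset += 4;
            if (bumpOnEveryHeartbeat) {
                // Buggy behavior: the target moves to the new log end.
                shutdownOffset = logEndOffset;
            }
            // Assumption: the slowest broker catches up to one behind the log end.
            lowestActiveOffset = logEndOffset - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("bump every heartbeat: " + heartbeatsUntilGranted(true, 1000));
        System.out.println("set once:             " + heartbeatsUntilGranted(false, 1000));
    }
}
```

With these assumed numbers, the "bump every heartbeat" variant reproduces the offset pairs in the first log excerpt (2283357 vs 2283358, 2283361 vs 2283362, and so on) and is never granted, while the "set once" variant is granted on the second heartbeat. In the real cluster the lag was not constant, which is presumably why shutdown was eventually granted after about 40 minutes.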



--
This message was sent by Atlassian Jira
(v8.20.10#820010)