Posted to jira@kafka.apache.org by "Ismael Juma (JIRA)" <ji...@apache.org> on 2018/05/09 06:54:00 UTC

[jira] [Resolved] (KAFKA-6879) Controller deadlock following session expiration

     [ https://issues.apache.org/jira/browse/KAFKA-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismael Juma resolved KAFKA-6879.
--------------------------------
    Resolution: Fixed

> Controller deadlock following session expiration
> ------------------------------------------------
>
>                 Key: KAFKA-6879
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6879
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 1.1.0
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Critical
>             Fix For: 2.0.0, 1.1.1
>
>
> We have observed an apparent deadlock that occurs following a session expiration. The suspected deadlock is between the ZooKeeper client's "initializationLock" and the latch inside the Expire event, which we use to ensure all events have been handled.
> In the logs, we see the "Session expired" message following acquisition of the initialization lock: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/zookeeper/ZooKeeperClient.scala#L358
> But we never see any logs indicating that the new session is being initialized. In fact, the controller logs are basically empty from that point on. The problem we suspect is that completion of the {{beforeInitializingSession}} callback requires that all events have finished processing in order to count down the latch: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L1525.
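> However, an event that was dequeued just before the expiration handler acquired the write lock may be unable to complete, because it is waiting to acquire the read side of the initialization lock: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/zookeeper/ZooKeeperClient.scala#L137. Neither side can make progress: the handler holds the write lock while waiting for the latch, and the latch cannot be counted down until the blocked event finishes.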
> The impact is that the broker remains in a zombie state: it continues fetching and is periodically added to ISRs, but it never receives any further requests from the controller because it is no longer registered.
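The interleaving described in the issue can be reproduced with a minimal Java sketch built on java.util.concurrent. It is a simplified model and not Kafka's code, which is written in Scala. In the model, a ReentrantReadWriteLock stands in for the ZooKeeper client's initializationLock, a single-thread executor stands in for the controller's event thread, and a CountDownLatch stands in for the Expire event's latch. The names ExpireDeadlockSketch, simulate and writeLockHeld are hypothetical, and so is the timeout-based detection, which exists only so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified model of the KAFKA-6879 interleaving; not Kafka's implementation.
// All class, method, and variable names here are hypothetical.
public class ExpireDeadlockSketch {

    /** Returns true if the Expire event's latch was not counted down within timeoutMs. */
    public static boolean simulate(boolean inFlightEventNeedsLock, long timeoutMs) {
        try {
            return run(inFlightEventNeedsLock, timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during simulation", e);
        }
    }

    private static boolean run(boolean inFlightEventNeedsLock, long timeoutMs)
            throws InterruptedException {
        // Stand-in for initializationLock: requests take the read side,
        // session re-initialization takes the write side.
        ReentrantReadWriteLock initializationLock = new ReentrantReadWriteLock();
        // Stand-in for the controller's single event-processing thread.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        CountDownLatch eventStarted = new CountDownLatch(1);
        CountDownLatch writeLockHeld = new CountDownLatch(1);

        // 1. An ordinary event is dequeued just before the session expires.
        eventThread.submit(() -> {
            eventStarted.countDown();
            try {
                writeLockHeld.await(); // force the problematic ordering
                if (inFlightEventNeedsLock) {
                    // Models the event issuing a ZooKeeper request.
                    initializationLock.readLock().lockInterruptibly();
                    initializationLock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // cancelled by shutdownNow() below
            }
        });

        eventStarted.await();
        // 2. The expiration handler acquires the write lock ...
        initializationLock.writeLock().lock();
        try {
            writeLockHeld.countDown();
            // 3. ... then, like beforeInitializingSession, enqueues an Expire
            //    event and waits for the event thread to reach it.
            CountDownLatch expireLatch = new CountDownLatch(1);
            eventThread.submit(expireLatch::countDown);
            return !expireLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            eventThread.shutdownNow(); // interrupt a stuck event, drop the queued Expire
            initializationLock.writeLock().unlock();
            eventThread.awaitTermination(1, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) {
        System.out.println("event needs read lock -> stuck: " + simulate(true, 200));
        System.out.println("event needs no lock   -> stuck: " + simulate(false, 200));
    }
}
```

With inFlightEventNeedsLock set to true, the in-flight event blocks on the read lock. The queued Expire event never runs, and simulate should return true. With false, the in-flight event finishes, the Expire event counts down its latch, and simulate should return false. The sketch uses an extra latch (writeLockHeld) to force the bad ordering deterministically. In a running broker, the same ordering would depend on timing.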



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)