Posted to dev@kafka.apache.org by "Joel Koshy (JIRA)" <ji...@apache.org> on 2014/04/09 20:36:17 UTC

[jira] [Commented] (KAFKA-1310) Zookeeper timeout causes deadlock in Controller

    [ https://issues.apache.org/jira/browse/KAFKA-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964521#comment-13964521 ] 

Joel Koshy commented on KAFKA-1310:
-----------------------------------

Fixed by KAFKA-1317

> Zookeeper timeout causes deadlock in Controller
> -----------------------------------------------
>
>                 Key: KAFKA-1310
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1310
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Fedor Korotkiy
>            Assignee: Neha Narkhede
>            Priority: Blocker
>             Fix For: 0.8.1.1
>
>
> Steps to reproduce:
> 1. Check out and build the 0.8.1 branch from GitHub:
> git clone git@github.com:apache/kafka.git && cd kafka && git checkout origin/0.8.1 && ./gradlew jar
> 2. Start zookeeper server:
> ./bin/zookeeper-server-start.sh config/zookeeper.properties
> 3. Start kafka server:
> ./bin/kafka-server-start.sh config/server.properties
> 4. Suspend the zookeeper process for 10 seconds (ctrl-Z, wait, then resume it with %1).
> 5. Observe that kafka has not re-registered itself in zookeeper:
> ./bin/zookeeper-shell.sh
> ls /brokers/ids
> >> []
> The root cause seems to be a deadlock between DeleteTopicsThread and SessionExpirationListener in KafkaController:
> 1. DeleteTopicsThread acquires controllerLock and await()-s on deleteTopicsCond in awaitTopicDeletionNotification(). The await() releases controllerLock while the thread waits.
> 2. SessionExpirationListener fires. It acquires controllerLock and tries to shut down deleteTopicManager (in onControllerResignation()). This interrupts DeleteTopicsThread.
> 3. DeleteTopicsThread can't return from deleteTopicsCond.await() because it must re-acquire controllerLock first, and SessionExpirationListener still holds it during the shutdown. The result is a deadlock.
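
The three steps above can be reproduced in isolation with plain java.util.concurrent primitives. The following is a minimal sketch, not the KafkaController code: the class name ControllerDeadlockSketch and the method waiterStuckAfterInterrupt are hypothetical, and it assumes that the shutdown path waits for the interrupted thread to exit while still holding the lock. A bounded join() stands in for that wait so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the controller; names mirror the issue text only for readability.
class ControllerDeadlockSketch {
    static final ReentrantLock controllerLock = new ReentrantLock();
    static final Condition deleteTopicsCond = controllerLock.newCondition();

    /** Returns true if the waiting thread is still alive after a bounded join made while holding the lock. */
    static boolean waiterStuckAfterInterrupt(long joinMillis) throws InterruptedException {
        CountDownLatch lockTaken = new CountDownLatch(1);

        // Step 1: the "DeleteTopicsThread" takes the lock and waits on the condition.
        Thread deleteTopicsThread = new Thread(() -> {
            controllerLock.lock();
            try {
                lockTaken.countDown();
                deleteTopicsCond.await(); // releases controllerLock while waiting
            } catch (InterruptedException e) {
                // Reached only after controllerLock has been re-acquired.
            } finally {
                controllerLock.unlock();
            }
        });
        deleteTopicsThread.setDaemon(true);
        deleteTopicsThread.start();
        lockTaken.await();

        // Step 2: the "SessionExpirationListener" takes the same lock. This call
        // succeeds only once the waiter has released the lock inside await().
        controllerLock.lock();
        try {
            deleteTopicsThread.interrupt();
            // Step 3: waiting for the thread to exit while holding the lock.
            // An unbounded join() here would never return; the timeout is an
            // assumption added so the sketch can report the result.
            deleteTopicsThread.join(joinMillis);
            return deleteTopicsThread.isAlive();
        } finally {
            controllerLock.unlock(); // lets the waiter finish after the demo
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiter stuck while lock held: " + waiterStuckAfterInterrupt(500));
    }
}
```

In this sketch the interrupted thread cannot leave await() until the lock holder releases controllerLock, so waiting for it to exit while holding the lock cannot complete. Releasing the lock before waiting for the thread avoids the cycle; whether KAFKA-1317 takes that approach is not stated here.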


