Posted to dev@kafka.apache.org by "Guozhang Wang (JIRA)" <ji...@apache.org> on 2016/11/09 18:38:58 UTC

[jira] [Resolved] (KAFKA-4360) Controller may deadLock when autoLeaderRebalance encounter zk expired

     [ https://issues.apache.org/jira/browse/KAFKA-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang resolved KAFKA-4360.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 0.10.2.0

Issue resolved by pull request 2094
[https://github.com/apache/kafka/pull/2094]

> Controller may deadLock when autoLeaderRebalance encounter zk expired
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-4360
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4360
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>            Reporter: Json Tu
>              Labels: bugfix
>             Fix For: 0.10.2.0
>
>         Attachments: deadlock_patch, yf-mafka2-common02_jstack.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When the controller has a checkAndTriggerPartitionRebalance task pending in autoRebalanceScheduler and the ZooKeeper session expires at that moment, the controller can deadlock.
> The scenario can be reproduced as follows. When the ZK session expires, the ZK callback thread calls handleNewSession (defined in SessionExpirationListener). That method acquires controllerContext.controllerLock and then calls autoRebalanceScheduler.shutdown(). The shutdown call waits for every task in autoRebalanceScheduler to complete. However, the rebalance task in that thread pool also needs controllerContext.controllerLock, which the ZK callback thread already holds. Each thread waits on the other, so the controller deadlocks.
> This causes at least two problems. First, the broker's id cannot be registered in ZooKeeper, so the new controller considers the broker dead. Second, the process cannot be stopped with kafka-server-stop.sh, because the shutdown function also cannot acquire controllerContext.controllerLock; the only way to stop Kafka is kill -9.
> The attached jstack file was captured while my Kafka process could not be shut down with kafka-server-stop.sh.
> I have hit this scenario several times, and I believe it is an unresolved bug in Kafka.
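
The lock cycle described in the report can be modelled outside Kafka. What follows is a minimal Python sketch under stated assumptions, not Kafka code:

- a plain threading.Lock stands in for controllerContext.controllerLock;
- a single worker thread stands in for autoRebalanceScheduler running checkAndTriggerPartitionRebalance;
- joining that thread stands in for autoRebalanceScheduler.shutdown(), which waits for running tasks.

The function run_session_expiry and its stop_scheduler_first flag are hypothetical names chosen for illustration. With the flag off, the handler takes the lock and then waits for the task, reproducing the deadlock. With the flag on, the scheduler is drained before the lock is taken. That is one way to break the cycle, and not necessarily the change made in pull request 2094.

```python
import threading


def run_session_expiry(stop_scheduler_first: bool, timeout: float = 1.0) -> bool:
    """Return True if the simulated session-expiry handler finishes within `timeout`.

    Hypothetical stand-ins, not Kafka code:
      controller_lock -> controllerContext.controllerLock
      worker thread   -> autoRebalanceScheduler running checkAndTriggerPartitionRebalance
      worker.join()   -> autoRebalanceScheduler.shutdown(), which waits for tasks to finish
    """
    controller_lock = threading.Lock()
    task_fired = threading.Event()

    def rebalance_task() -> None:
        # The scheduled task fires, then needs the controller lock to do its work.
        task_fired.wait()
        with controller_lock:
            pass

    worker = threading.Thread(target=rebalance_task, daemon=True)

    def handle_new_session() -> None:
        if stop_scheduler_first:
            # Drain the scheduler before taking the lock, so no cycle can form.
            task_fired.set()
            worker.join()
            with controller_lock:
                pass  # re-election work would happen here
        else:
            # Order from the report: take the lock, then wait for the scheduler.
            with controller_lock:
                task_fired.set()  # the task will now block on the lock we hold
                worker.join()     # and we block waiting for that task

    handler = threading.Thread(target=handle_new_session, daemon=True)
    worker.start()
    handler.start()
    handler.join(timeout)
    return not handler.is_alive()


if __name__ == "__main__":
    print(run_session_expiry(stop_scheduler_first=False))  # False: handler is stuck
    print(run_session_expiry(stop_scheduler_first=True))   # True
```

Both threads are daemons, so a deadlocked run does not keep the interpreter alive. The timeout on handler.join turns the hang into a detectable False instead of blocking forever, which is what the real broker did until it was killed with kill -9.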



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)