Posted to dev@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/12/21 21:55:58 UTC

[jira] [Commented] (KAFKA-2980) ZookeeperConsumerConnector may enter deadlock if a rebalance occurs during a stream creation.

    [ https://issues.apache.org/jira/browse/KAFKA-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768278#comment-15768278 ] 

ASF GitHub Bot commented on KAFKA-2980:
---------------------------------------

Github user becketqin closed the pull request at:

    https://github.com/apache/kafka/pull/660


> ZookeeperConsumerConnector may enter deadlock if a rebalance occurs during a stream creation.
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2980
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2980
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>
> The following sequence causes the problem:
> 1. Multiple ZookeeperConsumerConnector instances in the same group start at the same time.
> 2. The user consumer thread calls createMessageStreamsByFilter().
> 3. Right before the user consumer thread enters syncedRebalance(), a rebalance is triggered by another consumer joining the group.
> 4. Because the watcher executor is already up and running at this point, it starts to rebalance. Now both the user consumer thread and the watcher executor are trying to rebalance.
> 5. The watcher executor wins this time. It finishes the rebalance, so the fetchers start to run.
> 6. After that, the user consumer thread tries to rebalance again, but it blocks when trying to stop the fetchers, because the fetcher threads are blocked putting data chunks into the data chunk queue.
> 7. Because no thread is taking messages out of the data chunk queue, the fetcher threads cannot make progress, and neither can the user consumer thread. So we have a deadlock.
> The current code works if no fetcher thread is running when createMessageStreams/createMessageStreamsByFilter is called. The simple fix is to let those two methods acquire the rebalance lock.
> Although this is a fix to the old consumer, it is quite small and is important for people who are still using the old consumer, so I think it is still worth doing.
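
The proposed fix can be illustrated with a minimal Java sketch. This is not the actual ZookeeperConsumerConnector code (which is Scala); the class RebalanceLockSketch, its fields, and the event strings are all hypothetical and only model the locking idea. It assumes a re-entrant rebalance lock, so that stream creation can hold the lock across both stream registration and the first rebalance. Under that assumption, the watcher executor cannot start fetchers between the two steps, and the user thread never has to stop fetchers while nothing is draining the queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the connector; names are illustrative, not Kafka's API.
final class RebalanceLockSketch {
    private final ReentrantLock rebalanceLock = new ReentrantLock();
    private final List<String> events = new ArrayList<>();
    private boolean streamsRegistered = false;
    private boolean fetchersRunning = false;

    // User consumer thread: register the streams and run the first rebalance
    // as one critical section, so no other rebalance can slip in between.
    void createMessageStreams() {
        rebalanceLock.lock();
        try {
            events.add("user:register-streams");
            streamsRegistered = true;
            syncedRebalance("user"); // re-entrant: the lock is already held
        } finally {
            rebalanceLock.unlock();
        }
    }

    // Shared by the user consumer thread and the watcher executor.
    void syncedRebalance(String caller) {
        rebalanceLock.lock();
        try {
            if (fetchersRunning) {
                // In the bug report, this is the step that blocked forever.
                events.add(caller + ":stop-fetchers");
                fetchersRunning = false;
            }
            events.add(caller + ":rebalance");
            // Assumption: fetchers only start once streams are registered.
            fetchersRunning = streamsRegistered;
        } finally {
            rebalanceLock.unlock();
        }
    }

    // Snapshot of the recorded events, taken under the same lock.
    List<String> events() {
        rebalanceLock.lock();
        try {
            return new ArrayList<>(events);
        } finally {
            rebalanceLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RebalanceLockSketch connector = new RebalanceLockSketch();
        Thread user = new Thread(connector::createMessageStreams);
        Thread watcher = new Thread(() -> connector.syncedRebalance("watcher"));
        user.start();
        watcher.start();
        user.join();
        watcher.join();
        // The interleaving varies, but "user:stop-fetchers" never appears.
        System.out.println(connector.events());
    }
}
```

Whichever thread wins the lock, the user thread's rebalance in this model always runs with no fetchers started, which corresponds to the condition under which the description says the current code works.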



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)