Posted to dev@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/12/10 20:34:10 UTC
[jira] [Commented] (KAFKA-2980) ZookeeperConsumerConnector may enter deadlock if a rebalance occurs during a stream creation.

    [ https://issues.apache.org/jira/browse/KAFKA-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051512#comment-15051512 ] 

ASF GitHub Bot commented on KAFKA-2980:
---------------------------------------

GitHub user becketqin reopened a pull request:

    https://github.com/apache/kafka/pull/660

    KAFKA-2980 Fix deadlock when ZookeeperConsumerConnector create messag…

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/becketqin/kafka KAFKA-2980

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/660.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #660
    
----
commit 6ad40206f354512b1f2db1e3784754ea29415ce7
Author: Jiangjie Qin <be...@gmail.com>
Date:   2015-12-10T19:08:15Z

    KAKFA-2980 Fix deadlock when ZookeeperConsumerConnector create message streams.

----


> ZookeeperConsumerConnector may enter deadlock if a rebalance occurs during a stream creation.
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2980
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2980
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>
> The following sequence caused the problem:
> 1. Multiple ZookeeperConsumerConnector instances in the same group start at the same time.
> 2. The user consumer thread calls createMessageStreamsByFilter().
> 3. Right before the user consumer thread enters syncedRebalance(), a rebalance is triggered by another consumer joining the group.
> 4. Because the watcher executor is already up and running at this point, it starts to rebalance. Now both the user consumer thread and the watcher executor are trying to rebalance.
> 5. The watcher executor wins this time. It finishes the rebalance, so the fetchers start to run.
> 6. After that, the user consumer thread tries to rebalance again, but it blocks when trying to stop the fetchers, because the fetcher threads are themselves blocked putting data chunks into the data chunk queue.
> 7. Because no thread is taking messages out of the data chunk queue, the fetcher threads cannot make progress, and neither can the user consumer thread. So we have a deadlock.
> The current code works if no fetcher thread is running when createMessageStreams/createMessageStreamsByFilter is called. The simple fix is to let those two methods acquire the rebalance lock.
> Although this is a fix to the old consumer, it is small and important for people who are still using the old consumer, so I think it is still worth doing.
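
The proposed fix can be illustrated with a small, self-contained Java sketch. It is not the actual ZookeeperConsumerConnector code or the patch in the pull request: the class RebalanceLockSketch, the use of ReentrantLock, and the event list are assumptions made only for illustration. It shows that when stream creation holds the rebalance lock, a rebalance triggered by the watcher during that window has to wait until the user thread has finished its own rebalance, so no fetchers are started in between.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the connector; NOT the real ZookeeperConsumerConnector.
class RebalanceLockSketch {
    // Stand-in for the connector's rebalance lock (assumed to be reentrant).
    private final ReentrantLock rebalanceLock = new ReentrantLock();
    // Records the order in which the two threads complete a rebalance.
    private final List<String> events =
            Collections.synchronizedList(new ArrayList<>());

    // Every rebalance runs under the rebalance lock.
    void syncedRebalance(String caller) {
        rebalanceLock.lock();
        try {
            events.add(caller + ":rebalance");
        } finally {
            rebalanceLock.unlock();
        }
    }

    // Sketch of the fix: stream creation holds the rebalance lock for its
    // whole duration, so the watcher cannot rebalance in the middle of it.
    void createMessageStreams(Runnable duringCreation) {
        rebalanceLock.lock();
        try {
            duringCreation.run();    // window in which another consumer joins
            syncedRebalance("user"); // reentrant: this thread already holds the lock
        } finally {
            rebalanceLock.unlock();
        }
    }

    static List<String> run() {
        RebalanceLockSketch connector = new RebalanceLockSketch();
        Thread watcher = new Thread(() -> connector.syncedRebalance("watcher"));
        connector.createMessageStreams(() -> {
            // Simulate a rebalance being triggered during stream creation.
            watcher.start();
            // Wait until the watcher thread is queued on the rebalance lock.
            while (!connector.rebalanceLock.hasQueuedThreads()) {
                Thread.yield();
            }
        });
        try {
            watcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(connector.events);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [user:rebalance, watcher:rebalance]
    }
}
```

In this model the user thread always rebalances first. Without the lock in createMessageStreams, the watcher could complete its rebalance inside the creation window, which corresponds to steps 4 and 5 of the sequence above.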


