You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Swapnil Ghike (JIRA)" <ji...@apache.org> on 2013/08/06 23:34:48 UTC

[jira] [Updated] (KAFKA-999) Controlled shutdown never succeeds until the broker is killed

     [ https://issues.apache.org/jira/browse/KAFKA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Ghike updated KAFKA-999:
--------------------------------

    Attachment: kafka-991-v1.patch

Small-scale fix for 0.8. 
                
> Controlled shutdown never succeeds until the broker is killed
> -------------------------------------------------------------
>
>                 Key: KAFKA-999
>                 URL: https://issues.apache.org/jira/browse/KAFKA-999
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Critical
>         Attachments: kafka-991-v1.patch
>
>
> A race condition in the way leader and isr request is handled by the broker and controlled shutdown can lead to a situation where controlled shutdown can never succeed and the only way to bounce the broker is to kill it.
> The root cause is that broker uses a smart to avoid fetching from a leader that is not alive according to the controller. This leads to the broker aborting a become follower request. And in cases where replication factor is 2, the leader can never be transferred to a follower since it keeps rejecting the become follower request and stays out of the ISR. This causes controlled shutdown to fail forever
> One sequence of events that led to this bug is as follows -
> - Broker 2 is leader and controller
> - Broker 2 is bounced (uncontrolled shutdown)
> - Controller fails over
> - Controlled shutdown is invoked on broker 1
> - Controller starts leader election for partitions that broker 2 led
> - Controller sends become follower request with leader as broker 1 to broker 2. At the same time, it does not include broker 1 in alive broker list sent as part of leader and isr request
> - Broker 2 rejects leaderAndIsr request since leader is not in the list of alive brokers
> - Broker 1 fails to transfer leadership to broker 2 since broker 2 is not in ISR
> - Controlled shutdown can never succeed on broker 1
> Since controlled shutdown is a config option, if there are bugs in controlled shutdown, there is no option but to kill the broker

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira