Posted to jira@kafka.apache.org by "Manikumar (JIRA)" <ji...@apache.org> on 2018/04/18 11:24:00 UTC

[jira] [Commented] (KAFKA-5813) Unexpected unclean leader election due to leader/controller's unusual event handling order

    [ https://issues.apache.org/jira/browse/KAFKA-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442343#comment-16442343 ] 

Manikumar commented on KAFKA-5813:
----------------------------------

This might have been fixed by the async ZK controller changes.

> Unexpected unclean leader election due to leader/controller's unusual event handling order 
> -------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-5813
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5813
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.10.2.1
>            Reporter: Allen Wang
>            Priority: Minor
>
> We experienced an unexpected unclean leader election after a network glitch on the leader of a partition. The replication factor is 2.
> Here is the sequence of events gathered from various logs:
> 1. A ZK session timeout happens for the leader of the partition
> 2. A new ZK session is established for the leader
> 3. The leader removes the follower from the ISR (possibly because of replication delay caused by the network problem) and updates the ISR in ZK
> 4. The controller processes the BrokerChangeListener event triggered at step 1, in which the leader appears to be offline
> 5. Because the leader has already updated the ISR in ZK to remove the follower, the controller performs an unclean leader election
> 6. The controller processes the second BrokerChangeListener event, triggered at step 2, and marks the broker online again
> It seems to me that step 4 happens too late. If it happened right after step 1, the election would be clean and the producer would hopefully switch to the new leader immediately, avoiding a consumer offset reset.
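
The ordering problem in the quoted description can be illustrated with a small toy model. This is only a sketch, not Kafka's controller code: the function names, the broker labels "leader" and "follower", and the election rule are all hypothetical simplifications, and it assumes unclean leader election is enabled. It shows how the same "leader offline" event yields a clean or an unclean election depending on whether the controller handles it before or after the ISR shrink in step 3.

```python
# Toy model of the event ordering in the report; NOT Kafka's controller code.
# All names (elect_leader, simulate, broker labels) are hypothetical.

def elect_leader(replicas, isr, live_brokers):
    """Return (new_leader, is_clean) for a single partition.

    Clean election: some broker in the ISR is alive.
    Unclean election: no ISR member is alive, so fall back to any live replica
    (assumes unclean leader election is enabled).
    """
    for broker in isr:
        if broker in live_brokers:
            return broker, True
    for broker in replicas:
        if broker in live_brokers:
            return broker, False
    return None, False


def simulate(controller_is_late):
    replicas = ["leader", "follower"]  # replication factor 2
    isr = ["leader", "follower"]       # ISR as stored in ZK before the glitch
    live = {"follower"}                # controller's view for the step 1 event

    if controller_is_late:
        # Steps 2-3 happen before the controller reacts:
        # the leader re-registers and shrinks the ISR in ZK.
        isr = ["leader"]

    # Step 4: the controller handles the "leader offline" event
    # against whatever ISR is in ZK at that moment.
    return elect_leader(replicas, isr, live)


if __name__ == "__main__":
    print("controller on time:", simulate(controller_is_late=False))
    print("controller late:   ", simulate(controller_is_late=True))
```

In this model the follower is elected in both cases. The difference is that when the controller is late, the follower is no longer in the ISR, so the election is unclean.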


