Posted to dev@zookeeper.apache.org by "Michael Han (JIRA)" <ji...@apache.org> on 2017/03/13 16:22:10 UTC

[jira] [Updated] (ZOOKEEPER-2164) fast leader election keeps failing

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Han updated ZOOKEEPER-2164:
-----------------------------------
    Fix Version/s:     (was: 3.5.3)
                   3.5.4

> fast leader election keeps failing
> ----------------------------------
>
>                 Key: ZOOKEEPER-2164
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection
>    Affects Versions: 3.4.5
>            Reporter: Michi Mutsuzaki
>             Fix For: 3.5.4, 3.6.0
>
>
> I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader. When I shut down 2, 1 and 3 keep going back to leader election. Here is what seems to be happening.
> - Both 1 and 3 elect 3 as the leader.
> - 1 receives votes from 3 and itself, and starts trying to connect to 3 as a follower.
> - 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't time out for 5 seconds: https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346
> - By the time 3 receives votes, 1 has given up trying to connect to 3: https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247
> I'm using 3.4.5, but it looks like this part of the code hasn't changed for a while, so I'm guessing later versions have the same issue.
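
The race described above can be sketched as a small timing model. This is not ZooKeeper code: the class and method names are hypothetical, and the follower's retry count and retry delay are illustrative assumptions rather than values taken from Learner.java. Only the 5-second connect timeout comes from the report. The model also assumes that 3 refuses connections from 1 immediately until it has processed the votes.

```java
// Hypothetical timing model of the reported race; not part of ZooKeeper.
final class ElectionTimingSketch {

    // Time (ms) at which the would-be leader (sid 3) can process votes,
    // assuming it is blocked by a single connect attempt to the dead peer (sid 2).
    static long leaderReadyAtMs(long connectTimeoutMs) {
        return connectTimeoutMs;
    }

    // Time (ms) at which the follower (sid 1) gives up, assuming every attempt
    // is refused immediately and it sleeps retryDelayMs between attempts
    // (no sleep after the final attempt).
    static long followerGivesUpAtMs(int maxAttempts, long retryDelayMs) {
        return (long) (maxAttempts - 1) * retryDelayMs;
    }

    // The election round fails if the follower gives up before the leader is ready.
    static boolean electionFails(long connectTimeoutMs, int maxAttempts, long retryDelayMs) {
        return followerGivesUpAtMs(maxAttempts, retryDelayMs) < leaderReadyAtMs(connectTimeoutMs);
    }

    public static void main(String[] args) {
        long connectTimeoutMs = 5000; // from the report: connectOne() blocks for 5 seconds
        int maxAttempts = 5;          // assumption, for illustration only
        long retryDelayMs = 1000;     // assumption, for illustration only

        System.out.println("leader ready at (ms): " + leaderReadyAtMs(connectTimeoutMs));
        System.out.println("follower gives up at (ms): " + followerGivesUpAtMs(maxAttempts, retryDelayMs));
        System.out.println("election fails: " + electionFails(connectTimeoutMs, maxAttempts, retryDelayMs));
    }
}
```

Under these assumptions, the round fails whenever the follower's total retry window is shorter than the leader's blocking connect timeout. That suggests either shortening the connect timeout to the dead peer or lengthening the follower's retry window would break the loop.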


