Posted to issues@zookeeper.apache.org by "pengfei (Jira)" <ji...@apache.org> on 2020/12/28 10:03:00 UTC

[jira] [Created] (ZOOKEEPER-4039) An acceptedEpoch that is too large prevents the corresponding node from joining the cluster

pengfei created ZOOKEEPER-4039:
----------------------------------

             Summary: An acceptedEpoch that is too large prevents the corresponding node from joining the cluster
                 Key: ZOOKEEPER-4039
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4039
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.5.5
            Reporter: pengfei
         Attachments: image-2020-12-28-17-54-09-661.png, image-2020-12-28-17-58-11-673.png, image-2020-12-28-18-01-46-005.png, image-2020-12-28-18-02-21-563.png, image-2020-12-28-18-03-58-557.png

!image-2020-12-28-17-54-09-661.png!

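Once the leader has received the acceptedEpoch of a majority of nodes, it sets its own acceptedEpoch to the maximum of those values plus 1. If the leader crashes at this point, its acceptedEpoch is left 1 higher than the other nodes' values. Suppose this node then restarts, is elected leader again, and crashes again; its acceptedEpoch is now 2 higher than the others'. The remaining nodes then elect a new leader whose epoch is smaller than the original leader's acceptedEpoch. As a result, the original node keeps switching between the LOOKING and FOLLOWING states and never joins the cluster.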
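The leader-side drift described above can be illustrated with a small Java model. Everything in it is hypothetical. The class `EpochDriftSketch`, the method `proposeEpoch`, and the plain `long` variables that stand in for each server's persisted acceptedEpoch are illustrative and are not ZooKeeper's real API. The model assumes that server2 wins both elections, as in the reproduction steps below. It also assumes that each crash happens after the leader has stored the new epoch but before server1 has recorded it.

```java
import java.util.Arrays;

// Hypothetical model of the leader-side epoch rule; NOT ZooKeeper's real classes or API.
class EpochDriftSketch {
    // Assumed rule from the report: new epoch = highest acceptedEpoch reported by the
    // quorum (leader included) + 1.
    static long proposeEpoch(long... acceptedEpochs) {
        return Arrays.stream(acceptedEpochs).max().getAsLong() + 1;
    }

    public static void main(String[] args) {
        // Stand-ins for each node's persisted acceptedEpoch; server2 wins both elections.
        long server1 = 0, server2 = 0;
        for (int crash = 1; crash <= 2; crash++) {
            // The leader stores the proposed epoch as its own acceptedEpoch once a quorum reports...
            server2 = proposeEpoch(server2, server1);
            // ...then crashes before server1 records it, so server1 keeps its old value.
            System.out.println("after crash " + crash + ": server1=" + server1 + ", server2=" + server2);
        }
    }
}
```

Running `main` prints `after crash 1: server1=0, server2=1` and then `after crash 2: server1=0, server2=2`. Each leader crash at that point widens the gap by one.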

 
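h4. Steps to reproduce:

Three nodes: server1, server2, server3.
 * Start server1 and server2, then stop both when they reach the position marked by the red dot below. At this point, server2's acceptedEpoch is 1. !image-2020-12-28-18-01-46-005.png!
 * Start server1 and server2 again, then again stop both when they reach the red-dot position below. At this point, server2's acceptedEpoch is 2. !image-2020-12-28-18-02-21-563.png!
 * Start server1 and server3 and wait until they elect server3 as leader, then start server2. The exception below then repeats indefinitely. !image-2020-12-28-18-03-58-557.png!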
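Step 3 can be modeled as a short Java sketch. The sketch makes two assumptions that this report does not spell out. First, a follower accepts a leader epoch that is equal to or greater than its own acceptedEpoch and rejects a lower one. Second, a leader that is already established does not raise its epoch for a node that joins late. `RejoinLoopSketch` and `acceptsLeaderEpoch` are hypothetical names. The actual exception text appears only in the attached screenshot.

```java
// Hypothetical model of step 3 of the reproduction; NOT ZooKeeper's real classes or API.
class RejoinLoopSketch {
    enum State { LOOKING, FOLLOWING }

    // Assumed follower rule: a leader epoch >= our acceptedEpoch is accepted,
    // a lower one is rejected and the follower falls back to LOOKING.
    static boolean acceptsLeaderEpoch(long leaderEpoch, long followerAcceptedEpoch) {
        return leaderEpoch >= followerAcceptedEpoch;
    }

    public static void main(String[] args) {
        // acceptedEpoch values left behind by steps 1 and 2 of the reproduction
        long server1 = 0, server2 = 2, server3 = 0;

        // server1 and server3 form a quorum; server3 leads with max(0, 0) + 1 = 1.
        long leaderEpoch = Math.max(server1, server3) + 1;

        // Assumption: the established leader keeps epoch 1 for late joiners,
        // so server2 is rejected on every attempt.
        for (int attempt = 1; attempt <= 3; attempt++) {
            State state = State.FOLLOWING;   // election points server2 at the existing leader
            if (!acceptsLeaderEpoch(leaderEpoch, server2)) {
                state = State.LOOKING;       // registration fails; back to leader election
            }
            System.out.println("attempt " + attempt + ": leader epoch " + leaderEpoch
                    + ", server2 acceptedEpoch " + server2 + " -> " + state);
        }
    }
}
```

Each attempt ends with `leader epoch 1, server2 acceptedEpoch 2 -> LOOKING`. In this model, setting `server2 = 1` (a single leader crash) makes the check pass. That suggests why the reproduction needs two leader crashes rather than one.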

--
This message was sent by Atlassian Jira
(v8.3.4#803005)