Posted to issues@zookeeper.apache.org by "Damien Diederen (Jira)" <ji...@apache.org> on 2020/12/30 20:07:00 UTC
[jira] [Resolved] (ZOOKEEPER-4039) An overly large acceptedEpoch prevents the affected node from joining the cluster
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Damien Diederen resolved ZOOKEEPER-4039.
----------------------------------------
Assignee: Damien Diederen
Resolution: Duplicate
> An overly large acceptedEpoch prevents the affected node from joining the cluster
> ----------------------------
>
> Key: ZOOKEEPER-4039
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4039
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.5
> Reporter: pengfei
> Assignee: Damien Diederen
> Priority: Major
> Attachments: image-2020-12-28-17-54-09-661.png, image-2020-12-28-17-58-11-673.png, image-2020-12-28-18-01-46-005.png, image-2020-12-28-18-02-21-563.png, image-2020-12-28-18-03-58-557.png
>
>
> !image-2020-12-28-17-54-09-661.png!
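> Once the leader has received the acceptedEpoch of more than half of the nodes, it sets its own acceptedEpoch to the maximum of those values plus 1. If the leader goes down at that moment, its acceptedEpoch ends up 1 greater than that of the other nodes. If this node then restarts, is elected leader again, and goes down again, and the remaining nodes then elect a new leader, the new leader's epoch is smaller than the original leader's acceptedEpoch. As a result, the original node keeps switching between the LOOKING and FOLLOWING states.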
>
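> h4. Steps to reproduce:
> 3 nodes: server1, server2, server3
> * Start server1 and server2, then stop server1 and server2 at the point marked by the red dot below. At this point server2's acceptedEpoch=1 !image-2020-12-28-18-01-46-005.png!
> * Start server1 and server2 again, then stop server1 and server2 again at the point marked by the red dot below. At this point server2's acceptedEpoch=2 !image-2020-12-28-18-02-21-563.png!
> * Start server1 and server3, wait until server1 and server3 have elected server3 as leader, then start server2. The exception below then repeats indefinitely !image-2020-12-28-18-03-58-557.png!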
>
>
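The epoch arithmetic in the report can be traced with a small model. This is only a sketch under stated assumptions, not ZooKeeper code: the class, the Server type and the method names are hypothetical, every acceptedEpoch is assumed to start at 0, server2 is assumed to be the leader in the first two steps, and a node is assumed to refuse a leader whose epoch is lower than its own acceptedEpoch.

```java
import java.util.Arrays;

/** Simplified model of the scenario in the report; not ZooKeeper code. */
public class AcceptedEpochSketch {

    /** Hypothetical stand-in for one server's persisted acceptedEpoch. */
    static final class Server {
        final String name;
        long acceptedEpoch;

        Server(String name, long acceptedEpoch) {
            this.name = name;
            this.acceptedEpoch = acceptedEpoch;
        }
    }

    /** The leader proposes the maximum acceptedEpoch seen in the quorum, plus 1. */
    static long proposeEpoch(Server... quorum) {
        return Arrays.stream(quorum).mapToLong(s -> s.acceptedEpoch).max().orElse(0L) + 1;
    }

    /**
     * Models the "red dot" stop point: the leader has recorded the new epoch,
     * but both servers stop before the follower records it.
     */
    static long crashAfterLeaderPersists(Server leader, Server follower) {
        long epoch = proposeEpoch(leader, follower);
        leader.acceptedEpoch = epoch; // the follower's value is left unchanged
        return epoch;
    }

    /** Assumed rule: a node refuses a leader whose epoch is below its own acceptedEpoch. */
    static boolean canFollow(Server node, long leaderEpoch) {
        return leaderEpoch >= node.acceptedEpoch;
    }

    public static void main(String[] args) {
        Server server1 = new Server("server1", 0);
        Server server2 = new Server("server2", 0);
        Server server3 = new Server("server3", 0);

        crashAfterLeaderPersists(server2, server1); // step 1: server2 at 1
        crashAfterLeaderPersists(server2, server1); // step 2: server2 at 2

        // Step 3: server3 leads a quorum made of server1 and itself.
        long leaderEpoch = proposeEpoch(server3, server1);

        System.out.println(server2.name + " acceptedEpoch = " + server2.acceptedEpoch);
        System.out.println(server3.name + " proposes epoch " + leaderEpoch);
        System.out.println(server2.name + " can follow: " + canFollow(server2, leaderEpoch));
    }
}
```

In this model server2 ends at acceptedEpoch 2 while server3 proposes epoch 1, so canFollow returns false for server2, which matches the repeated failure described in the last step.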
--
This message was sent by Atlassian Jira
(v8.3.4#803005)