Posted to issues@zookeeper.apache.org by "pengfei (Jira)" <ji...@apache.org> on 2021/01/12 03:21:00 UTC

[jira] [Updated] (ZOOKEEPER-4040) java.io.IOException: Leaders epoch, 1 is less than accepted epoch, 2

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

pengfei updated ZOOKEEPER-4040:
-------------------------------
    Affects Version/s: 3.6.2

> java.io.IOException: Leaders epoch, 1 is less than accepted epoch, 2
> --------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4040
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4040
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.5, 3.6.2
>            Reporter: pengfei
>            Priority: Major
>         Attachments: image-2020-12-28-18-20-07-842.png, image-2020-12-28-18-23-14-073.png, image-2020-12-28-18-25-31-960.png, image-2020-12-28-18-28-07-015.png
>
>
> h4. Overview (mechanically translated from ZOOKEEPER-4039):
> The acceptedEpoch is too large, so the corresponding node cannot join the cluster.
> After the leader receives the acceptedEpoch of more than half of the nodes, it sets its own acceptedEpoch to the maximum of those values plus 1. If the leader goes down at this point, its acceptedEpoch is 1 larger than that of the other nodes. If this node then restarts, is elected leader again, and goes down again, and the remaining nodes then elect a new leader, the epoch of the new leader is smaller than the acceptedEpoch of the original leader. As a result, the original node keeps switching between the LOOKING and FOLLOWING states and never joins the cluster.
> Steps to reproduce:
> 3 nodes: server1, server2, server3
> Start server1 and server2, then stop both at the red dot below. At this point, the acceptedEpoch of server2 is 1.
> Restart server1 and server2, then stop both again at the red dot below. At this point, the acceptedEpoch of server2 is 2.
> Restart server1 and server3 and wait until they elect server3 as the leader, then start server2. The following exception is then thrown repeatedly.
> h4. errorlog:
> java.io.IOException: Leaders epoch, 1 is less than accepted epoch, 2
>     at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:353)
>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:78)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
> 2020-12-28 18:09:25,176 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2182)(secure=disabled):Follower@201] - shutdown called
> java.lang.Exception: shutdown Follower
>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1275)
>  
> h4. sample:
> The cluster consists of three servers: server1, server2, and server3.
>  * Start server1 and server2, then shut them down when they reach the point shown below. Now the acceptedEpoch of server2 is 1, server1 is 0, and server3 is 0.  !image-2020-12-28-18-23-14-073.png!
>  * Repeat step 1. Now the acceptedEpoch of server1 is 0, server2 is 2, and server3 is 0.  !image-2020-12-28-18-25-31-960.png!
>  * Start server1 and server3, wait until server3 is the leader of the cluster, then start server2. This produces the error below.  !image-2020-12-28-18-28-07-015.png!
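
The scenario described above can be sketched as a small, self-contained Java model. This is an illustration only, not ZooKeeper's code: the class EpochDemo, the Peer type, and the methods proposeEpoch, acceptEpoch, and reproduce are hypothetical names. It assumes that server2 acts as leader in the first two rounds, that the leader proposes the highest acceptedEpoch in its quorum plus 1, and that a follower rejects any leader epoch below its own acceptedEpoch, as the reported error message suggests.

```java
import java.io.IOException;

public class EpochDemo {
    /** Hypothetical stand-in for a quorum peer; it only tracks acceptedEpoch. */
    static final class Peer {
        long acceptedEpoch;
        Peer(long acceptedEpoch) { this.acceptedEpoch = acceptedEpoch; }
    }

    /** Leader side (assumed rule): highest acceptedEpoch in the quorum, plus 1. */
    static long proposeEpoch(Peer... quorum) {
        long max = 0;
        for (Peer p : quorum) {
            max = Math.max(max, p.acceptedEpoch);
        }
        return max + 1;
    }

    /** Follower side (assumed rule): refuse a leader epoch below our own acceptedEpoch. */
    static void acceptEpoch(Peer follower, long leaderEpoch) throws IOException {
        if (leaderEpoch < follower.acceptedEpoch) {
            throw new IOException("Leaders epoch, " + leaderEpoch
                    + " is less than accepted epoch, " + follower.acceptedEpoch);
        }
        follower.acceptedEpoch = leaderEpoch;
    }

    /** Replays the three steps of the report and returns the outcome for server2. */
    static String reproduce() {
        Peer server1 = new Peer(0);
        Peer server2 = new Peer(0);
        Peer server3 = new Peer(0);

        // Steps 1 and 2: server2 leads {server1, server2}, records the new epoch,
        // and both servers are stopped before server1 accepts it.
        for (int round = 0; round < 2; round++) {
            server2.acceptedEpoch = proposeEpoch(server1, server2);
        }

        // Step 3: server3 leads {server1, server3}; server2 is still down.
        long leaderEpoch = proposeEpoch(server1, server3);
        server3.acceptedEpoch = leaderEpoch;
        try {
            acceptEpoch(server1, leaderEpoch);
            // server2 starts late and tries to follow server3.
            acceptEpoch(server2, leaderEpoch);
            return "server2 joined";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints "Leaders epoch, 1 is less than accepted epoch, 2"
        System.out.println(reproduce());
    }
}
```

In this model server2 ends up with acceptedEpoch 2 while the quorum of server1 and server3 only reaches epoch 1, so every attempt by server2 to follow server3 fails in the same way, which matches the repeated exception in the report.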



--
This message was sent by Atlassian Jira
(v8.3.4#803005)