Posted to dev@zookeeper.apache.org by "Raul Gutierrez Segales (JIRA)" <ji...@apache.org> on 2015/12/06 21:01:10 UTC

[jira] [Commented] (ZOOKEEPER-2307) ZooKeeper not starting because acceptedEpoch is less than the currentEpoch

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044099#comment-15044099 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2307:
---------------------------------------------------

The patch generally lgtm, a few nits and observations:

* Do you mind adding Javadocs for QuorumPeer#setCurrentEpoch and QuorumPeer#setAcceptedEpoch? In particular, noting why they don't have to be synchronized would be useful (i.e., they can only be called from the Leader and Learner classes, and it all happens on one thread).

* In src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerTestBase.java, why is getQuorumPeer added? I can't see where it's used. Also, there is some stray whitespace (spaces/tabs) before the method.

* I think waitUptoAFileWriteTime() is a bit racy and hackish and will break in the future. As much as I dislike sleep() calls in tests, in this case it's probably alright to loop a few times until getAcceptedEpoch() and getCurrentEpoch() converge.
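
To make the last point concrete, here is a rough sketch of the kind of bounded polling loop I have in mind. It is not taken from the patch: the class name EpochWait, the method name, and the timeout and poll-interval parameters are all made up for illustration, and the two epoch readers are passed in as suppliers so the snippet stands on its own.

```java
import java.util.function.LongSupplier;

// Hypothetical test helper (not part of the patch): polls two epoch readers
// until they report the same value or a deadline passes.
public class EpochWait {

    /**
     * Returns true as soon as both suppliers return the same value,
     * or false if they still differ once timeoutMs has elapsed.
     */
    public static boolean waitForEpochsToConverge(LongSupplier acceptedEpoch,
                                                  LongSupplier currentEpoch,
                                                  long timeoutMs,
                                                  long pollIntervalMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            // Check first, so an already-converged state returns without sleeping.
            if (acceptedEpoch.getAsLong() == currentEpoch.getAsLong()) {
                return true;
            }
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollIntervalMs);
        }
    }
}
```

In the test, the suppliers would be lambdas that call getAcceptedEpoch() and getCurrentEpoch() on the peer under test. If those getters declare checked exceptions, the lambdas would need to wrap them. The test would then assert on the returned boolean instead of sleeping for a fixed file-write time.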

Other than that, it looks good. Thanks [~arshad.mohammad]!

cc [~fpj] for further thoughts. 

> ZooKeeper not starting because acceptedEpoch is less than the currentEpoch
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2307
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2307
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>            Reporter: Arshad Mohammad
>            Assignee: Arshad Mohammad
>            Priority: Critical
>             Fix For: 3.5.2
>
>         Attachments: ZOOKEEPER-2307-01.patch, ZOOKEEPER-2307-02.patch, ZOOKEEPER-2307-03.patch
>
>
> This issue occurred in one of our test environments, where the disk was being switched to read-only very frequently.
> The scenario is as follows:
> # Configure a three-node ZooKeeper cluster; let's say the nodes are A, B and C.
> # Start A and B. Both A and B start successfully, and the quorum is running.
> # Start C. Because of an I/O error, C fails to update its acceptedEpoch file, but C still starts successfully and joins the quorum as a follower.
> # Stop C.
> # Start C. The exception below, with the message "The accepted epoch, 0 is less than the current epoch, 1", is thrown:
> {code}
> 2015-10-29 16:52:32,942 [myid:3] - ERROR [main:QuorumPeer@784] - Unable to load database on disk
> java.io.IOException: The accepted epoch, 0 is less than the current epoch, 1
> 	at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:781)
> 	at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:720)
> 	at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:202)
> 	at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:139)
> 	at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:88)
> 2015-10-29 16:52:32,946 [myid:3] - ERROR [main:QuorumPeerMain@111] - Unexpected exception, exiting abnormally
> java.lang.RuntimeException: Unable to run quorum server 
> 	at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:785)
> 	at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:720)
> 	at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:202)
> 	at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:139)
> 	at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:88)
> Caused by: java.io.IOException: The accepted epoch, 0 is less than the current epoch, 1
> 	at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:781)
> {code}
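
For readers following along, the failure in the last step comes down to a startup consistency check between the two persisted epoch values. The sketch below only illustrates that invariant and is not the actual QuorumPeer#loadDataBase code: the class and method names are hypothetical, and the message formatting is assumed from the log above.

```java
import java.io.IOException;

// Illustrative only: a stand-alone version of the invariant that the
// stack trace above reports as violated. Names here are hypothetical.
public class EpochCheck {

    /**
     * Throws if the accepted epoch is behind the current epoch, which is
     * the state node C is left in after the failed acceptedEpoch write.
     */
    public static void checkEpochs(long acceptedEpoch, long currentEpoch)
            throws IOException {
        if (acceptedEpoch < currentEpoch) {
            throw new IOException("The accepted epoch, " + acceptedEpoch
                    + " is less than the current epoch, " + currentEpoch);
        }
    }
}
```

With the values from the report, checkEpochs(0, 1) throws, while checkEpochs(1, 1) returns normally.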


