Posted to dev@zookeeper.apache.org by "Germán Blanco (JIRA)" <ji...@apache.org> on 2013/09/12 14:49:51 UTC

[jira] [Commented] (ZOOKEEPER-1747) Zookeeper server fails to start if transaction log file is corrupted

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765392#comment-13765392 ] 

Germán Blanco commented on ZOOKEEPER-1747:
------------------------------------------

We have also seen this issue when acceptedEpoch, currentEpoch and the transaction log are in an inconsistent state. In that case the error is:
{noformat}
2013-09-12 12:30:51,586 [myid:10] - ERROR [main:QuorumPeer@453] - Unable to load database on disk
java.io.IOException: The current epoch, 6, is older than the last zxid, 34359738487
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:435)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2013-09-12 12:30:51,587 [myid:10] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.io.IOException: The current epoch, 6, is older than the last zxid, 34359738487
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:435)
        ... 4 more
{noformat}
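To make the numbers in that message easier to read, here is a minimal sketch that splits the zxid from the log into its two halves. It assumes the usual ZooKeeper layout (epoch in the high 32 bits, counter in the low 32 bits); the class and method names are made up for illustration and are not part of ZooKeeper, and the final comparison only mirrors what the error message suggests is being checked.

```java
public class ZxidEpochCheck {
    // Assumption: a zxid is a 64-bit value with the epoch in the high
    // 32 bits and a per-epoch counter in the low 32 bits.
    public static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    public static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Hypothetical helper: the stored epoch is considered consistent when
    // it is not older than the epoch embedded in the last logged zxid.
    public static boolean isConsistent(long currentEpoch, long lastZxid) {
        return epochOf(lastZxid) <= currentEpoch;
    }

    public static void main(String[] args) {
        long lastZxid = 34359738487L; // value from the log above
        long currentEpoch = 6L;       // value from the log above

        System.out.println("epoch in zxid = " + epochOf(lastZxid));
        System.out.println("counter       = " + counterOf(lastZxid));
        System.out.println("consistent    = " + isConsistent(currentEpoch, lastZxid));
    }
}
```

With the values from the log, the zxid decodes to epoch 8 and counter 119, so the transaction log is two epochs ahead of the stored currentEpoch of 6.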
I guess "force-ignore" means that the server just ignores whatever is on disk and starts with zxid=0, is that right?
                
> Zookeeper server fails to start if transaction log file is corrupted
> --------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1747
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1747
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Solaris10/x86, Java 1.6
>            Reporter: Sergey Maslyakov
>
> On multiple occasions when ZK was unable to write out a transaction log or a snapshot file, the subsequent attempt to restart the server failed. This usually happens when the underlying file system fills up, preventing the ZK server from writing out a consistent data file.
> Upon start-up, the server reads in the snapshot and the transaction log. If the deserializer fails and throws an exception, the server terminates. Please see the stack trace below.
> A server not coming up, for whatever reason, is often an undesirable condition. It would be nice to have an option to force-ignore parsing errors, especially in the transaction log. A checksum on the data could be a possible way to ensure integrity and "parsability".
> Another robustness enhancement would be proper handling of the case where a snapshot or transaction log cannot be completely written to disk. Basically, better handling of write errors.
> {noformat}
> 2013-08-28 12:05:30,732 ERROR [ZooKeeperServerMain] Unexpected exception, exiting abnormally
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:160)
>         at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>         at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:250)
>         at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:383)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:129)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> {noformat}
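
As a rough illustration of the checksum and "force-ignore" ideas in the quoted description, the sketch below reads a log up to the first truncated or corrupt record and keeps the valid prefix instead of aborting. The record layout (checksum, length, payload), the size limit, and all class and method names are assumptions made for this example; this is not ZooKeeper's on-disk format or API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.Adler32;

public class TolerantLogReader {
    // Hypothetical upper bound so a corrupt length field cannot trigger a huge allocation.
    static final int MAX_RECORD_BYTES = 1 << 20;

    // Hypothetical record layout: [long checksum][int length][payload bytes].
    public static void writeRecord(DataOutputStream out, byte[] payload) throws IOException {
        Adler32 sum = new Adler32();
        sum.update(payload, 0, payload.length);
        out.writeLong(sum.getValue());
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Returns every record up to, but not including, the first one that is
    // truncated, has an implausible length, or fails its checksum.
    public static List<byte[]> readValidPrefix(InputStream in) throws IOException {
        List<byte[]> records = new ArrayList<>();
        DataInputStream din = new DataInputStream(in);
        try {
            while (true) {
                long expected = din.readLong();
                int length = din.readInt();
                if (length < 0 || length > MAX_RECORD_BYTES) {
                    break; // corrupt length field
                }
                byte[] payload = new byte[length];
                din.readFully(payload);
                Adler32 sum = new Adler32();
                sum.update(payload, 0, payload.length);
                if (sum.getValue() != expected) {
                    break; // payload does not match its checksum
                }
                records.add(payload);
            }
        } catch (EOFException e) {
            // Truncated tail (for example after a full disk): keep what was read so far.
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeRecord(out, "create /a".getBytes(StandardCharsets.UTF_8));
        writeRecord(out, "create /b".getBytes(StandardCharsets.UTF_8));

        // Simulate an incomplete write by dropping the last three bytes.
        byte[] full = buf.toByteArray();
        byte[] truncated = Arrays.copyOf(full, full.length - 3);

        List<byte[]> recovered = readValidPrefix(new ByteArrayInputStream(truncated));
        System.out.println("recovered records: " + recovered.size());
    }
}
```

In the truncated example only the first record is recovered. Whether silently dropping the tail is acceptable is a separate question: in a quorum, the missing transactions would presumably have to be re-synced from another server.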
