You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Frank Kelly (JIRA)" <ji...@apache.org> on 2015/11/30 16:01:11 UTC

[jira] [Commented] (ZOOKEEPER-1955) EOFException on Reading Snapshot

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031895#comment-15031895 ] 

Frank Kelly commented on ZOOKEEPER-1955:
----------------------------------------

I am seeing this too on ZooKeeper 3.4.6

{noformat}
2015-11-26 18:51:58,514 [myid:3] - ERROR [LearnerHandler-/54.172.221.124:57966:LearnerHandler@633] - Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at java.io.DataInputStream.readInt(Unknown Source)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:546)
2015-11-26 18:51:58,514 [myid:3] - WARN  [LearnerHandler-/54.172.221.124:57966:LearnerHandler@646] - ******* GOODBYE /54.172.221.124:57966 ********

{noformat}

> EOFException on Reading Snapshot
> --------------------------------
>
>                 Key: ZOOKEEPER-1955
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1955
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.5
>            Reporter: Aaron Zimmerman
>         Attachments: snapshot
>
>
> We have a 5 node zookeeper cluster that has been operating normally for several months.  Starting a few days ago, the entire cluster crashes a few times per day, all nodes at the exact same time.  We can't track down the exact issue, but deleting the snapshots and logs and restarting allows the cluster to come back up.  
> We are running exhibitor to monitor the cluster.  
> It appears that something bad gets into the logs, causing an EOFException and this cascades through the entire cluster:
> 2014-07-04 12:55:26,328 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>         at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
>         at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
> 2014-07-04 12:55:26,328 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
> java.lang.Exception: shutdown Follower
>         at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
> Then the server dies, exhibitor tries to restart each node, and they all get stuck trying to replay the bad transaction, logging things like:
>  
> 2014-07-04 12:58:52,734 [myid:1] - INFO  [main:FileSnap@83] - Reading snapshot /var/lib/zookeeper/version-2/snapshot.300011fc0
> 2014-07-04 12:58:52,896 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@575] - Created new input stream /var/lib/zookeeper/version-2/log.300000021
> 2014-07-04 12:58:52,915 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@578] - Created new input archive /var/lib/zookeeper/version-2/log.300000021
> 2014-07-04 12:59:25,870 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@618] - EOF excepton java.io.EOFException: Failed to read /var/lib/zookeeper/version-2/log.300000021
> 2014-07-04 12:59:25,871 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@575] - Created new input stream /var/lib/zookeeper/version-2/log.300011fc2
> 2014-07-04 12:59:25,872 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@578] - Created new input archive /var/lib/zookeeper/version-2/log.300011fc2
> 2014-07-04 12:59:48,722 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@618] - EOF excepton java.io.EOFException: Failed to read /var/lib/zookeeper/version-2/log.300011fc2
> And the cluster is dead.  The only way we have found to recover is to delete all of the data and restart.
> [~fournc] Appreciate any assistance you can offer.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)