You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@zookeeper.apache.org by "shixiaoxiao (Jira)" <ji...@apache.org> on 2021/11/25 07:02:00 UTC

[jira] [Commented] (ZOOKEEPER-2104) Sudden crash of all nodes in the cluster

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448968#comment-17448968 ] 

shixiaoxiao commented on ZOOKEEPER-2104:
----------------------------------------

I encountered the same problem. Four followers have this problem at the same time. 
In my environment,the snapshot is very large  6.4G, also log size is 1.4G
my snapshot and  log in the same disk, But the capacity is very sufficient(I don’t know why the socket problem involves the disk,but Previous commenter mentioned this question)
2021-11-25 07:35:07,589 [myid:5] - WARN  [LearnerHandler-/10.46.225.153:54516:LearnerHandler@661] - ******* GOODBYE /10.46.225.153:54516 ********
2021-11-25 07:35:07,589 [myid:5] - ERROR [LearnerHandler-/10.46.225.154:60634:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
2021-11-25 07:35:07,588 [myid:5] - ERROR [LearnerHandler-/10.46.225.155:43302:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
2021-11-25 07:35:07,588 [myid:5] - ERROR [LearnerHandler-/10.46.225.168:55908:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
2021-11-25 07:35:07,588 [myid:5] - INFO  [SessionTracker:ZooKeeperServer@355] - Expiring session 0x4068529b2920014, timeout of 30000ms exceeded
2021-11-25 07:35:07,589 [myid:5] - WARN  [LearnerHandler-/10.46.225.168:55908:LearnerHandler@661] - ******* GOODBYE /10.46.225.168:55908 ********
2021-11-25 07:35:07,589 [myid:5] - WARN  [LearnerHandler-/10.46.225.155:43302:LearnerHandler@661] - ******* GOODBYE /10.46.225.155:43302 ********
2021-11-25 07:35:07,589 [myid:5] - WARN  [LearnerHandler-/10.46.225.154:60634:LearnerHandler@661] - ******* GOODBYE /10.46.225.154:60634 ********




> Sudden crash of all nodes in the cluster
> ----------------------------------------
>
>                 Key: ZOOKEEPER-2104
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2104
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6
>            Reporter: Benjamin Jaton
>            Priority: Major
>         Attachments: zookeeper-errors.txt, zookeeper-warns.txt
>
>
> In a 3 nodes ensemble, suddenly all the nodes seem to fail, displaying "ZooKeeper is not running" messages.
> Not retry seems to be happening after that.
> This a request to understand what happened and probably to improve the logs when it does.
> See logs below:
> NODE1:
> -- no log for several days before this --
> 2015-01-04 16:18:22,259 [myid:1] - WARN  [SyncThread:1:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:1 took 11024ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2015-01-04 16:18:22,380 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
>         at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>         at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:23,384 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> 2015-01-04 16:18:23,492 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> 2015-01-04 16:18:24,060 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> NODE2:
> -- no log for several days before this --
> 2015-01-04 16:18:21,899 [myid:3] - WARN  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
>         at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>         at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:22,760 [myid:3] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> 2015-01-04 16:18:22,801 [myid:3] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> 2015-01-04 16:18:22,886 [myid:3] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> NODE3 (leader):
> -- no log for several days before this --
> 2015-01-04 16:18:21,897 [myid:2] - WARN  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing connection to peer due to transaction timeout.
> 2015-01-04 16:18:21,898 [myid:2] - WARN  [LearnerHandler-/204.53.107.249:43402:LearnerHandler@646] - ******* GOODBYE /204.53.107.249:43402 ********
> 2015-01-04 16:18:21,905 [myid:2] - WARN  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing connection to peer due to transaction timeout.
> 2015-01-04 16:18:21,907 [myid:2] - WARN  [LearnerHandler-/204.53.107.247:45953:LearnerHandler@646] - ******* GOODBYE /204.53.107.247:45953 ********
> 2015-01-04 16:18:21,918 [myid:2] - WARN  [LearnerHandler-/204.53.107.247:45953:LearnerHandler@658] - Ignoring unexpected exception
> java.lang.InterruptedException
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>         at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>         at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>         at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)
>         at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)
> 2015-01-04 16:18:23,003 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> 2015-01-04 16:18:23,007 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> 2015-01-04 16:18:23,115 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running



--
This message was sent by Atlassian Jira
(v8.20.1#820001)