Posted to issues@zookeeper.apache.org by "Young Xu (Jira)" <ji...@apache.org> on 2022/10/19 04:23:00 UTC

[jira] [Created] (ZOOKEEPER-4624) Zookeeper service cannot be restarted because the IO Inject filesystem fd is used up.

Young Xu created ZOOKEEPER-4624:
-----------------------------------

             Summary: Zookeeper service cannot be restarted because the IO Inject filesystem fd is used up.
                 Key: ZOOKEEPER-4624
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4624
             Project: ZooKeeper
          Issue Type: Bug
         Environment: environment: *{color:#FF0000}K8S{color}*

deployment: *{color:#FF0000}statefulset replicas 3{color}*

zookeeper version: *{color:#FF0000}3.8.0{color}*
            Reporter: Young Xu


We're running a chaos test with the following scenario:
ZooKeeper is deployed as three pods on three nodes. We used {color:#FF0000}*IO injection*{color} to exhaust the file descriptors on one node (targeting one pod), so that all filesystem operations returned "Too many files". After a while, the ZooKeeper service on that pod stopped running. We then stopped the injection. When I started the process again manually, ZooKeeper reported the following error:
{code:java}
2022-10-19 02:03:07,876 [myid:3] - INFO  [main:o.a.z.s.q.QuorumPeer@2549] - QuorumPeer communication is not secured! (SASL auth disabled)
2022-10-19 02:03:07,876 [myid:3] - INFO  [main:o.a.z.s.q.QuorumPeer@2574] - quorum.cnxn.threads.size set to 20
2022-10-19 02:03:07,877 [myid:3] - INFO  [main:o.a.z.s.p.FileSnap@85] - Reading snapshot /home/edge/middleware/zookeeper/data/data/version-2/snapshot.1409ce9ac7
2022-10-19 02:03:07,883 [myid:3] - INFO  [main:o.a.z.s.DataTree@1705] - The digest in the snapshot has digest version of 2, with zxid as 0x1409ce9acc, and digest value as 81604125765
2022-10-19 02:03:11,662 [myid:3] - ERROR [main:o.a.z.s.q.QuorumPeer@1200] - Unable to load database on disk
java.io.EOFException: null
    at java.base/java.io.DataInputStream.readInt(Unknown Source)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96)
    at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:67)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:707)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:725)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:693)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:774)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:361)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.lambda$restore$0(FileTxnSnapLog.java:267)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:312)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:285)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1146)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1132)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:229)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:137)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:91)
2022-10-19 02:03:11,663 [myid:3] - INFO  [main:o.a.z.m.p.PrometheusMetricsProvider@570] - Shutdown executor service with timeout 1000
2022-10-19 02:03:11,739 [myid:3] - INFO  [main:o.e.j.s.AbstractConnector@383] - Stopped ServerConnector@5b03b9fe{HTTP/1.1, (http/1.1)}{zookeeper-default-2.zookeeper.default.svc.cluster.local:8080}
2022-10-19 02:03:11,742 [myid:3] - INFO  [main:o.e.j.s.h.ContextHandler@1159] - Stopped o.e.j.s.ServletContextHandler@17bffc17{/,null,STOPPED}
2022-10-19 02:03:11,746 [myid:3] - ERROR [main:o.a.z.s.q.QuorumPeerMain@114] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1201)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1132)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:229)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:137)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:91)
Caused by: java.io.EOFException: null
    at java.base/java.io.DataInputStream.readInt(Unknown Source)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96)
    at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:67)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:707)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:725)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:693)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:774)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:361)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.lambda$restore$0(FileTxnSnapLog.java:267)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:312)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:285)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1146)
    ... 4 common frames omitted
2022-10-19 02:03:11,747 [myid:3] - INFO  [main:o.a.z.a.ZKAuditProvider@42] - ZooKeeper audit is disabled.
2022-10-19 02:03:11,749 [myid:3] - ERROR [main:o.a.z.u.ServiceUtils@48] - Exiting JVM with code 1
{code}
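The trace can be read from the bottom up. FileTxnSnapLog.fastForwardFromEdits replays transaction logs after the snapshot, FileTxnIterator.goToNextLog opens the next log file, and FileHeader.deserialize then fails inside DataInputStream.readInt with EOFException. That pattern suggests that at least one transaction log file ends before its header is complete, for example a zero-length or partially written file, rather than containing a bad transaction record. The sketch below reproduces only that failure mode with a plain DataInputStream. It assumes the header layout is int magic, int version, long dbid (16 bytes), based on our reading of ZooKeeper's FileHeader. TruncatedHeaderDemo and headerReadable are hypothetical names, not ZooKeeper code.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

final class TruncatedHeaderDemo {
    /**
     * Tries to read a header laid out like ZooKeeper's FileHeader
     * (assumed: int magic, int version, long dbid = 16 bytes) with the same
     * DataInputStream.readInt call that sits at the top of the stack trace.
     * Returns false when the bytes run out before the header is complete.
     */
    static boolean headerReadable(byte[] fileBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes))) {
            in.readInt();   // magic
            in.readInt();   // version
            in.readLong();  // dbid
            return true;
        } catch (EOFException e) {
            return false;   // same exception type as "java.io.EOFException: null" in the log
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
    }

    public static void main(String[] args) {
        System.out.println(headerReadable(new byte[0]));  // zero-length file
        System.out.println(headerReadable(new byte[6]));  // header cut off part-way
        System.out.println(headerReadable(new byte[16])); // complete (all-zero) header
    }
}
{code}

The demo only shows what the exception means, namely that the file is shorter than its header. It does not explain how such a file came to exist during the fd-exhaustion window.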
I know that deleting the data directory fixes this and gets the service running again, but I don't know why the file was corrupted.
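Before deleting the whole data directory, it may help to identify which file is damaged, since that also gives more information about the root cause. The following is a minimal read-only sketch that lists suspicious transaction logs in the version-2 directory shown in the "Reading snapshot" line above. It rests on two assumptions. First, transaction logs are named log.* in that directory. Second, a valid log starts with a 16-byte header whose first int is the ASCII bytes "ZKLG" read big-endian, which is our understanding of FileTxnLog's magic number. TxnLogScan and suspiciousLogs are hypothetical names.

{code:java}
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class TxnLogScan {
    // Assumed header layout: int magic, int version, long dbid = 16 bytes.
    static final int HEADER_BYTES = 16;
    // Assumed txn-log magic: "ZKLG" as a big-endian int (DataInputStream byte order).
    static final int TXNLOG_MAGIC =
            ByteBuffer.wrap("ZKLG".getBytes(StandardCharsets.US_ASCII)).getInt();

    /** Returns log.* files in dir that are too short for a header or carry the wrong magic. */
    static List<Path> suspiciousLogs(Path dir) throws IOException {
        List<Path> bad = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "log.*")) {
            for (Path f : files) {
                if (Files.size(f) < HEADER_BYTES) {
                    bad.add(f);        // cannot hold a complete header: EOF on restore
                    continue;
                }
                try (DataInputStream in = new DataInputStream(Files.newInputStream(f))) {
                    if (in.readInt() != TXNLOG_MAGIC) {
                        bad.add(f);    // e.g. preallocated zeros where the header should be
                    }
                }
            }
        }
        bad.sort(null);                // natural Path order, for stable output
        return bad;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TxnLogScan <path to version-2 directory>");
            return;
        }
        // Only reads files; nothing in the directory is modified.
        for (Path p : suspiciousLogs(Paths.get(args[0]))) {
            System.out.println("suspicious txn log: " + p + " (" + Files.size(p) + " bytes)");
        }
    }
}
{code}

The scan only reports files. Whether a flagged log can be moved aside on this one member and the data resynced from the rest of the ensemble is a separate question that this sketch does not answer.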



--
This message was sent by Atlassian Jira
(v8.20.10#820010)