You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Camille Fournier (JIRA)" <ji...@apache.org> on 2014/06/28 18:03:25 UTC
[jira] [Assigned] (ZOOKEEPER-1900) NullPointerException in
truncate
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Camille Fournier reassigned ZOOKEEPER-1900:
-------------------------------------------
Assignee: Camille Fournier
> NullPointerException in truncate
> ---------------------------------
>
> Key: ZOOKEEPER-1900
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1900
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5, 3.4.6
> Environment: linux java 1.6
> Reporter: Steven Bower
> Assignee: Camille Fournier
> Priority: Blocker
> Fix For: 3.4.7, 3.5.0
>
>
> The other day we started up a ZK instance that had been down for a bit (1day) and started getting NPEs all over the place...
> {noformat}
> 2014-20-03 11:15:42.320 INFO QuorumPeerConfig [main] - Reading configuration from: /xxx/bin/zk/etc/zk.cfg
> 2014-20-03 11:15:42.350 INFO QuorumPeerConfig [main] - Defaulting to majority quorums
> 2014-20-03 11:15:42.353 INFO DatadirCleanupManager [main] - autopurge.snapRetainCount set to 3
> 2014-20-03 11:15:42.353 INFO DatadirCleanupManager [main] - autopurge.purgeInterval set to 0
> 2014-20-03 11:15:42.353 INFO DatadirCleanupManager [main] - Purge task is not scheduled.
> 2014-20-03 11:15:42.385 INFO QuorumPeerMain [main] - Starting quorum peer
> 2014-20-03 11:15:42.399 INFO NIOServerCnxnFactory [main] - binding to port 0.0.0.0/0.0.0.0:5555
> 2014-20-03 11:15:42.413 INFO QuorumPeer [main] - tickTime set to 2000
> 2014-20-03 11:15:42.413 INFO QuorumPeer [main] - minSessionTimeout set to -1
> 2014-20-03 11:15:42.413 INFO QuorumPeer [main] - maxSessionTimeout set to -1
> 2014-20-03 11:15:42.413 INFO QuorumPeer [main] - initLimit set to 10
> 2014-20-03 11:15:42.456 INFO FileSnap [main] - Reading snapshot /xxx/zk_data/version-2/snapshot.2c00000000
> 2014-20-03 11:15:42.463 INFO QuorumCnxManager [Thread-3] - My election bind port: 0.0.0.0/0.0.0.0:7555
> 2014-20-03 11:15:42.470 INFO QuorumPeer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - LOOKING
> 2014-20-03 11:15:42.471 INFO FastLeaderElection [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - New election. My id = 3, proposed zxid=0x8000000000000000
> 2014-20-03 11:15:42.479 INFO FastLeaderElection [WorkerReceiver[myid=3]] - Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), FOLLOWING (n.state), 1 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)
> 2014-20-03 11:15:42.479 INFO FastLeaderElection [WorkerReceiver[myid=3]] - Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), FOLLOWING (n.state), 1 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)
> 2014-20-03 11:15:42.482 INFO QuorumCnxManager [WorkerSender[myid=3]] - Have smaller server identifier, so dropping the connection: (5, 3)
> 2014-20-03 11:15:42.482 INFO FastLeaderElection [WorkerReceiver[myid=3]] - Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), LEADING (n.state), 2 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)
> 2014-20-03 11:15:42.482 INFO FastLeaderElection [WorkerReceiver[myid=3]] - Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), LEADING (n.state), 2 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)
> 2014-20-03 11:15:42.482 INFO QuorumPeer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - OBSERVING
> 2014-20-03 11:15:42.486 INFO Learner [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - TCP NoDelay set to: true
> 2014-20-03 11:15:42.488 INFO QuorumCnxManager [host1/###.###.###.###:7555] - Received connection request /###.###.###.###:64528
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:host.name=host1
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:java.version=1.6.0_20
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:java.vendor=Sun Microsystems Inc.
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:java.home=/xxx/util/common/jdk1.6.0_20_64bit/jre
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:java.class.path=/xxx/bin/zk/etc:/xxx/bin/zk/lib/slf4j-log4j12-1.7.2.jar:/xxx/bin/zk/lib/jline-0.9.94.jar:/xxx/bin/zk/lib/jul-to-slf4j-1.7.2.jar:/xxx/bin/zk/lib/ZooInspector-3.4.5.jar:/xxx/bin/zk/lib/jcl-over-slf4j-1.7.2.jar:/xxx/bin/zk/lib/log4j-1.2.17.jar:/xxx/bin/zk/lib/zookeeper-3.4.5.jar:/xxx/bin/zk/lib/slf4j-api-1.7.2.jar:/xxx/bin/zk/lib/netty-3.2.2.Final.jar
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:java.library.path=/xxx/util/common/jdk1.6.0_20_64bit/jre/lib/amd64/server:/xxx/util/common/jdk1.6.0_20_64bit/jre/lib/amd64:/xxx/util/common/jdk1.6.0_20_64bit/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:java.io.tmpdir=/tmp
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:java.compiler=<NA>
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:os.name=Linux
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:os.arch=amd64
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:os.version=2.6.32-220.2.1.el6.x86_64
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:user.name=op
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:user.home=/xxx/bin
> 2014-20-03 11:15:42.490 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:user.dir=/xxx/bin
> 2014-20-03 11:15:42.491 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /xxx/zk_log/version-2 snapdir /xxx/zk_data/version-2
> 2014-20-03 11:15:42.493 INFO Learner [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Observing host4/###.###.###.###:6555
> 2014-20-03 11:15:42.495 INFO FastLeaderElection [WorkerReceiver[myid=3]] - Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), FOLLOWING (n.state), 5 (n.sid), 0x2b (n.peerEPoch), OBSERVING (my state)
> 2014-20-03 11:15:42.498 WARN Learner [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Truncating log to get in sync with the leader 0x2b00000002
> 2014-20-03 11:15:42.499 WARN QuorumPeer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Unexpected exception
> java.lang.NullPointerException
> at org.apache.zookeeper.server.persistence.FileTxnLog.truncate(FileTxnLog.java:352)
> at org.apache.zookeeper.server.persistence.FileTxnSnapLog.truncateLog(FileTxnSnapLog.java:259)
> at org.apache.zookeeper.server.ZKDatabase.truncateLog(ZKDatabase.java:438)
> at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:339)
> at org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:72)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727)
> 2014-20-03 11:15:42.500 INFO Learner [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - shutdown called
> java.lang.Exception: shutdown Observer
> at org.apache.zookeeper.server.quorum.Observer.shutdown(Observer.java:137)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:731)
> 2014-20-03 11:15:42.500 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - shutting down
> 2014-20-03 11:15:42.500 INFO QuorumPeer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - LOOKING
> 2014-20-03 11:15:42.501 INFO FastLeaderElection [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - New election. My id = 3, proposed zxid=0x8000000000000000
> 2014-20-03 11:15:42.503 INFO FastLeaderElection [WorkerReceiver[myid=3]] - Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), FOLLOWING (n.state), 1 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)
> 2014-20-03 11:15:42.503 INFO FastLeaderElection [WorkerReceiver[myid=3]] - Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), LEADING (n.state), 2 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)
> 2014-20-03 11:15:42.503 INFO QuorumPeer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - OBSERVING
> 2014-20-03 11:15:42.503 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /xxx/zk_log/version-2 snapdir /xxx/zk_data/version-2
> 2014-20-03 11:15:42.504 INFO Learner [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Observing host4/###.###.###.###:6555
> 2014-20-03 11:15:42.504 INFO FastLeaderElection [WorkerReceiver[myid=3]] - Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), FOLLOWING (n.state), 5 (n.sid), 0x2b (n.peerEPoch), OBSERVING (my state)
> 2014-20-03 11:15:42.514 INFO FileSnap [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Reading snapshot /xxx/zk_data/version-2/snapshot.2c00000000
> 2014-20-03 11:15:42.517 WARN Learner [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Truncating log to get in sync with the leader 0x2b00000002
> 2014-20-03 11:15:42.518 WARN QuorumPeer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Unexpected exception
> java.lang.NullPointerException
> at org.apache.zookeeper.server.persistence.FileTxnLog.truncate(FileTxnLog.java:352)
> at org.apache.zookeeper.server.persistence.FileTxnSnapLog.truncateLog(FileTxnSnapLog.java:259)
> at org.apache.zookeeper.server.ZKDatabase.truncateLog(ZKDatabase.java:438)
> at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:339)
> at org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:72)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727)
> {noformat}
> This exception went on and on over and over again (more than 1M times in a day) until it then began spewing this exception:
> {noformat}
> 2014-20-03 13:45:32.843 INFO QuorumPeer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - LOOKING
> 2014-20-03 13:45:32.843 INFO FastLeaderElection [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - New election. My id = 3, proposed zxid=0x8000000000000000
> 2014-20-03 13:45:32.844 INFO FastLeaderElection [WorkerReceiver[myid=3]] - Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), FOLLOWING (n.state), 1 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)
> 2014-20-03 13:45:32.845 INFO FastLeaderElection [WorkerReceiver[myid=3]] - Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), LEADING (n.state), 2 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)
> 2014-20-03 13:45:32.845 INFO QuorumPeer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - OBSERVING
> 2014-20-03 13:45:32.845 INFO FastLeaderElection [WorkerReceiver[myid=3]] - Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), FOLLOWING (n.state), 5 (n.sid), 0x2b (n.peerEPoch), OBSERVING (my state)
> 2014-20-03 13:45:32.845 INFO ZooKeeperServer [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /xxx/zk_log/version-2 snapdir /xxx/zk_data/version-2
> 2014-20-03 13:45:32.845 INFO Learner [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Observing host4/###.###.###.###:6555
> 2014-20-03 13:45:32.853 WARN Learner [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Unexpected exception, tries=0, connecting to host4/###.###.###.###:6555
> java.net.ConnectException: Cannot assign requested address
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
> at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
> at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> at java.net.Socket.connect(Socket.java:529)
> at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:224)
> at org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:69)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727)
> 2014-20-03 13:45:33.863 INFO FileSnap [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Reading snapshot /xxx/zk_data/version-2/snapshot.2c00000000
> {noformat}
> This exception for a while was interspersed with the NPEs but eventually it just was spewing the ConnectionException.
> Looking through the code a bit it seems if the FileTxnIterator when initialized cannot find any log files the {{inputStream}} is set to null which causes truncate() to NPE.. I see in 3.4.6 this has been wrapped in a try/finally which closes the iterator.. but i presume that this issue would still remain.
> Looking at the system in this state there were 29k+ sockets in CLOSE_WAIT state on the system and looking at a heap dump there were tons of Socket objects waiting for GC (ie not getting properly closed).. this eventually ran the system out of ephemeral ports and hence the ConnectionExceptions..
> It would seem that a quick check of {{itr.next()}} prior to attempting truncation would resolve the NPE, but it seems somewhere a connection is not being closed properly when an exception occurs.
--
This message was sent by Atlassian JIRA
(v6.2#6252)