You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Yongcheng Liu (JIRA)" <ji...@apache.org> on 2016/04/12 15:32:25 UTC

[jira] [Created] (ZOOKEEPER-2412) leader zk out of memory, and leader db lastZxid is not update when process set data.

Yongcheng Liu created ZOOKEEPER-2412:
----------------------------------------

             Summary: leader zk out of memory,  and leader db lastZxid is not update when process set data. 
                 Key: ZOOKEEPER-2412
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2412
             Project: ZooKeeper
          Issue Type: Bug
         Environment: 3 zk, the follower (myid=6) server is injected into the network delay fault for test, so this follower(id=6) data is behind leader, leader need send SNAP to this follower every time.
            Reporter: Yongcheng Liu


the time begin to have problem is 20:59:47

(1) out of memory log:

2016-03-24 23:01:21,355 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:35112:LearnerHandler@330] - Follower sid: 6 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@5cf112b0
2016-03-24 23:01:21,355 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:35112:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 23:01:21,355 [myid:4] - WARN  [LearnerHandler-/192.168.50.26:35112:LearnerHandler@446] - Unhandled proposal scenario
2016-03-24 23:01:21,355 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:35112:LearnerHandler@462] - Sending SNAP
2016-03-24 23:01:21,893 [myid:4] - INFO  [NIOServerCxn.Factory:/192.168.50.24:10540:NIOServerCnxnFactory@207] - Current connection (from /192.168.50.22 Cnxns = 4; totalCnxns = 15)
2016-03-24 23:01:22,625 [myid:4] - WARN  [NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@832] - Connection request from old client /192.168.50.22:49695; will be dropped if server is in r-o mode
2016-03-24 23:01:23,283 [myid:4] - INFO  [QuorumPeer[myid=4]/192.168.50.24:10540:Leader@493] - Shutting down
2016-03-24 23:01:24,102 [myid:4] - INFO  [QuorumPeer[myid=4]/192.168.50.24:10540:Leader@499] - Shutdown called
2016-03-24 23:01:24,040 [myid:4] - INFO  [SessionTracker:ZooKeeperServer@347] - Expiring session 0x453a6dc5b7a007e, timeout of 3500ms exceeded

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "NIOServerCxn.Factory:/192.168.50.24:10540"
2016-03-24 23:01:25,001 [myid:4] - WARN  [QuorumPeer[myid=4]/192.168.50.24:10540:QuorumPeer@827] - QuorumPeer main thread exited

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "LearnerHandler-/192.168.50.26:35112"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "QuorumPeer[myid=4]/192.168.50.24:10540"
2016-03-24 23:01:24,227 [myid:4] - ERROR [LearnerHandler-/192.168.50.26:35112:NIOServerCnxnFactory$1@44] - Thread LearnerHandler Socket[addr=/192.168.50.26,port=35112,localport=10550] tickOfNextAckDeadline:38310 synced?:true queuedPacketLength:7910 died
2016-03-24 23:01:28,492 [myid:4] - INFO  [main:QuorumPeerMain@93] - Exiting normally



(2) this leader is very strange, db lastzxid not update, log as follow (after grep "Synchronizing with"), we can see max commit zxid from leader is not update any more. From the beginning of 20:59:47, leader lastZxid not update.

2016-03-24 20:56:10,266 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:48439:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x90000333f minCommittedLog=0x90000314b peerLastZxid=0x90000280a
2016-03-24 20:57:59,203 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:46956:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x900003398 minCommittedLog=0x9000031a4 peerLastZxid=0x90000280a
2016-03-24 20:59:47,928 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:20601:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:01:26,475 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:29622:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:03:16,552 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:35717:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:03:44,427 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:48197:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:05:01,125 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:57826:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:06:54,187 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:30137:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:07:42,780 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:24255:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:08:41,279 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:40909:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:10:23,137 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:64166:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:11:50,003 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:56070:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:12:11,956 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:41423:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:13:08,286 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:26757:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:13:59,960 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:62785:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:15:41,103 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:53141:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:16:11,125 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:39551:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:17:25,541 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:24638:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:18:25,039 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:54723:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
2016-03-24 21:19:04,148 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:37450:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a


(3) we can see leader Snapshotting to the same file in different time, but receive different TxnZxid(0x900003b70 and 0x9000049dd), this show leader has not been updated lastZxid in db.

Snapshotting 1:
2016-03-24 21:33:27,214 [myid:4] - INFO  [Snapshot Thread:FileTxnSnapLog@253] - Snapshotting: 0x9000033a1 to /opt/dsware/agent/zk/data/version-2/snapshot.9000033a1
2016-03-24 21:33:27,333 [myid:4] - INFO  [SyncThread:4:FileTxnLog@199] - Creating new log file: log.900003b70

Snapshotting 2:
2016-03-24 22:41:26,601 [myid:4] - INFO  [Snapshot Thread:FileTxnSnapLog@253] - Snapshotting: 0x9000033a1 to /opt/dsware/agent/zk/data/version-2/snapshot.9000033a1
2016-03-24 22:41:26,662 [myid:4] - INFO  [SyncThread:4:FileTxnLog@199] - Creating new log file: log.9000049dd


(4) finally, this node(leader server) zxid is behind zk c client, log as follow:

2016-03-24 23:00:53,712 [myid:4] - WARN  [NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@832] - Connection request from old client /192.168.50.23:35043; will be dropped if server is in r-o mode
2016-03-24 23:00:53,713 [myid:4] - INFO  [NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@851] - Refusing session request for client /192.168.50.23:35043 as it has seen zxid 0x900004e1f our last zxid is 0x9000033a1 client must try another server



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)