You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Flavio Junqueira (JIRA)" <ji...@apache.org> on 2016/04/16 15:32:25 UTC

[jira] [Commented] (ZOOKEEPER-2412) leader zk out of memory, and leader db lastZxid is not update when process set data.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244222#comment-15244222 ] 

Flavio Junqueira commented on ZOOKEEPER-2412:
---------------------------------------------

[~Yongcheng] Thanks for reporting this issue. Can you help me understand a bit better what's going on here?

>From the logs you posted, there is a leader and it throws an OOO exception and exits. Another leader is elected and this leader starts behaving erratically, e.g., not incrementing the last zxid. Is that what is going on?

Note that the value of peerLastZxid also does not seem to change, which is odd. 


> leader zk out of memory,  and leader db lastZxid is not update when process set data. 
> --------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2412
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.6
>         Environment: 3 zk, the follower (myid=6) server is injected into the network delay fault for test, so this follower(id=6) data is behind leader, leader need send SNAP to this follower every time.
>            Reporter: Yongcheng Liu
>
> the time begin to have problem is 20:59:47
> (1) out of memory log:
> 2016-03-24 23:01:21,355 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:35112:LearnerHandler@330] - Follower sid: 6 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@5cf112b0
> 2016-03-24 23:01:21,355 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:35112:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 23:01:21,355 [myid:4] - WARN  [LearnerHandler-/192.168.50.26:35112:LearnerHandler@446] - Unhandled proposal scenario
> 2016-03-24 23:01:21,355 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:35112:LearnerHandler@462] - Sending SNAP
> 2016-03-24 23:01:21,893 [myid:4] - INFO  [NIOServerCxn.Factory:/192.168.50.24:10540:NIOServerCnxnFactory@207] - Current connection (from /192.168.50.22 Cnxns = 4; totalCnxns = 15)
> 2016-03-24 23:01:22,625 [myid:4] - WARN  [NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@832] - Connection request from old client /192.168.50.22:49695; will be dropped if server is in r-o mode
> 2016-03-24 23:01:23,283 [myid:4] - INFO  [QuorumPeer[myid=4]/192.168.50.24:10540:Leader@493] - Shutting down
> 2016-03-24 23:01:24,102 [myid:4] - INFO  [QuorumPeer[myid=4]/192.168.50.24:10540:Leader@499] - Shutdown called
> 2016-03-24 23:01:24,040 [myid:4] - INFO  [SessionTracker:ZooKeeperServer@347] - Expiring session 0x453a6dc5b7a007e, timeout of 3500ms exceeded
> Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "NIOServerCxn.Factory:/192.168.50.24:10540"
> 2016-03-24 23:01:25,001 [myid:4] - WARN  [QuorumPeer[myid=4]/192.168.50.24:10540:QuorumPeer@827] - QuorumPeer main thread exited
> Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "LearnerHandler-/192.168.50.26:35112"
> Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "QuorumPeer[myid=4]/192.168.50.24:10540"
> 2016-03-24 23:01:24,227 [myid:4] - ERROR [LearnerHandler-/192.168.50.26:35112:NIOServerCnxnFactory$1@44] - Thread LearnerHandler Socket[addr=/192.168.50.26,port=35112,localport=10550] tickOfNextAckDeadline:38310 synced?:true queuedPacketLength:7910 died
> 2016-03-24 23:01:28,492 [myid:4] - INFO  [main:QuorumPeerMain@93] - Exiting normally
> (2) this leader is very strange, db lastzxid not update, log as follow (after grep "Synchronizing with"), we can see max commit zxid from leader is not update any more. From the beginning of 20:59:47, leader lastZxid not update.
> 2016-03-24 20:56:10,266 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:48439:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x90000333f minCommittedLog=0x90000314b peerLastZxid=0x90000280a
> 2016-03-24 20:57:59,203 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:46956:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x900003398 minCommittedLog=0x9000031a4 peerLastZxid=0x90000280a
> 2016-03-24 20:59:47,928 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:20601:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:01:26,475 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:29622:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:03:16,552 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:35717:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:03:44,427 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:48197:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:05:01,125 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:57826:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:06:54,187 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:30137:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:07:42,780 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:24255:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:08:41,279 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:40909:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:10:23,137 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:64166:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:11:50,003 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:56070:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:12:11,956 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:41423:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:13:08,286 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:26757:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:13:59,960 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:62785:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:15:41,103 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:53141:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:16:11,125 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:39551:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:17:25,541 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:24638:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:18:25,039 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:54723:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> 2016-03-24 21:19:04,148 [myid:4] - INFO  [LearnerHandler-/192.168.50.26:37450:LearnerHandler@385] - Synchronizing with Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad peerLastZxid=0x90000280a
> (3) we can see leader Snapshotting to the same file in different time, but receive different TxnZxid(0x900003b70 and 0x9000049dd), this show leader has not been updated lastZxid in db.
> Snapshotting 1:
> 2016-03-24 21:33:27,214 [myid:4] - INFO  [Snapshot Thread:FileTxnSnapLog@253] - Snapshotting: 0x9000033a1 to /opt/dsware/agent/zk/data/version-2/snapshot.9000033a1
> 2016-03-24 21:33:27,333 [myid:4] - INFO  [SyncThread:4:FileTxnLog@199] - Creating new log file: log.900003b70
> Snapshotting 2:
> 2016-03-24 22:41:26,601 [myid:4] - INFO  [Snapshot Thread:FileTxnSnapLog@253] - Snapshotting: 0x9000033a1 to /opt/dsware/agent/zk/data/version-2/snapshot.9000033a1
> 2016-03-24 22:41:26,662 [myid:4] - INFO  [SyncThread:4:FileTxnLog@199] - Creating new log file: log.9000049dd
> (4) finally, this node(leader server) zxid is behind zk c client, log as follow:
> 2016-03-24 23:00:53,712 [myid:4] - WARN  [NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@832] - Connection request from old client /192.168.50.23:35043; will be dropped if server is in r-o mode
> 2016-03-24 23:00:53,713 [myid:4] - INFO  [NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@851] - Refusing session request for client /192.168.50.23:35043 as it has seen zxid 0x900004e1f our last zxid is 0x9000033a1 client must try another server



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)