You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@zookeeper.apache.org by Kathleen Ting <ka...@apache.org> on 2013/04/04 07:39:04 UTC

Re: Query regarding zookeeper

Hi Subramanyeswara,

It can take quite a bit of time for the leader to communicate the
znode data to the followers during startup. initLimit is the amount of
time the leader will allow the followers to download the data from the
leader and get to a point where they are ready to serve requests to
clients. If this time is exceeded, the leader will close the
connection to the follower and the quorum process will restart. Your
current initLimit value of 20 sec is typically more than enough time.
That said, if you have a slow connection or the data is very large it
might be necessary to increase this.

The syncLimit controls the amount of time the quorum members will
allow each other to communicate. If the leader doesn't hear from a
follower in the syncLimit time, it will drop the follower from the
quorum, and vice versa. It's important not to set this parameter too
high as it's one of the ways a server detects networking issues and
causes recovery to take place (i.e. the follower will drop out of the
quorum and try to reconnect, all clients on that follower will
disconnect and reconnect to another server, etc...).

Can you try increasing both initLimit and syncLimit to 100 on all ZK
servers and restarting? Note that it's not good to have syncLimit be
such a high value - fixed by ZOOKEEPER-1521.

Regards,
Kate



On Sat, Mar 30, 2013 at 4:12 PM, Subramanyeswara Bhavirisetty
<bs...@yahoo-inc.com> wrote:
>
> Hi,
>
> I have 5 box zookeeper quorum. 2 / 5 are unable to deserialize the message from leader and hence they are not coming up. I tried restarting few times but still seeing the same issue. This issue is observed when we were trying to do the following:
>
>
> 1>   We have a script which brings down the zookeeper server on the leader. We are trying to test if new follower takes over once leader is down.
>
> 2>   The script has been running since 2-3 days. Surprisingly we don't see any issue for the first 2 days but after that we hit the below issue wherein 2 out 5 zookeeper machines are unable to deserialize the message.
>
> zookeeper server = 3.4.5
>
> Following are the zookeeper settings:
>
> tickTime=4000
> initLimit=20
> syncLimit=15
> snapCount=100000
>
> Logs from Follower
>
> ---------------
>
> Thu Mar 28 19:14:14 2013: 2013-03-28 19:14:14,158 - INFO  [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 0 (n.zxid), 3
> (n.round), LOOKING (n.state), 2 (n.sid), FOLLOWING (my state)
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,811 - WARN
> [QuorumPeer:/0.0.0.0:2181:Follower@82] - Exception when following the leader
> Thu Mar 28 19:14:41 2013: java.io.EOFException
> Thu Mar 28 19:14:41 2013:       at
> java.io.DataInputStream.readInt(DataInputStream.java:392)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:332)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:75)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,811 - INFO
> [QuorumPeer:/0.0.0.0:2181:Follower@165] - shutdown called
> Thu Mar 28 19:14:41 2013: java.lang.Exception: shutdown Follower
> Thu Mar 28 19:14:41 2013:       at
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
> Thu Mar 28 19:14:41 2013:       at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,812 - INFO
> [QuorumPeer:/0.0.0.0:2181:QuorumPeer@621] - LOOKING
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,812 - INFO
> [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@663] - New election. My id =  1,
> Proposed zxid = 0
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,813 - INFO  [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 0 (n.zxid), 3
> (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,814 - INFO  [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 85899346051
> (n.zxid), 2 (n.round), FOLLOWING (n.state), 3 (n.sid), LOOKING (my state)
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,814 - INFO  [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 85899346051
> (n.zxid), 2 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,814 - INFO  [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 85899346051
> (n.zxid), 2 (n.round), LEADING (n.state), 4 (n.sid), LOOKING (my state)
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,814 - INFO  [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 85899346051
> (n.zxid), 2 (n.round), FOLLOWING (n.state), 5 (n.sid), FOLLOWING (my state)
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,814 - INFO
> [QuorumPeer:/0.0.0.0:2181:QuorumPeer@643] - FOLLOWING
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,815 - INFO
> [QuorumPeer:/0.0.0.0:2181:ZooKeeperServer@151] - Created server with tickTime
> 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir
> /home/y/var/zookeeper/version-2 snapdir /home/y/var/zookeeper/version-2
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,817 - INFO
> [QuorumPeer:/0.0.0.0:2181:Learner@294] - Getting a snapshot from leader
> Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,819 - INFO
> [QuorumPeer:/0.0.0.0:2181:Learner@325] - Setting leader epoch 15
> Thu Mar 28 19:14:54 2013: 2013-03-28 19:14:54,208 - INFO  [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 0 (n.zxid), 3
> (n.round), LOOKING (n.state), 2 (n.sid), FOLLOWING (my state)
>
>
> Logs from Leader
>
> ----------------------------------------------------
>
> Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,416 - ERROR
> [LearnerHandler-/98.139.207.73:37505:LearnerHandler@562] - Unexpected exception
> causing shutdown while sock still open
> Thu Mar 28 19:18:14 2013: java.net.SocketTimeoutException: Read timed out
> Thu Mar 28 19:18:14 2013:       at
> java.net.SocketInputStream.socketRead0(Native Method)
> Thu Mar 28 19:18:14 2013:       at
> java.net.SocketInputStream.read(SocketInputStream.java:150)
> Thu Mar 28 19:18:14 2013:       at
> java.net.SocketInputStream.read(SocketInputStream.java:121)
> Thu Mar 28 19:18:14 2013:       at
> java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> Thu Mar 28 19:18:14 2013:       at
> java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> Thu Mar 28 19:18:14 2013:       at
> java.io.DataInputStream.readInt(DataInputStream.java:387)
> Thu Mar 28 19:18:14 2013:       at
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> Thu Mar 28 19:18:14 2013:       at
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> Thu Mar 28 19:18:14 2013:       at
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> Thu Mar 28 19:18:14 2013:       at
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:450)
> Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,417 - WARN
> [LearnerHandler-/98.139.207.73:37505:LearnerHandler@575] - ******* GOODBYE
> /98.139.207.73:37505 ********
> Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,420 - INFO
> [WorkerReceiver[myid=4]:FastLeaderElection$Messenger$WorkerReceiver@273] -
> Backward compatibility mode, server id=2
> Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,420 - INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader),
> 0x0 (n.zxid), 0x3 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch),
> LEADING (my state)
> Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,423 - INFO
> [LearnerHandler-/98.139.207.73:37511:LearnerHandler@263] - Follower sid: 2 :
> info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@265d646b
> Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,424 - INFO
> [LearnerHandler-/98.139.207.73:37511:LearnerHandler@318] - Synchronizing with
> Follower sid: 2 maxCommittedLog=0x15000000d8 minCommittedLog=0x1400000001
> peerLastZxid=0x0
> Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,424 - WARN
> [LearnerHandler-/98.139.207.73:37511:LearnerHandler@379] - Unhandled proposal
> scenario
> Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,424 - INFO
> [LearnerHandler-/98.139.207.73:37511:LearnerHandler@395] - Sending SNAP
> Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,424 - INFO
> [LearnerHandler-/98.139.207.73:37511:LearnerHandler@419] - Sending snapshot
> last zxid of peer is 0x0  zxid of leader is 0x15000000d8sent zxid of db as
> 0x15000000d8
>
> Can you please help how we can fix this issue?