You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Rakesh Kumar Singh (JIRA)" <ji...@apache.org> on 2016/09/06 10:01:20 UTC

[jira] [Updated] (ZOOKEEPER-2555) zookeeper started in observer mode takes long time to come to observer state once leader restarted and so client connected to observer mode has to wait for longer time to get service

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Kumar Singh updated ZOOKEEPER-2555:
------------------------------------------
    Description: 
zookeeper started in observer mode takes long time (some times 25 seconds) to come to observer state once leader restarted and so client connected to observer mode has to wait for longer time to get service

Steps to reproduce:-
1. Start zookeeper in cluster mode in which one node is in observer mode
2. stop the leader node (some times we need to wait for 30 secs to reproduce this issue)
3. Start the leader node
4. Check the observer node status -
It will be in 
"Error contacting service. It is probably not running."

and takes long time (25 secs) to come to observer mode. And hence client connected to this node will not get service during this time.
Log at observer node is as below:-

2016-09-06 17:49:14,774 [myid:2] - WARN  [WorkerSender[myid=2]:QuorumCnxManager@459] - Cannot open channel to 3 at election address /10.18.221.194:3888
java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
	at java.net.Socket.connect(Socket.java:579)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:444)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:485)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:421)
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:486)
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:465)
	at java.lang.Thread.run(Thread.java:722)
{color:red}2016-09-06 17:49:14,776 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x2a00000001 (n.zxid), 0x7 (n.round), LOOKING (n.state), 1 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)100000000 (n.config version)
2016-09-06 17:49:40,377 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):FastLeaderElection@928] - Notification time out: 51200 {color}
2016-09-06 17:49:40,378 [myid:2] - INFO  [WorkerSender[myid=2]:QuorumCnxManager@278] - Have smaller server identifier, so dropping the connection: (3, 2)
2016-09-06 17:49:40,379 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x2a00000001 (n.zxid), 0x7 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x2c (n.peerEPoch), LOOKING (my state)100000000 (n.config version)
2016-09-06 17:49:40,381 [myid:2] - INFO  [/10.18.219.50:3888:QuorumCnxManager$Listener@637] - Received connection request /10.18.221.194:34085
2016-09-06 17:49:40,388 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x2a00000001 (n.zxid), 0x7 (n.round), LEADING (n.state), 3 (n.sid), 0x2c (n.peerEPoch), LOOKING (my state)100000000 (n.config version)
2016-09-06 17:49:40,388 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id2,name1=replica.2,name2=LeaderElection]
2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):QuorumPeer@1049] - OBSERVING
2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ZooKeeperServer@858] - minSessionTimeout set to 4000
2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ZooKeeperServer@867] - maxSessionTimeout set to 40000
2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ZooKeeperServer@156] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2 snapdir /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2
2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ObserverZooKeeperServer@56] - syncEnabled =true
2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Observer@72] - Observing /10.18.221.194:2888
2016-09-06 17:49:40,396 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):FileSnap@83] - Reading snapshot /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2/snapshot.2a00000001
2016-09-06 17:49:40,410 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@369] - Getting a snapshot from leader
2016-09-06 17:49:40,411 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@509] - Learner received NEWLEADER message
2016-09-06 17:49:40,411 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):FileTxnSnapLog@298] - Snapshotting: 0x2c00000000 to /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2/snapshot.2c00000000
2016-09-06 17:49:40,417 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@493] - Learner received UPTODATE message
2016-09-06 17:49:40,417 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):CommitProcessor@254] - Configuring CommitProcessor with 8 worker threads.


  was:
zookeeper started in observer mode takes long time (some times 25 seconds) to come to observer state once leader restarted and so client connected to observer mode has to wait for longer time to get service

Steps to reproduce:-
1. Start zookeeper in cluster mode in which one node is in observer mode
2. stop the leader node (some times we need to wait for 30 secs to reproduce this issue)
3. Start the leader node
4. Check the observer node status -
It will be in 
"Error contacting service. It is probably not running."

and takes long time (25 secs) to come to observer mode. And hence client connected to this node will not get service during this time.
Log at observer node is as below:-

2016-09-06 17:49:14,774 [myid:2] - WARN  [WorkerSender[myid=2]:QuorumCnxManager@459] - Cannot open channel to 3 at election address /10.18.221.194:3888
java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
	at java.net.Socket.connect(Socket.java:579)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:444)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:485)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:421)
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:486)
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:465)
	at java.lang.Thread.run(Thread.java:722)
2016-09-06 17:49:14,776 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x2a00000001 (n.zxid), 0x7 (n.round), LOOKING (n.state), 1 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)100000000 (n.config version)
2016-09-06 17:49:40,377 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):FastLeaderElection@928] - Notification time out: 51200
2016-09-06 17:49:40,378 [myid:2] - INFO  [WorkerSender[myid=2]:QuorumCnxManager@278] - Have smaller server identifier, so dropping the connection: (3, 2)
2016-09-06 17:49:40,379 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x2a00000001 (n.zxid), 0x7 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x2c (n.peerEPoch), LOOKING (my state)100000000 (n.config version)
2016-09-06 17:49:40,381 [myid:2] - INFO  [/10.18.219.50:3888:QuorumCnxManager$Listener@637] - Received connection request /10.18.221.194:34085
2016-09-06 17:49:40,388 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x2a00000001 (n.zxid), 0x7 (n.round), LEADING (n.state), 3 (n.sid), 0x2c (n.peerEPoch), LOOKING (my state)100000000 (n.config version)
2016-09-06 17:49:40,388 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id2,name1=replica.2,name2=LeaderElection]
2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):QuorumPeer@1049] - OBSERVING
2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ZooKeeperServer@858] - minSessionTimeout set to 4000
2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ZooKeeperServer@867] - maxSessionTimeout set to 40000
2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ZooKeeperServer@156] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2 snapdir /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2
2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ObserverZooKeeperServer@56] - syncEnabled =true
2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Observer@72] - Observing /10.18.221.194:2888
2016-09-06 17:49:40,396 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):FileSnap@83] - Reading snapshot /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2/snapshot.2a00000001
2016-09-06 17:49:40,410 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@369] - Getting a snapshot from leader
2016-09-06 17:49:40,411 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@509] - Learner received NEWLEADER message
2016-09-06 17:49:40,411 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):FileTxnSnapLog@298] - Snapshotting: 0x2c00000000 to /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2/snapshot.2c00000000
2016-09-06 17:49:40,417 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@493] - Learner received UPTODATE message
2016-09-06 17:49:40,417 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):CommitProcessor@254] - Configuring CommitProcessor with 8 worker threads.



> zookeeper started in observer mode takes long time to come to observer state once leader restarted and so client connected to observer mode has to wait for longer time to get service
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2555
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2555
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.1
>            Reporter: Rakesh Kumar Singh
>
> zookeeper started in observer mode takes long time (some times 25 seconds) to come to observer state once leader restarted and so client connected to observer mode has to wait for longer time to get service
> Steps to reproduce:-
> 1. Start zookeeper in cluster mode in which one node is in observer mode
> 2. stop the leader node (some times we need to wait for 30 secs to reproduce this issue)
> 3. Start the leader node
> 4. Check the observer node status -
> It will be in 
> "Error contacting service. It is probably not running."
> and takes long time (25 secs) to come to observer mode. And hence client connected to this node will not get service during this time.
> Log at observer node is as below:-
> 2016-09-06 17:49:14,774 [myid:2] - WARN  [WorkerSender[myid=2]:QuorumCnxManager@459] - Cannot open channel to 3 at election address /10.18.221.194:3888
> java.net.ConnectException: Connection refused
> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
> 	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
> 	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
> 	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
> 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
> 	at java.net.Socket.connect(Socket.java:579)
> 	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:444)
> 	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:485)
> 	at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:421)
> 	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:486)
> 	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:465)
> 	at java.lang.Thread.run(Thread.java:722)
> {color:red}2016-09-06 17:49:14,776 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x2a00000001 (n.zxid), 0x7 (n.round), LOOKING (n.state), 1 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)100000000 (n.config version)
> 2016-09-06 17:49:40,377 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):FastLeaderElection@928] - Notification time out: 51200 {color}
> 2016-09-06 17:49:40,378 [myid:2] - INFO  [WorkerSender[myid=2]:QuorumCnxManager@278] - Have smaller server identifier, so dropping the connection: (3, 2)
> 2016-09-06 17:49:40,379 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x2a00000001 (n.zxid), 0x7 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x2c (n.peerEPoch), LOOKING (my state)100000000 (n.config version)
> 2016-09-06 17:49:40,381 [myid:2] - INFO  [/10.18.219.50:3888:QuorumCnxManager$Listener@637] - Received connection request /10.18.221.194:34085
> 2016-09-06 17:49:40,388 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x2a00000001 (n.zxid), 0x7 (n.round), LEADING (n.state), 3 (n.sid), 0x2c (n.peerEPoch), LOOKING (my state)100000000 (n.config version)
> 2016-09-06 17:49:40,388 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id2,name1=replica.2,name2=LeaderElection]
> 2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):QuorumPeer@1049] - OBSERVING
> 2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ZooKeeperServer@858] - minSessionTimeout set to 4000
> 2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ZooKeeperServer@867] - maxSessionTimeout set to 40000
> 2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ZooKeeperServer@156] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2 snapdir /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2
> 2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ObserverZooKeeperServer@56] - syncEnabled =true
> 2016-09-06 17:49:40,389 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Observer@72] - Observing /10.18.221.194:2888
> 2016-09-06 17:49:40,396 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):FileSnap@83] - Reading snapshot /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2/snapshot.2a00000001
> 2016-09-06 17:49:40,410 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@369] - Getting a snapshot from leader
> 2016-09-06 17:49:40,411 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@509] - Learner received NEWLEADER message
> 2016-09-06 17:49:40,411 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):FileTxnSnapLog@298] - Snapshotting: 0x2c00000000 to /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2/snapshot.2c00000000
> 2016-09-06 17:49:40,417 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@493] - Learner received UPTODATE message
> 2016-09-06 17:49:40,417 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):CommitProcessor@254] - Configuring CommitProcessor with 8 worker threads.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)