Posted to user@zookeeper.apache.org by ravisinha0506 <ro...@gmail.com> on 2017/04/18 20:24:37 UTC

zookeeper node fails to communicate with Leader node

I have a three-node ZooKeeper cluster; the configuration is shown below.
When I restart a node, zkServer.sh start reports success, but
zkServer.sh status then reports a failure.

zoo.cfg

dataDir=/ngs/app/<app>/zookeeper-3.4.6/zookeeperdata/1
clientPort=2181
initLimit=5
syncLimit=2
server.1=pr2-ligerp-lapp27.<domain.com>:2888:3888
server.2=pr2-ligerp-lapp28.<domain.com>:2889:3889
server.3=pr2-ligerp-lapp29.<domain.com>:2890:3890
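
Each server.N=host:quorumPort:electionPort line above names one ensemble member, and the nodes need to reach each other on those ports. The following is a minimal sketch for checking that from one node, assuming Python 3 is available there and that a plain TCP connect is an adequate reachability test. The names parse_servers, port_open and probe are illustrative helpers, not part of ZooKeeper.

```python
import re
import socket

# Matches lines such as "server.1=host.example.com:2888:3888".
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)")


def parse_servers(cfg_text):
    """Return {sid: (host, quorum_port, election_port)} from zoo.cfg text."""
    servers = {}
    for line in cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            sid, host, quorum, election = match.groups()
            servers[int(sid)] = (host, int(quorum), int(election))
    return servers


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(cfg_text):
    """Return [(sid, host, quorum_reachable, election_reachable), ...]."""
    results = []
    for sid, (host, quorum, election) in sorted(parse_servers(cfg_text).items()):
        results.append((sid, host, port_open(host, quorum), port_open(host, election)))
    return results
```

Running probe() on the contents of zoo.cfg from each node gives a quick picture of which members can see which. As far as I know, the quorum port is normally only listened on by the current leader, so a closed quorum port on a non-leader is expected; the election port should be reachable on every running member.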

sh zkServer.sh start

JMX enabled by default
Using config: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
-bash-4.1$ 
-bash-4.1$ cat zookeeper.out 
2017-04-18 18:58:13,840 [myid:] - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg
2017-04-18 18:58:13,843 [myid:] - INFO  [main:QuorumPeerConfig@340] - Defaulting to majority quorums
2017-04-18 18:58:13,845 [myid:1] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2017-04-18 18:58:13,845 [myid:1] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2017-04-18 18:58:13,846 [myid:1] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2017-04-18 18:58:13,854 [myid:1] - INFO  [main:QuorumPeerMain@127] - Starting quorum peer
2017-04-18 18:58:13,861 [myid:1] - INFO  [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2017-04-18 18:58:13,875 [myid:1] - INFO  [main:QuorumPeer@959] - tickTime set to 3000
2017-04-18 18:58:13,875 [myid:1] - INFO  [main:QuorumPeer@979] - minSessionTimeout set to -1
2017-04-18 18:58:13,875 [myid:1] - INFO  [main:QuorumPeer@990] - maxSessionTimeout set to -1
2017-04-18 18:58:13,875 [myid:1] - INFO  [main:QuorumPeer@1005] - initLimit set to 5
2017-04-18 18:58:13,884 [myid:1] - INFO  [main:FileSnap@83] - Reading snapshot /ngs/app/ligerp/solr/zookeeper-3.4.6/zookeeperdata/1/version-2/snapshot.1300000032
2017-04-18 18:58:13,954 [myid:1] - INFO  [Thread-1:QuorumCnxManager$Listener@504] - My election bind port: pr2-ligerp-lapp27.<domain>/10.136.145.38:3888
2017-04-18 18:58:13,960 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@714] - LOOKING
2017-04-18 18:58:13,961 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@815] - New election. My id =  1, proposed zxid=0x130000024b
2017-04-18 18:58:13,962 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x130000024b (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x13 (n.peerEpoch) LOOKING (my state)
2017-04-18 18:58:13,964 [myid:1] - INFO  [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:13,964 [myid:1] - INFO  [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:14,165 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:14,166 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:14,166 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 400
2017-04-18 18:58:15,566 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:15,567 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:15,567 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 800
2017-04-18 18:58:16,368 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:16,368 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:16,368 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 1600
2017-04-18 18:58:17,969 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:17,969 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:17,970 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 3200


sh zkServer.sh status
JMX enabled by default
Using config: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
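
The status check talks to the server on its client port, so one way to tell "process not running" apart from "running but not serving" is to send a four-letter command to that port directly. The following is a minimal sketch, assuming the client port 2181 from the configuration above and Python 3 on the node. The helper names four_letter_word and describe are illustrative; srvr is one of ZooKeeper's standard four-letter commands.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a ZooKeeper four-letter command and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def describe(host="localhost", port=2181):
    """Summarise what the client port reports; returns a short string."""
    try:
        reply = four_letter_word(host, port, "srvr")
    except OSError as exc:
        return "cannot connect: %s" % exc
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line
    return "connected, but no Mode line: %r" % reply.strip()
```

If describe() connects but finds no Mode: line, the process is up but has probably not joined a quorum yet, which would be consistent with the repeated LOOKING and "Notification time out" entries in the log.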






--
View this message in context: http://zookeeper-user.578899.n2.nabble.com/zookeeper-node-fails-to-communicate-with-Leader-node-tp7583047.html
Sent from the zookeeper-user mailing list archive at Nabble.com.

Re: zookeeper node fails to communicate with Leader node

Posted by Michael Han <ha...@cloudera.com>.
The script should be simple enough to debug. Maybe try executing the
command yourself and see what happens? Could it be that JAVA_HOME was not
set correctly?
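
One way to act on the JAVA_HOME suggestion is to check which java binary would be picked up before running the script. The following is a minimal sketch, assuming Python 3 on the node. find_java is an illustrative helper, and the lookup order (JAVA_HOME first, then PATH) is an assumption about how the ZooKeeper scripts choose a JVM, so it is worth confirming against zkEnv.sh in your installation.

```python
import os
import shutil


def find_java(env=None):
    """Return the path of the java binary that would be used, or None.

    Assumption: JAVA_HOME/bin/java is preferred when JAVA_HOME is set,
    otherwise the first "java" found on PATH is used.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME", "")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        # A JAVA_HOME that points at a missing or non-executable binary
        # is reported as None rather than silently falling back to PATH.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
        return None
    return shutil.which("java", path=env.get("PATH"))
```

A None result with JAVA_HOME set suggests the variable points at the wrong directory for the user that starts ZooKeeper.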


-- 
Cheers
Michael.