Posted to user@zookeeper.apache.org by Hector Yuen <he...@nimblestorage.com> on 2009/10/02 00:59:23 UTC

problem starting ensemble mode

Hi all,

I am trying to start ZooKeeper on two nodes. The configuration file I have
is:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
server.1=hec-bp1:2888:3888
server.2=hec-bp2:2888:3888


I also have a /var/zookeeper/myid file on each of the machines; the
files contain 1 and 2 respectively.
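
For anyone wanting to double-check this part of the setup, here is a minimal sketch (not part of ZooKeeper) that tests whether the contents of a myid file name one of the server.N entries in the config. The function names `server_ids` and `myid_is_valid` and the `EXAMPLE_CONFIG` constant are made up for illustration; the config text is the one posted above.

```python
import re

# The configuration from this post, inlined so the sketch is self-contained.
EXAMPLE_CONFIG = """\
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
server.1=hec-bp1:2888:3888
server.2=hec-bp2:2888:3888
"""


def server_ids(config_text):
    """Return the set of N values found in server.N=... lines."""
    ids = set()
    for line in config_text.splitlines():
        match = re.match(r"\s*server\.(\d+)\s*=", line)
        if match:
            ids.add(int(match.group(1)))
    return ids


def myid_is_valid(config_text, myid_text):
    """True if myid_text is a single integer that matches a server.N entry."""
    try:
        my_id = int(myid_text.strip())
    except ValueError:
        # Empty file or non-numeric content.
        return False
    return my_id in server_ids(config_text)
```

On each machine, one would read /var/zookeeper/myid and the config file from disk and pass their contents to `myid_is_valid`.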


When I start the server, I get the following:

Starting zookeeper ...
STARTED
hector@hec-bp2:/zookeeper$ 2009-10-01 15:48:15,786 - INFO  [main:QuorumPeerConfig@80] - Reading configuration from: /zookeeper/bin/../conf/zoo.cfg
2009-10-01 15:48:15,882 - INFO  [main:QuorumPeerConfig@232] - Defaulting to majority quorums
2009-10-01 15:48:15,899 - INFO  [main:QuorumPeerMain@118] - Starting quorum peer
2009-10-01 15:48:15,943 - INFO  [Thread-1:QuorumCnxManager$Listener@409] - My election bind port: 3888
2009-10-01 15:48:15,961 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@487] - LOOKING
2009-10-01 15:48:15,963 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@579] - New election: -1
2009-10-01 15:48:15,978 - WARN  [WorkerSender Thread:QuorumCnxManager@336] - Cannot open channel to 1 at election address hec-bp1.admin.nimblestorage.com/10.12.6.192:3888
java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.Net.connect(Native Method)
        at sun.nio.ch.SocketChannelImpl.connect(Unknown Source)
        at java.nio.channels.SocketChannel.open(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:323)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:296)
        at java.lang.Thread.run(Unknown Source)
2009-10-01 15:48:15,981 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@618] - Notification: 2, -1, 1, 2, LOOKING, LOOKING, 2
2009-10-01 15:48:15,981 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@642] - Adding vote
2009-10-01 15:48:16,184 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@336] - Cannot open channel to 1 at election address hec-bp1.admin.nimblestorage.com/10.12.6.192:3888


I would expect this kind of message when the other server hasn't been
started, but the server keeps logging them even after a while.

I can ping and ssh between the machines.
I noticed that only port 3888 is listening when I run netstat -an. Why is
port 2888 not being used?

Any ideas?

Thanks
-h

Re: problem starting ensemble mode

Posted by Patrick Hunt <ph...@apache.org>.
Hi Hector, looks like a connectivity issue to me: NoRouteToHostException.

3888 is the election port
2888 is the quorum port

Basically, the ensemble uses the election port for leader election. Once
a leader is elected, it then uses the quorum port for subsequent
communication.
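
To make the port roles concrete, here is a small sketch that splits a server.N=host:port:port line from the config into its parts, labelling the first port as the quorum port and the second as the election port, as described above. `ServerEntry` and `parse_server_line` are illustrative names, not ZooKeeper APIs, and the sketch assumes the three-field host:port:port form used in this thread.

```python
from typing import NamedTuple


class ServerEntry(NamedTuple):
    server_id: int      # the N in server.N, matching that machine's myid
    host: str
    quorum_port: int    # 2888 in this thread: used once a leader is elected
    election_port: int  # 3888 in this thread: used for leader election


def parse_server_line(line):
    """Parse one 'server.N=host:quorumPort:electionPort' line."""
    key, _, value = line.strip().partition("=")
    server_id = int(key.split(".", 1)[1])
    host, quorum_port, election_port = value.split(":")
    return ServerEntry(server_id, host, int(quorum_port), int(election_port))


entry = parse_server_line("server.1=hec-bp1:2888:3888")
print(entry)
```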

Could it be a firewall issue? Your configs/logs look OK to me otherwise.

Try using something like telnet to verify connectivity on the 3888 & 
2888 ports between the two servers.
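
If telnet isn't handy, the same check can be sketched in a few lines of Python using only the standard library. The `probe` and `probe_ports` helpers below are hypothetical names for illustration; the default ports are the ones from the config in this thread. Since the quorum port only comes into use once a leader is elected, a "refused" result on 2888 before that point may not indicate a problem, whereas an error such as "No route to host" on 3888 points at the network or a firewall.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connection to host:port and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        return "timed out"
    except OSError as exc:
        # Includes "No route to host" and name-resolution failures.
        return f"error: {exc}"


def probe_ports(host, ports=(2888, 3888)):
    """Probe each port on host and return {port: outcome}."""
    return {port: probe(host, port) for port in ports}
```

Run it from each server against the other, for example `probe_ports("hec-bp1")` on hec-bp2 and `probe_ports("hec-bp2")` on hec-bp1.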

Patrick
