Posted to user@zookeeper.apache.org by "Bae, Jae Hyeon" <me...@gmail.com> on 2013/11/06 09:51:03 UTC

New server cannot join quorum

Hi Zookeeper users

With the same zoo.cfg, a new server with an empty ZooKeeper data directory
cannot join the quorum, even though it has the same IP address, the same
ZooKeeper version and the same ports. I didn't see any significant error
messages, but the following lines repeated:

2013-11-05 17:42:08,287 - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumPeer@670] - LOOKING
2013-11-05 17:42:08,290 - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@740] - New election. My id =  1, proposed zxid=0x0
2013-11-05 17:42:08,293 - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)
2013-11-05 17:42:08,301 - INFO  [WorkerSender[myid=1]:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (2, 1)
2013-11-05 17:42:08,304 - INFO  [WorkerSender[myid=1]:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (3, 1)
2013-11-05 17:42:08,308 - INFO  [WorkerSender[myid=1]:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (4, 1)
2013-11-05 17:42:08,311 - INFO  [WorkerSender[myid=1]:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (5, 1)
2013-11-05 17:42:08,511 - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (5, 1)
2013-11-05 17:42:08,515 - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (2, 1)
2013-11-05 17:42:08,518 - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (3, 1)
2013-11-05 17:42:08,522 - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (4, 1)
2013-11-05 17:42:08,523 - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 400

Do you have any idea what I am doing wrong here? I asked the same question
yesterday and was told that the new server should start normally, sync, and
join the quorum successfully.

Thank you
Best, Jae

Re: New server cannot join quorum

Posted by German Blanco <ge...@gmail.com>.
Hello again,

I don't think it is a good idea to start a new thread about the same issue.

Could this be a DNS resolution caching problem?
See https://issues.apache.org/jira/browse/ZOOKEEPER-1506

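The new server has the lowest sid. It is able to connect to all the other
servers, but the other servers don't seem able to connect to it. Its own
outgoing connections are useless: a connection between two peers is kept only
if the peer with the larger sid opened it, so server 1 drops every connection
it opens (the "Have smaller server identifier" lines in your log) and waits for
the other server to connect back, which apparently never happens.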
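The sid rule can be modelled in a few lines. This is a simplified sketch, not ZooKeeper's QuorumCnxManager code: the class and method names (SidRuleSketch, keepsOutgoing, connectedPeers) are hypothetical, and the canReachMe flags stand in for whatever stops the other servers from connecting back (a stale cached address, if the DNS theory is right).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model (not ZooKeeper's code) of the rule seen in
 * the log: a connection between two peers survives only if the peer with
 * the larger sid opened it.
 */
public class SidRuleSketch {

    /** True if a connection opened by mySid towards peerSid is kept. */
    static boolean keepsOutgoing(long mySid, long peerSid) {
        // The smaller sid drops its own connection and waits for a callback.
        return mySid > peerSid;
    }

    /**
     * Returns the peers that end up with a working link to mySid.
     * canReachMe[i] says whether peers[i] can open a connection to mySid;
     * false models a peer that cannot reach this server (assumption).
     */
    static List<Long> connectedPeers(long mySid, long[] peers, boolean[] canReachMe) {
        List<Long> linked = new ArrayList<>();
        for (int i = 0; i < peers.length; i++) {
            long peer = peers[i];
            boolean myDialSurvives = keepsOutgoing(mySid, peer);
            boolean peerDialSurvives = canReachMe[i] && keepsOutgoing(peer, mySid);
            if (myDialSurvives || peerDialSurvives) {
                linked.add(peer);
            }
        }
        return linked;
    }

    public static void main(String[] args) {
        boolean[] noneReachMe = {false, false, false, false};
        // Server 1 reaches everyone, but every link is dropped: prints []
        System.out.println(connectedPeers(1, new long[] {2, 3, 4, 5}, noneReachMe));
        // Same failure on the highest sid: its own dials survive, prints [1, 2, 3, 4]
        System.out.println(connectedPeers(5, new long[] {1, 2, 3, 4}, noneReachMe));
    }
}
```

In this model the unreachable server is stuck only when it has the lowest sid; when it has the highest sid, its own outgoing connections are the ones that survive.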

You could try changing the server addresses in the configuration to the AWS
public IP addresses of the peers, just to test whether that works. Or try
replacing the server with the highest sid instead; that should also work.
Otherwise (assuming the problem is DNS resolution), the only workaround I can
currently think of is a rolling restart, as you have noticed.
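To check the DNS theory before touching the configuration, it may help to see what each server.N host in zoo.cfg resolves to from a given machine. This is a hypothetical helper, not a ZooKeeper tool: PeerAddressCheck, serverHosts and resolve are made-up names, it assumes the plain server.N=host:port:port form, and the config in main is a placeholder. A fresh lookup only shows what DNS returns now; it cannot show an address that an already running server resolved earlier and kept using.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical diagnostic (not part of ZooKeeper): parses server.N lines
 * from zoo.cfg text and reports what each host resolves to from this JVM.
 */
public class PeerAddressCheck {

    /** Extracts sid -> host from "server.N=host:port:port" lines; other lines are ignored. */
    static Map<Long, String> serverHosts(String zooCfg) {
        Map<Long, String> hosts = new LinkedHashMap<>();
        for (String raw : zooCfg.split("\\R")) {
            String line = raw.trim();
            if (!line.startsWith("server.")) continue;
            int eq = line.indexOf('=');
            if (eq < 0) continue;
            long sid = Long.parseLong(line.substring("server.".length(), eq).trim());
            String value = line.substring(eq + 1).trim();
            // Host is everything before the first ':' (IPv6 literals are not handled).
            int colon = value.indexOf(':');
            hosts.put(sid, colon < 0 ? value : value.substring(0, colon));
        }
        return hosts;
    }

    /** Resolves a host to an IP string, or returns an "unresolved" marker. */
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return "unresolved (" + e.getMessage() + ")";
        }
    }

    public static void main(String[] args) {
        // Placeholder config; substitute the real zoo.cfg contents.
        String cfg = String.join("\n",
                "tickTime=2000",
                "server.1=127.0.0.1:2888:3888",
                "server.2=localhost:2888:3888");
        for (Map.Entry<Long, String> e : serverHosts(cfg).entrySet()) {
            System.out.println("server." + e.getKey() + " " + e.getValue()
                    + " -> " + resolve(e.getValue()));
        }
    }
}
```

Running it on each peer and comparing the result for server.1 with the new server's actual address would show whether the hostnames point where you expect.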


