Posted to dev@zookeeper.apache.org by Vishal Kher <vi...@gmail.com> on 2011/01/12 08:34:41 UTC

peer goes in LEADING state even if ensemble is online

Hi,

Scenario:
1. Two of the three ZK nodes are online.
2. The third node attempts to join.
3. The third node unnecessarily goes into the LEADING state.
4. The third node then goes back to LOOKING (no majority of followers) and
finally goes to the FOLLOWING state.

While going through the logs, I noticed that a peer C that is trying to join
an already formed cluster goes into the LEADING state. This is because the
QuorumCnxManager of A and B sends the entire history of notification
messages to C, so C receives the notification messages that A and B
exchanged when they were forming the cluster.

In FastLeaderElection.lookForLeader(), the following piece of code causes C
to quit lookForLeader on the assumption that it is supposed to lead:

    // If have received from all nodes, then terminate
    if ((self.getVotingView().size() == recvset.size()) &&
            (self.getQuorumVerifier().getWeight(proposedLeader) != 0)) {
        self.setPeerState((proposedLeader == self.getId()) ?
                ServerState.LEADING : learningState());
        leaveInstance();
        return new Vote(proposedLeader, proposedZxid);

    } else if (termPredicate(recvset,
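
To make the effect of the termination check quoted above easier to see, here
is a minimal, simplified model of it. This is a sketch, not ZooKeeper code:
the class StaleNotificationDemo, its State enum, and the methods
checkAllReceived and simulateJoin are hypothetical names, the quorum weight
check is left out, and the contents of the replayed notifications are
assumed. It also assumes that C's own vote is counted in recvset and that
C's proposal is still its own id when the check runs.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the termination check quoted above.
// It is not ZooKeeper code; all names and types are illustrative only.
final class StaleNotificationDemo {
    enum State { LOOKING, FOLLOWING, LEADING }

    /**
     * Models the "received from all nodes" branch: it looks only at how many
     * distinct senders are in recvset, not at whether their votes agree.
     * Returns LOOKING when the branch does not fire. The weight check from
     * the original snippet is omitted here.
     */
    static State checkAllReceived(int votingViewSize, Map<Long, Long> recvset,
                                  long proposedLeader, long selfId) {
        if (recvset.size() == votingViewSize) {
            return proposedLeader == selfId ? State.LEADING : State.FOLLOWING;
        }
        return State.LOOKING;
    }

    /** Scenario from this thread: C (id 3) joins while A (id 1) and B (id 2) are running. */
    static State simulateJoin() {
        final long selfId = 3L;
        long proposedLeader = selfId; // Assumption: C still proposes itself.
        Map<Long, Long> recvset = new HashMap<>(); // sender id -> leader it voted for

        recvset.put(3L, 3L); // C's own vote (assumed to be counted)
        recvset.put(1L, 1L); // stale notification replayed by A (assumed content)
        recvset.put(2L, 2L); // stale notification replayed by B (assumed content)

        return checkAllReceived(3, recvset, proposedLeader, selfId);
    }

    public static void main(String[] args) {
        System.out.println("C's state after replayed history: " + simulateJoin());
    }
}
```

In this model the branch fires as soon as recvset holds one entry per voting
member, even though the three recorded votes disagree, which is consistent
with C briefly entering LEADING.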


In general, this does not affect the correctness of FLE, since C will
eventually end up in the FOLLOWING state (A and B won't vote for C). However,
it delays C from joining the cluster, which can in turn affect the recovery
time of an application.

I think A and B should send only the most recent notification instead of the
entire history. Does this sound reasonable?
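
One possible shape for "send only the most recent notification" is a
single-slot outbox per destination peer, sketched below. This is only an
illustration under assumptions: LatestOnlyOutbox, offer, and poll are
hypothetical names rather than QuorumCnxManager's API, and notifications are
modeled as plain strings.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the "send only the latest notification" idea.
// LatestOnlyOutbox is an illustrative name, not part of QuorumCnxManager.
final class LatestOnlyOutbox {
    // One slot per destination peer; a newer notification overwrites
    // an older one that has not been sent yet.
    private final Map<Long, String> pending = new ConcurrentHashMap<>();

    /** Queue a notification for a peer, discarding any older unsent one. */
    void offer(long peerId, String notification) {
        pending.put(peerId, notification);
    }

    /** Take the notification to send to a peer, or null if none is pending. */
    String poll(long peerId) {
        return pending.remove(peerId);
    }

    public static void main(String[] args) {
        LatestOnlyOutbox outbox = new LatestOnlyOutbox();
        long peerC = 3L;
        outbox.offer(peerC, "LOOKING, vote=1");   // older message, never sent
        outbox.offer(peerC, "FOLLOWING, vote=2"); // replaces the older one
        System.out.println(outbox.poll(peerC));
        System.out.println(outbox.poll(peerC));
    }
}
```

With this structure, a peer that connects late would see only the last
message queued for it, not the messages exchanged during an earlier election
round.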

Thanks.
-Vishal

Re: peer goes in LEADING state even if ensemble is online

Posted by Vishal Kher <vi...@gmail.com>.

Folks,

I opened a JIRA for this: https://issues.apache.org/jira/browse/ZOOKEEPER-975

Please let me know if my proposal seems OK. I will make the change once we
agree.

Thanks.

On Wed, Jan 12, 2011 at 10:00 PM, Mahadev Konar <ma...@yahoo-inc.com> wrote:

> Forwarding it to the dev list.
>
> Thanks
> mahadev

Re: peer goes in LEADING state even if ensemble is online

Posted by Mahadev Konar <ma...@yahoo-inc.com>.

Forwarding it to the dev list.

Thanks
mahadev

