Posted to dev@zookeeper.apache.org by "gaoxiao (JIRA)" <ji...@apache.org> on 2012/06/20 14:52:43 UTC

[jira] [Created] (ZOOKEEPER-1492) leader cannot switch to LOOKING state when lost the majority

gaoxiao created ZOOKEEPER-1492:
----------------------------------

             Summary: leader cannot switch to LOOKING state when lost the majority
                 Key: ZOOKEEPER-1492
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1492
             Project: ZooKeeper
          Issue Type: Bug
          Components: quorum
    Affects Versions: 3.4.3
         Environment: eclipse linux
            Reporter: gaoxiao
            Priority: Critical


When a follower leaves the cluster and the remaining voting members can no longer form a majority, the leader should leave the LEADING state and enter the LOOKING state. However, if the ensemble contains observers, the leader stays in LEADING and clients cannot use the cluster.

For example, with this server configuration:

server.1=z1:2888:3888
server.2=z2:2888:3888
server.3=z3:2888:3888:observer

At first, servers 1, 2, and 3 are all started and everything works; server 2 is the leader. If server 1 is then stopped, server 2 does not leave the LEADING state, and clients cannot connect to the cluster.
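The arithmetic behind this scenario can be sketched as follows. This is a simplified model, not ZooKeeper code: the class name ObserverQuorumDemo and its containsQuorum helper are hypothetical, and the sketch assumes a majority check that only compares the size of the acknowledged set against half of the voting members.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of a size-only majority check.
class ObserverQuorumDemo {
    // Assumption: quorum means "more than half of the voting members",
    // judged only by how many ids are in the set, not by which ids they are.
    static boolean containsQuorum(Set<Long> acked, int votingMembers) {
        return acked.size() > votingMembers / 2;
    }

    public static void main(String[] args) {
        int votingMembers = 2; // servers 1 and 2 vote; server 3 is an observer

        // Server 1 is stopped. Leader 2 and observer 3 are still synced.
        Set<Long> synced = new HashSet<>(Arrays.asList(2L, 3L));

        // Two ids against a threshold of one: the check passes,
        // although only one voting member (the leader) remains.
        System.out.println("quorum reported: " + containsQuorum(synced, votingMembers));
    }
}
```

Under these assumptions the observer's id is enough to push the set over the threshold, which would match the behaviour described above.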

I think the problem is in Leader.java, method lead(), lines 388-407:

                syncedSet.add(self.getId());
                synchronized (learners) {
                    for (LearnerHandler f : learners) {
                        if (f.synced()) {
                            syncedCount++;
                            syncedSet.add(f.getSid());
                        }
                        f.ping();
                    }
                }
                if (!tickSkip && !self.getQuorumVerifier().containsQuorum(syncedSet)) {
                //if (!tickSkip && syncedCount < self.quorumPeers.size() / 2) {
                    // Lost quorum, shutdown
                    // TODO: message is wrong unless majority quorums used
                    shutdown("Only " + syncedCount + " followers, need "
                            + (self.getVotingView().size() / 2));
                    // make sure the order is the same!
                    // the leader goes to looking
                    return;
                }

The code adds the id of every synced learner, observers included, to syncedSet. I think only followers (voting members) should be added to syncedSet at this point, so that the method 'containsQuorum' can correctly determine whether a majority remains.
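A minimal sketch of that suggestion, written as standalone Java rather than as a patch to Leader.java. The class VotingOnlySyncedSet and the helpers votingOnly and hasMajority are hypothetical names introduced for illustration, and the majority rule is assumed to be a plain size comparison. It is not the change that was applied in ZooKeeper.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: count only voting members toward the quorum.
class VotingOnlySyncedSet {
    // Returns the synced ids that also belong to the voting view,
    // which drops observers from the set.
    static Set<Long> votingOnly(Set<Long> syncedIds, Set<Long> votingIds) {
        Set<Long> result = new HashSet<>(syncedIds);
        result.retainAll(votingIds);
        return result;
    }

    // Assumption: majority means more than half of the voting members.
    static boolean hasMajority(Set<Long> syncedVoters, Set<Long> votingIds) {
        return syncedVoters.size() > votingIds.size() / 2;
    }

    public static void main(String[] args) {
        Set<Long> votingIds = new HashSet<>(Arrays.asList(1L, 2L)); // server 3 is an observer
        Set<Long> synced = new HashSet<>(Arrays.asList(2L, 3L));    // leader 2 and observer 3

        Set<Long> syncedVoters = votingOnly(synced, votingIds);
        System.out.println("synced voters: " + syncedVoters);
        System.out.println("majority: " + hasMajority(syncedVoters, votingIds));
    }
}
```

With the observer filtered out, only the leader remains in the set, so the majority check fails and the leader would have a reason to give up leadership in the example configuration.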

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (ZOOKEEPER-1492) leader cannot switch to LOOKING state when lost the majority

Posted by "gaoxiao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaoxiao resolved ZOOKEEPER-1492.
--------------------------------

    Resolution: Duplicate

ZOOKEEPER-1113
                
> leader cannot switch to LOOKING state when lost the majority
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1492
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1492
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.3
>         Environment: eclipse linux
>            Reporter: gaoxiao
>            Priority: Critical
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>


        

[jira] [Commented] (ZOOKEEPER-1492) leader cannot switch to LOOKING state when lost the majority

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397517#comment-13397517 ] 

Flavio Junqueira commented on ZOOKEEPER-1492:
---------------------------------------------

Thanks for reporting this issue. I actually think that this has been pointed out in a different form here: ZOOKEEPER-1113.
                
