Posted to dev@zookeeper.apache.org by "Edward Ribeiro (JIRA)" <ji...@apache.org> on 2012/12/14 21:24:12 UTC

[jira] [Commented] (ZOOKEEPER-1404) leader election pseudo code probably incorrect

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532633#comment-13532633 ] 

Edward Ribeiro commented on ZOOKEEPER-1404:
-------------------------------------------

Unfortunately, letting the highest SEQUENCE number, instead of the smallest, identify the leader lends itself to a scenario that is unstable, more complex, and requires more operations. Consider the following scenarios, in which the highest number identifies the leader.

A server connects and creates a sequential-ephemeral node. It is the first one, so it elects itself the leader. After that, a couple of servers connect, and each one will hold the largest number, even if only for a very brief period of time, so the leadership will "hop" from one server to the next until it stabilizes. This generates a couple of network messages plus watch setup and delivery.
 
Furthermore, if a server loses its connection and connects again, it will "usurp" the leadership, even if only that specific server's connection is troublesome and the problem is transient. In a very stable server scenario this will not be a problem (after the initial burst of leader elections), but the number of messages sent and received (and watches set up) will be considerably higher. In a faulty scenario, however, it will cause serious liveness problems.

On the other hand, a server with the lowest number would probably stay online for a longer period of time, and once elected it does not need to change. If the leader loses its connection, the second-oldest server takes its place. There can be "n" new connections, but the leader stays stable and well known for a longer period of time.
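To make the comparison concrete, here is a small Python sketch that counts leadership changes under both rules. It is only a toy model, not ZooKeeper code: the event list, the server names and the count_leader_changes helper are made up for illustration, and a reconnecting server is assumed to get a fresh, higher sequence number.

```python
def count_leader_changes(events, pick_leader):
    """Count how often the leader changes while servers join and leave.

    pick_leader is min (lowest sequence number leads) or max (highest leads).
    """
    live = {}       # server name -> sequence number of its ephemeral node
    next_seq = 0    # stand-in for the counter behind the SEQUENCE flag
    leader = None
    changes = 0
    for action, server in events:
        if action == "join":
            live[server] = next_seq
            next_seq += 1
        else:  # "leave": the ephemeral node disappears
            live.pop(server, None)
        current = pick_leader(live, key=live.get) if live else None
        if current != leader:
            changes += 1
            leader = current
    return changes


# Hypothetical trace: three servers join, then "b" drops out and reconnects.
EVENTS = [
    ("join", "a"),
    ("join", "b"),
    ("join", "c"),
    ("leave", "b"),
    ("join", "b"),
]

lowest_wins = count_leader_changes(EVENTS, min)
highest_wins = count_leader_changes(EVENTS, max)
print("lowest wins: ", lowest_wins, "leadership change(s)")
print("highest wins:", highest_wins, "leadership change(s)")
```

With this trace the lowest-number rule elects "a" once and never changes, while the highest-number rule moves the leadership on almost every join, including the reconnect of "b".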
                
> leader election pseudo code probably incorrect
> ----------------------------------------------
>
>                 Key: ZOOKEEPER-1404
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1404
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 3.4.3
>            Reporter: Robert Varga
>
> The pseudo code for leader election in the recipes.html page of 3.4.3 documentation is the following...
> {quote}
> Let ELECTION be a path of choice of the application. To volunteer to be a leader: 
> 1.Create znode z with path "ELECTION/guid-n_" with both SEQUENCE and EPHEMERAL flags;
> 2.Let C be the children of "ELECTION", and i be the sequence number of z;
> 3.Watch for changes on "ELECTION/guid-n_j", where j is the {color:red}*smallest*{color} sequence number such that j < i and n_j is a znode in C;
> Upon receiving a notification of znode deletion: 
> 1.Let C be the new set of children of ELECTION; 
> 2.If z is the smallest node in C, then execute leader procedure;
> 3.Otherwise, watch for changes on "ELECTION/guid-n_j", where j is the {color:red}*smallest*{color} sequence number such that j < i and n_j is a znode in C; 
> {quote}
> I think, in both third steps *highest* should appear instead of {color:red}*smallest*{color}.
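For reference, the reporter's reading of step 3 (watch the znode with the highest sequence number j such that j < i, i.e. the immediate predecessor) can be sketched in Python as below. This is a sketch under assumptions, not the recipe's official implementation: the sequence_of and node_to_watch helpers and the child names are hypothetical, and the names are assumed to end in "_" followed by the zero-padded counter that the SEQUENCE flag appends.

```python
def sequence_of(name):
    """Extract the numeric suffix from a name such as 'guid-n_0000000004'."""
    return int(name.rsplit("_", 1)[1])


def node_to_watch(children, own):
    """Return the child to watch, or None if our own znode is the smallest.

    Picks the child with the highest sequence number that is still lower
    than our own, i.e. the immediate predecessor.
    """
    i = sequence_of(own)
    lower = [c for c in children if sequence_of(c) < i]
    if not lower:
        return None  # nothing below us: execute the leader procedure
    return max(lower, key=sequence_of)


# Hypothetical children of ELECTION; sequence number 3 has already gone away.
children = ["a-n_0000000001", "b-n_0000000002", "c-n_0000000004"]
print(node_to_watch(children, "c-n_0000000004"))
print(node_to_watch(children, "a-n_0000000001"))
```

Under this reading the leader is still the smallest node (step 2 is unchanged); only the watch target differs, so each volunteer watches exactly one predecessor rather than all of them watching the same lowest node.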
