Posted to dev@zookeeper.apache.org by "Diogo (JIRA)" <ji...@apache.org> on 2010/09/17 12:39:39 UTC

[jira] Commented: (ZOOKEEPER-869) Support for election of leader with arbitrary zxid

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910531#action_12910531 ] 

Diogo commented on ZOOKEEPER-869:
---------------------------------

While trying to implement this, I found an interesting issue. Say we have an ensemble of 3 nodes, we start all of them together, and all have synchronized state, meaning every replica returns the same value from ZKDatabase().getLastLoggedZxid(). It seems that the leader will still send a snapshot to all followers, although that is unnecessary: they need no state transfer.

The leader (quorum/Leader.java:283) reads its lastLoggedZxid(), advances it to the next epoch, and stores the result as lastProposed. In LearnerHandler.java:308 the thread decides whether the replica needs an empty DIFF or a SNAP (I am assuming the system state described above). But startForwarding returns lastProposed, which is necessarily larger than any follower's zxid, so SNAP is selected and sent.
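To make the arithmetic concrete, here is a minimal sketch of the behaviour as I understand it. It is not the ZooKeeper source: the class ZxidSketch and its methods are hypothetical names, and the "DIFF only when the zxids are equal" rule is a simplification of what LearnerHandler does. It assumes a zxid carries the epoch in its high 32 bits and a counter in its low 32 bits, which matches the values in the log (0x700000000 becoming 0x800000000).

```java
// Simplified model of the behaviour described in this comment.
// All names here are hypothetical; this is not the ZooKeeper implementation.
class ZxidSketch {
    enum SyncMode { DIFF, SNAP }

    // Assumption: high 32 bits of a zxid are the epoch, low 32 bits a counter.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // What the new leader appears to do: move to the next epoch
    // and reset the counter to zero.
    static long firstZxidOfNextEpoch(long lastLoggedZxid) {
        return (epochOf(lastLoggedZxid) + 1) << 32;
    }

    // Simplified decision rule (an assumption): an empty DIFF is chosen
    // only when the follower's zxid equals the zxid the leader compares against.
    static SyncMode chooseSync(long peerLastZxid, long leaderZxid) {
        return peerLastZxid == leaderZxid ? SyncMode.DIFF : SyncMode.SNAP;
    }

    public static void main(String[] args) {
        long lastLogged = 0x700000000L;
        long lastProposed = firstZxidOfNextEpoch(lastLogged);

        // Comparing against lastProposed: an up-to-date follower still gets SNAP.
        System.out.println(Long.toHexString(lastProposed));        // 800000000
        System.out.println(chooseSync(lastLogged, lastProposed));  // SNAP

        // Comparing against the leader's last logged zxid instead gives DIFF.
        System.out.println(chooseSync(lastLogged, lastLogged));    // DIFF
    }
}
```

Under these assumptions, a follower that holds exactly the leader's logged state is classified as needing a snapshot, purely because the comparison uses the zxid of the new epoch rather than the last logged one.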

Here is part of the output from a run where 2 replicas have the same state stored and one is behind.

2010-09-17 12:11:27,296 [myid:3] - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:FileSnap@82] - Reading snapshot /tmp/zoo3/version-2/snapshot.700000000
2010-09-17 12:11:27,298 [myid:3] - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:FileSnap@82] - Reading snapshot /tmp/zoo3/version-2/snapshot.700000000
2010-09-17 12:11:27,301 [myid:3] - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:FileTxnSnapLog@208] - Snapshotting: 700000000
2010-09-17 12:11:27,303 [myid:3] - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Leader@285] - lastLoggedZxid = 700000000 lastProposed = 800000000   <---------- added line just after leader sets its lastProposed
2010-09-17 12:11:27,309 [myid:3] - INFO  [LearnerHandler-/127.0.0.1:48318:LearnerHandler@247] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@12d3205
2010-09-17 12:11:27,310 [myid:3] - WARN  [LearnerHandler-/127.0.0.1:48318:LearnerHandler@326] - Sending snapshot last zxid of peer is 0x700000000  zxid of leader is 0x800000000   <------ snapshot being sent!
2010-09-17 12:11:27,312 [myid:3] - WARN  [LearnerHandler-/127.0.0.1:48318:Leader@474] - Commiting zxid 0x800000000 from /127.0.0.1:2890 not first!
2010-09-17 12:11:27,313 [myid:3] - WARN  [LearnerHandler-/127.0.0.1:48318:Leader@476] - First is 0
2010-09-17 12:11:27,313 [myid:3] - INFO  [LearnerHandler-/127.0.0.1:48318:Leader@500] - Have quorum of supporters; starting up and setting last processed zxid: 34359738368
2010-09-17 12:11:28,290 [myid:3] - INFO  [LearnerHandler-/127.0.0.1:48319:LearnerHandler@247] - Follower sid: 2 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@1319c
2010-09-17 12:11:28,291 [myid:3] - WARN  [LearnerHandler-/127.0.0.1:48319:LearnerHandler@326] - Sending snapshot last zxid of peer is 0x600000000  zxid of leader is 0x800000000  <---- this follower needs the snapshot.


Am I understanding something wrong?

> Support for election of leader with arbitrary zxid
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-869
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-869
>             Project: Zookeeper
>          Issue Type: New Feature
>            Reporter: Diogo
>            Priority: Minor
>
> Currently, the implemented leader election algorithm guarantees that the leader has the maximum zxid of the ensemble. The state synchronization after the election was built on this assumption. However, other leader election algorithms might elect leaders with an arbitrary zxid.
> To support other leader election algorithms, the state synchronization should allow the leader to have an arbitrary zxid.
