You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@zookeeper.apache.org by "Alexandre Hardy (JIRA)" <ji...@apache.org> on 2010/11/03 13:35:37 UTC

[jira] Created: (ZOOKEEPER-917) Leader election selected incorrect leader

Leader election selected incorrect leader
-----------------------------------------

                 Key: ZOOKEEPER-917
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
             Project: Zookeeper
          Issue Type: Bug
          Components: leaderElection, server
    Affects Versions: 3.2.2
         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
Debian lenny
            Reporter: Alexandre Hardy
            Priority: Critical


We had three nodes running zookeeper:
  * 192.168.130.10
  * 192.168.130.11
  * 192.168.130.14

192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).

DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.

The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.

This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929354#action_12929354 ] 

Flavio Junqueira commented on ZOOKEEPER-917:
--------------------------------------------

Hi Vishal, It is certainly understand not having dedicated development time being an issue. I actually didn't know you're interested in the cluster membership... I'm glad to hear, though.

On your questions:
# Suppose we have an ensemble comprising 3 servers: A, B, and C. Now suppose that C is the leader, and both A and B follow C. If A disconnects from C for whatever reason (e.g., network partition) and it tries to elect a leader, it won't get any other process in the LOOKING state. It will actually receive a notification from C saying that it is leading and one from B saying that it is following C, both with an earlier leader election epoch. To avoid having A locked out (not able to elect C as leader), we implemented this exception: a process accepts going back to an earlier leader election only if it receives a notification from the leader saying that it is leading and from a quorum saying that it is following;
# I'm not sure if you referring to specific problem of this jira or if you are asking about my hypothetical example. Assuming it is the former, the follower (Follower:followLeader()) checks if the leader is proposing an earlier epoch, and if not, it accepts the leader snapshot. Because the epoch is the same, all followers will accept the leader snapshot follow it. 

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-917:
------------------------------------

    Fix Version/s: 3.4.0
                   3.3.3

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928171#action_12928171 ] 

Flavio Junqueira commented on ZOOKEEPER-917:
--------------------------------------------

The program I was using to open your logs was hiding some of the messages for some reason unknown to me. I now understand why the leader was elected in your case and the behavior is legitimate. Let me try to explain.

We currently repeat the last notification sent to a given server upon reconnecting to it. This is to avoid problems with messages partially sent, and, assuming no further bugs, the protocol is resilient to messages duplicates. At the same time, a server A decides to follow another server B if it receives a message from B saying that B is leading and from a quorum saying that they are following, even if A is in a later election epoch. This mechanism is there to avoid A being locked out of the ensemble in the case it partitions away and comes back later. 

>From you logs, what happens is:

# Fresh server 2 receives previous notifications from 0 and 1, and decide to lead;
# Server 1 receives the last message from server 0 saying that it is following 2 (which was the previous leader), and the notification from 2 saying that it is leading. Server 1 consequently decides to follow 2;
# Server 0 receives the last message from server 1 saying that it is following 2 (which was the previous leader), and the notification from 2 saying that it is leading. Server 0 consequently decides to follow 2.

Now the main problem I see is that the followers accept the snapshot from the leader, and they shouldn't given that they have moved to a later epoch. I suspect that we currently allow a server to come back to an epoch it has been in the past to again avoid having a server locked out after being partitioned away and healing, but I need to do some further inspection.

My overall take is that your case is unfortunately not legitimate, meaning that we don't currently provision for configuration changes. The case you expose in general constitutes a loss of quorum, and that violates one of our core assumptions. In more detail, a quorum supporting a leader must have a non-empty intersection with the quorum of servers that have accepted requests in the previous epoch. Wiping out the state of server 2, by replacing it with a fresh server, leads to the situation in which just one server contains all transactions accepted by a quorum (and possibly committed). If you hadn't replaced server 2 with a fresh server, then either server 2 would have been elected again just the same, and it would be fine because it was previously the leader, or it wouldn't have been elected because the leader was previously another server and the last notifications of 0 and 1 would be supporting a different server.

On reconfigurations, we have talked about it (http://wiki.apache.org/hadoop/ZooKeeper/ClusterMembership), but we haven't made enough progress recently and it is currently not implemented. It would be great to get some help here.

Let me know if this analysis makes any sense to you, please.

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Alexandre Hardy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927888#action_12927888 ] 

Alexandre Hardy commented on ZOOKEEPER-917:
-------------------------------------------

Hi Flavio,

The three zookeeper servers are zookeeper1, zookeeper2 and zookeeper3.
Initially the servers were
    * 192.168.130.10: zookeeper1
    * 192.168.130.11: zookeeper3
    * 192.168.130.14: zookeeper2

After .11 was removed the servers were:
    * 192.168.130.10: zookeeper1
    * 192.168.130.13: zookeeper3
    * 192.168.130.14: zookeeper2

All other settings were set by hbase:
    * tickTime=2000
    * initLimit=10
    * syncLimit=5  
    * peerport=2888
    * leaderport=3888

zookeeper1 would have node id 0
zookeeper2 would have node id 1
zookeeper3 would have node id 2

I'm not sure what else I can give you concerning the configuration.

I note that in 192.168.130.14 (node id 1) we have 
{noformat}
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election: 4294967742
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 1, 4294967742, 2, 1, LOOKING, LOOKING, 1
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (2, 1)
2010-11-02 09:36:27,988 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Adding vote
2010-11-02 09:36:27,989 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 1, LOOKING, FOLLOWING, 0
{noformat}
 
I don't think there is much chance of some kind of networking configuration, but could that explain what we are seeing?



> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira resolved ZOOKEEPER-917.
----------------------------------------

    Resolution: Not A Problem

My pleasure to help. I'm marking it as not a problem for now, but feel free to come back and ask for more clarification if needed.

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928240#action_12928240 ] 

Patrick Hunt commented on ZOOKEEPER-917:
----------------------------------------

Sounds like we need more documentation detailing the election process and what expected behavior is. Flavio perhaps you could
create a JIRA for that and start collecting this type of information? In particular you could link to jiras of this type, with the intent
of general documentation, including detail about these specific types of questions.

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928605#action_12928605 ] 

Vishal K commented on ZOOKEEPER-917:
------------------------------------

Hi Flavio,

Sorry for not making much progress on (http://wiki.apache.org/hadoop/ZooKeeper/ClusterMembership). I have spent some time to understand the code. But It is a bit difficult to focus on development without dedicated development time. I am pushing to get dedicated development time at work for this so that I don't have to rely on my spare time. 

Few questions related to your comments:
1. Can you please elaborate on : "At the same time, a server A decides to follow another server B if it receives a message from B saying that B is leading and from a quorum saying that they are following, even if A is in a later election epoch. This mechanism is there to avoid A being locked out of the ensemble in the case it partitions away and comes back later."

2. Why is it not OK for B to give up leadership when it sees that its <epoch,zxid> is lower than others?

Thanks.


> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928179#action_12928179 ] 

Flavio Junqueira commented on ZOOKEEPER-917:
--------------------------------------------

Hi Alexandre, It is an key premise of important replication algorithms, like Paxos, that there is a portion of the state that persists across crashes (and recoveries). By replacing server 2 with a fresh server, you simply got rid of the persistent state. In general, making that replacement you've made may lead you to trouble due to the problem I exposed a few postings up. Of course, if you wait for a successful election, the problem is supposed to go away because you have reestablished a quorum and this quorum does not contain the faulty server, but then you have to make sure the election happens before you introduce the fresh server perhaps through jmx or by inspecting the logs. Simply setting a reasonable timeout will work in most cases, but the leader election is not guaranteed to succeed, and there is a chance, likely to be small, that you'll end up with a corrupt state. 



> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Alexandre Hardy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927839#action_12927839 ] 

Alexandre Hardy commented on ZOOKEEPER-917:
-------------------------------------------

We noted that certain ephemeral nodes were no longer behaving as expected (started incrementing from zero again) and are concerned about the potential of data loss since the latest zxid's don't seem to be recognized by the leader.

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Alexandre Hardy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927833#action_12927833 ] 

Alexandre Hardy commented on ZOOKEEPER-917:
-------------------------------------------

Excerpt from logs on 192.168.130.10:
{noformat}
2010-11-02 09:36:28,060 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election: 4294967742
2010-11-02 09:36:28,061 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2010-11-02 09:36:28,061 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.10:37781]
2010-11-02 09:36:28,061 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 0, 4294967742, 2, 0, LOOKING, LOOKING, 0
2010-11-02 09:36:28,063 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Adding vote
2010-11-02 09:36:28,064 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2010-11-02 09:36:28,064 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.14:50222]
2010-11-02 09:36:28,064 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 0, LOOKING, FOLLOWING, 1
2010-11-02 09:36:28,065 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2010-11-02 09:36:28,065 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.14:50223]
2010-11-02 09:36:28,068 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2010-11-02 09:36:28,068 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.12:59044]
2010-11-02 09:36:28,073 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 0, LOOKING, LEADING, 2
2010-11-02 09:36:28,073 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2010-11-02 09:36:28,073 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.130.10:2181 remote=/192.168.130.10:37786]
2010-11-02 09:36:28,073 INFO org.apache.zookeeper.server.quorum.QuorumPeer: FOLLOWING
2010-11-02 09:36:28,073 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server 
2010-11-02 09:36:28,074 INFO org.apache.zookeeper.server.quorum.Follower: Following zookeeper3/192.168.130.13:2888
{noformat}

Excerpt from logs on 192.168.130.11:
{noformat}
2010-11-02 09:36:14,065 INFO org.apache.zookeeper.server.quorum.QuorumPeerConfig: Defaulting to majority quorums
2010-11-02 09:36:14,120 INFO org.apache.zookeeper.server.quorum.QuorumPeerMain: Starting quorum peer
2010-11-02 09:36:14,172 INFO org.apache.zookeeper.server.quorum.QuorumCnxManager: My election bind port: 3888
2010-11-02 09:36:14,182 INFO org.apache.zookeeper.server.quorum.QuorumPeer: LOOKING
2010-11-02 09:36:14,183 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election: -1
2010-11-02 09:36:14,191 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 2, LOOKING, LOOKING, 2
2010-11-02 09:36:14,191 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Adding vote
2010-11-02 09:36:14,193 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1952)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:345)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:532)
2010-11-02 09:36:14,194 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Send worker leaving thread
2010-11-02 09:36:14,194 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 2, LOOKING, LOOKING, 1
2010-11-02 09:36:14,194 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Adding vote
2010-11-02 09:36:14,195 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1952)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:345)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:532)
2010-11-02 09:36:14,195 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Send worker leaving thread
2010-11-02 09:36:14,202 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection broken:
java.nio.channels.AsynchronousCloseException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:281)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:593)
2010-11-02 09:36:14,401 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: About to leave instance:2, -1, 2, LEADING
2010-11-02 09:36:14,402 INFO org.apache.zookeeper.server.quorum.QuorumPeer: LEADING
{noformat}

I'm not sure why 192.168.130.13 ended up as the leader when it did not have the most up to date transaction ID. Also, I don't see the notification messages of the other nodes in the logs of 192.168.130.13.

Is there any reason why other nodes would accept 192.168.130.13 as the leader?

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Alexandre Hardy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927836#action_12927836 ] 

Alexandre Hardy commented on ZOOKEEPER-917:
-------------------------------------------

Sorry, that should have been 192.168.130.13, not 192.168.130.11.

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927908#action_12927908 ] 

Flavio Junqueira commented on ZOOKEEPER-917:
--------------------------------------------

I downloaded your logs, but the out files are empty and I couldn't find the notification messages. By looking at the excerpts you posted, it sounds like node 1 tells 0 that it is following 2 and node says that it is following (this is fine as node 2 might have received some old messages), so node 0 must follow 2. Now the question is why node 1 decided to follow 2, specially because it has a higher zxid and the follower code should have rejected an attempt to follow a leader from an earlier epoch. 

It would be nice to have a look at the output of node 1. 

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927869#action_12927869 ] 

Flavio Junqueira commented on ZOOKEEPER-917:
--------------------------------------------

Hi Alexandre, Could you please post your configuration parameters?

I noticed the following in both excerpts:
{noformat}
INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 2, LOOKING, LOOKING, 1
INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 2, LOOKING, LOOKING, 2
{noformat}

which implies that both servers, 1 and 2, were starting from scratch and in an ensemble of 3 servers they form a quorum.

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929602#action_12929602 ] 

Vishal K commented on ZOOKEEPER-917:
------------------------------------

Hi Flavio,

Thanks for the clarification.

Looks like this scenario is not likely to happen if replace is done carefully. For the sake of argument, shouldn't we make the exception that you described only if the joining peer is going to be a follower? For the problem reported in this jira,  the server that was replaced (server 2) will not have seen the zxid received in the notifications from the other two nodes. Therefore, is this case, will it make sense for server 2 to start a new round of election?

-Vishal



> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927949#action_12927949 ] 

Flavio Junqueira commented on ZOOKEEPER-917:
--------------------------------------------

Even though the logs do not make a lot of sense for me at this point, I was thinking that your scenario is not supposed to work given our guarantees. Let's look at an example.

Suppose we have 3 servers: A, B, and  C. Suppose that C is initially the leader and proposes operations that B is able to ack, but A doesn't. Now, suppose that I come and replace C with a fresh server, same id but empty state, and I do it before A and B are able to elect a new leader and recover. In this case, A and C may form a quorum and the state of the ZooKeeper ensemble would be empty. The replacement of server C with a fresh server violates our assumptions. 

It should work, though, if you add a fresh server with a working ensemble. That is, you let A and B elect a new leader, and then you start the new C server. In your case, I'm still not sure why it happens because the initial zxid of node 1 is 4294967742 according to your excerpt. 

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Alexandre Hardy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928184#action_12928184 ] 

Alexandre Hardy commented on ZOOKEEPER-917:
-------------------------------------------

Thanks Flavio, 

We will take the delay approach for the moment. I think the risks are acceptable for our purposes.

You are welcome to close the issue if there are no outstanding questions that need to be addressed on your part.

Thanks again for spending so much time on this issue and explaining what the reasoning and consequences are.


> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929678#action_12929678 ] 

Flavio Junqueira commented on ZOOKEEPER-917:
--------------------------------------------

Hi Vishal, There is possibly a misunderstanding here. Server 2 reported in this jira (the leader) does not go back to an earlier epoch, but the other two do, and they are following, so if I understand your argument correctly, the exception is being applied as you suggest.



> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Alexandre Hardy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928137#action_12928137 ] 

Alexandre Hardy commented on ZOOKEEPER-917:
-------------------------------------------

The excerpts are extracted from {{hbase-0.20/hbase*.log}}, so the information should be readily available.
The tar file contents should be as follows:
{noformat}
drwxr-xr-x ah/users          0 2010-11-02 14:42 192.168.130.10/
drwxr-xr-x ah/users          0 2010-11-03 13:33 192.168.130.10/hbase-0.20/
-rw-r--r-- ah/users          0 2010-11-02 14:42 192.168.130.10/hbase-0.20/hbase--zookeeper-e0-cb-4e-71-8-d3.out
-rw-r--r-- ah/users   62922921 2010-11-02 14:42 192.168.130.10/hbase-0.20/hbase--zookeeper-e0-cb-4e-71-8-d3.log
drwxr-xr-x ah/users          0 2010-11-02 14:42 192.168.130.12/
drwxr-xr-x ah/users          0 2010-11-03 13:27 192.168.130.12/hbase-0.20/
drwxr-xr-x ah/users          0 2010-11-02 14:42 192.168.130.13/
drwxr-xr-x ah/users          0 2010-11-03 13:27 192.168.130.13/hbase-0.20/
-rw-r--r-- ah/users   65903411 2010-11-02 14:42 192.168.130.13/hbase-0.20/hbase--zookeeper-e0-cb-4e-65-4d-4e.log
-rw-r--r-- ah/users          0 2010-11-02 14:42 192.168.130.13/hbase-0.20/hbase--zookeeper-e0-cb-4e-65-4d-4e.out
drwxr-xr-x ah/users          0 2010-11-02 14:42 192.168.130.14/
drwxr-xr-x ah/users          0 2010-11-03 13:27 192.168.130.14/hbase-0.20/
-rw-r--r-- ah/users          0 2010-11-02 14:42 192.168.130.14/hbase-0.20/hbase--zookeeper-e0-cb-4e-71-8-a8.out
-rw-r--r-- ah/users   62835121 2010-11-02 14:42 192.168.130.14/hbase-0.20/hbase--zookeeper-e0-cb-4e-71-8-a8.log
{noformat}

The only logs that are missing are those for .11, but that should not influence the analysis of the leader election (I hope).

We are using monitoring software which determines when a zookeeper instance is no longer reachable, and automatically starts a fresh zookeeper instance as replacement. This software can determine the failure and start a new zookeeper instance fairly rapidly. Would it be better to delay the start of a fresh zookeeper instance to allow the existing instances to elect a new leader? If so, do you have any guidelines regarding this delay? (We are considering this approach, but would like to avoid it if possible).

{quote}
In your case, I'm still not sure why it happens because the initial zxid of node 1 is 4294967742 according to your excerpt. 
{quote}
That is indeed the key question that I am trying to find an answer for! :-)

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Alexandre Hardy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexandre Hardy updated ZOOKEEPER-917:
--------------------------------------

    Attachment: zklogs-20101102144159SAST.tar.gz

Logs for the remaining nodes attached.

> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Alexandre Hardy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928174#action_12928174 ] 

Alexandre Hardy commented on ZOOKEEPER-917:
-------------------------------------------

Hi Flavio,

At first pass this seems to indicate that we can't replace a failed zookeeper server by a new one, but that statement is probably way too strong. If I understand correctly, what you are saying is that the server can be replaced only after a new leader has been elected? i.e. Any fresh server should only be restarted once the quorum has been reestablished?

I'm not sure I understand exactly why the election went wrong. Were the old election messages resent when the Fresh server was contactable? I would have thought that election messages should be based on the current state, and never send old state. 

This will take some time to digest and think through properly. In the meantime, can you suggest how we should deal with this situation, can we simply wait for the two remaining nodes to establish a quorum, and then reintroduce the third node? I suppose we could test if a quorum has been established by testing if we can establish a new zookeeper session.

Thanks for the help


> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader

Posted by "Vishal K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930169#action_12930169 ] 

Vishal K commented on ZOOKEEPER-917:
------------------------------------

Hi Flavio,

Lets see if I understand this right. Server 2 was replaced and became the leader. Server 2 receives old notifications from others and accepts leadership even if its <epoch, zxid> prior to receiving any notifications was <1,0>.

Server 2 accepts leadership because 0, 1 vote for 2 and we allow 2 to become leader based on point 1. in your comment on 07/Nov/10.

My question with regards to point 1.:
- In your example, it is OK to allow A to join the cluster and become a follower (so that A does not remain locked out). But is it OK for A to accept leadership even if it has not seen the zxid reported by others (regardless of the votes)? Shouldn't it reject leadership?

Am I still misunderstanding the problem?


> Leader election selected incorrect leader
> -----------------------------------------
>
>                 Key: ZOOKEEPER-917
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: leaderElection, server
>    Affects Versions: 3.2.2
>         Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries)
> Debian lenny
>            Reporter: Alexandre Hardy
>            Priority: Critical
>             Fix For: 3.3.3, 3.4.0
>
>         Attachments: zklogs-20101102144159SAST.tar.gz
>
>
> We had three nodes running zookeeper:
>   * 192.168.130.10
>   * 192.168.130.11
>   * 192.168.130.14
> 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).
> DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.
> The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.
> This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.