Posted to common-dev@hadoop.apache.org by "Lin Yiqun (JIRA)" <ji...@apache.org> on 2015/12/25 09:09:49 UTC

[jira] [Created] (HADOOP-12680) Loss of zookeeper quorum lead all the namenode to be standby state

Lin Yiqun created HADOOP-12680:
----------------------------------

             Summary: Loss of zookeeper quorum lead all the namenode to be standby state
                 Key: HADOOP-12680
                 URL: https://issues.apache.org/jira/browse/HADOOP-12680
             Project: Hadoop Common
          Issue Type: Bug
          Components: ha
    Affects Versions: 2.7.1
            Reporter: Lin Yiqun


While upgrading my ZooKeeper cluster, I changed the IP addresses of the ZK nodes. Both NameNodes of my Hadoop cluster then lost their connection to ZK. After I recovered the ZK cluster, both NameNodes had transitioned to standby state, which left the cluster unable to provide service. I think the cause may be related to the following:
{code}
    /**
     * If the elector gets disconnected from Zookeeper and does not know about
     * the lock state, then it will notify the service via the enterNeutralMode
     * interface. The service may choose to ignore this or stop doing state
     * changing operations. Upon reconnection, the elector verifies the leader
     * status and calls back on the becomeActive and becomeStandby app
     * interfaces. <br/>
     * Zookeeper disconnects can happen due to network issues or loss of
     * Zookeeper quorum. Thus enterNeutralMode can be used to guard against
     * split-brain issues. In such situations it might be prudent to call
     * becomeStandby too. However, such state change operations might be
     * expensive and enterNeutralMode can help guard against doing that for
     * transient issues.
     */
    void enterNeutralMode();
{code}
Maybe we should create a thread that monitors the state of the NameNodes and prevents all of them from being in standby state at the same time.
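
A minimal sketch of what such a monitor thread might look like. It is not Hadoop code: HaNode, AllStandbyWatchdog, requestElection and the threshold parameter are hypothetical names introduced only for illustration. In a real cluster, requestElection would presumably map to the failover controller re-joining the ZooKeeper election, but that mapping is an assumption. The threshold is there so that a brief, transient all-standby window (for example during a normal failover) does not trigger any action.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical view of one NameNode; not a Hadoop interface. */
interface HaNode {
    boolean isActive();
    /** Hypothetical hook, e.g. ask the node's elector to re-join the election. */
    void requestElection();
}

/** Sketch of a watchdog that reacts when every node stays in standby. */
class AllStandbyWatchdog {
    private final List<HaNode> nodes;
    private final int threshold;        // consecutive all-standby checks before acting
    private int allStandbyChecks = 0;

    AllStandbyWatchdog(List<HaNode> nodes, int threshold) {
        this.nodes = nodes;
        this.threshold = threshold;
    }

    /** Runs one check; returns true if a new election was requested. */
    synchronized boolean checkOnce() {
        for (HaNode node : nodes) {
            if (node.isActive()) {
                allStandbyChecks = 0;   // healthy: some node is active
                return false;
            }
        }
        allStandbyChecks++;
        if (allStandbyChecks < threshold) {
            return false;               // may be transient, keep waiting
        }
        allStandbyChecks = 0;
        for (HaNode node : nodes) {
            node.requestElection();
        }
        return true;
    }

    /** Starts the periodic check on a single background thread. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // Swallow so that one failed check does not cancel the schedule.
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

Requesting an election rather than forcing one node to active leaves the choice of the active NameNode to ZooKeeper, which avoids introducing a new split-brain risk from the monitor itself.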


