Posted to common-dev@hadoop.apache.org by "Lin Yiqun (JIRA)" <ji...@apache.org> on 2015/12/25 09:09:49 UTC
[jira] [Created] (HADOOP-12680) Loss of zookeeper quorum lead all
the namenode to be standby state
Lin Yiqun created HADOOP-12680:
----------------------------------
Summary: Loss of zookeeper quorum lead all the namenode to be standby state
Key: HADOOP-12680
URL: https://issues.apache.org/jira/browse/HADOOP-12680
Project: Hadoop Common
Issue Type: Bug
Components: ha
Affects Versions: 2.7.1
Reporter: Lin Yiqun
While upgrading my ZooKeeper cluster, which involved changing the IP addresses of the ZK nodes, I found that both namenodes of my Hadoop cluster lost their connection to ZK. After I recovered the ZK cluster, both namenodes had transitioned to standby state, which left the cluster unable to provide service. I think the reason may be the following:
{code}
/**
* If the elector gets disconnected from Zookeeper and does not know about
* the lock state, then it will notify the service via the enterNeutralMode
* interface. The service may choose to ignore this or stop doing state
* changing operations. Upon reconnection, the elector verifies the leader
* status and calls back on the becomeActive and becomeStandby app
* interfaces. <br/>
* Zookeeper disconnects can happen due to network issues or loss of
* Zookeeper quorum. Thus enterNeutralMode can be used to guard against
* split-brain issues. In such situations it might be prudent to call
* becomeStandby too. However, such state change operations might be
* expensive and enterNeutralMode can help guard against doing that for
* transient issues.
*/
void enterNeutralMode();
{code}
Maybe we should create a thread that monitors the state of the namenodes and prevents all of them from ending up in standby state.
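
To make the suggestion concrete, here is a rough sketch of what such a monitoring thread could look like. It is only an illustration under assumptions: NodeState, MonitoredNode, AllStandbyHandler and AllStandbyMonitor are hypothetical names invented for this example and are not Hadoop APIs, and what the handler should actually do (raise an alert, rejoin the election, and so on) is left open. The monitor polls the namenode states and calls the handler only after several consecutive polls find no active node, so that a normal failover in progress is not reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical HA state as seen by the monitor; not a Hadoop type. */
enum NodeState { ACTIVE, STANDBY }

/** Hypothetical read-only view of one namenode; not a Hadoop API. */
interface MonitoredNode {
  String name();
  NodeState state();
}

/** Hypothetical hook called when no namenode is active. */
interface AllStandbyHandler {
  void onAllStandby(List<String> nodeNames);
}

/** Polls namenode states and reports when all of them stay in standby. */
final class AllStandbyMonitor {
  private final List<MonitoredNode> nodes;
  private final AllStandbyHandler handler;
  private final int threshold;   // consecutive all-standby polls before reporting
  private int consecutive = 0;

  AllStandbyMonitor(List<MonitoredNode> nodes, AllStandbyHandler handler, int threshold) {
    if (threshold < 1) {
      throw new IllegalArgumentException("threshold must be >= 1");
    }
    this.nodes = new ArrayList<>(nodes);
    this.handler = handler;
    this.threshold = threshold;
  }

  /** Runs one polling step; returns true if the handler was invoked. */
  synchronized boolean checkOnce() {
    boolean anyActive = false;
    List<String> names = new ArrayList<>();
    for (MonitoredNode node : nodes) {
      names.add(node.name());
      if (node.state() == NodeState.ACTIVE) {
        anyActive = true;
      }
    }
    if (nodes.isEmpty() || anyActive) {
      consecutive = 0;           // healthy, or nothing to watch
      return false;
    }
    consecutive++;
    if (consecutive < threshold) {
      return false;              // may be a failover in progress; wait
    }
    consecutive = 0;             // report once, then start counting again
    handler.onAllStandby(names);
    return true;
  }

  /** Starts periodic polling on a single background thread. */
  ScheduledExecutorService start(long periodMillis) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleWithFixedDelay(this::checkOnce, periodMillis, periodMillis,
        TimeUnit.MILLISECONDS);
    return scheduler;
  }
}
```

The threshold is a design choice of the sketch, not something taken from the Hadoop code: with a threshold of 1 the handler would also fire during the short window of an ordinary failover, when no namenode is active yet.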
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)