Posted to common-issues@hadoop.apache.org by "Vinay (JIRA)" <ji...@apache.org> on 2014/01/22 13:16:25 UTC
[jira] [Updated] (HADOOP-10251) Both NameNodes could be in STANDBY State if SNN network is unstable
[ https://issues.apache.org/jira/browse/HADOOP-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinay updated HADOOP-10251:
---------------------------
Attachment: HADOOP-10251.patch
Attaching a patch for the above case. Please review.
> Both NameNodes could be in STANDBY State if SNN network is unstable
> -------------------------------------------------------------------
>
> Key: HADOOP-10251
> URL: https://issues.apache.org/jira/browse/HADOOP-10251
> Project: Hadoop Common
> Issue Type: Bug
> Components: ha
> Affects Versions: 2.2.0
> Reporter: Vinay
> Assignee: Vinay
> Priority: Critical
> Attachments: HADOOP-10251.patch
>
>
> The following corner-case scenario happened in one of our clusters.
> 1. NN1 was Active and NN2 was Standby.
> 2. The network on NN2's machine was slow.
> 3. NN1 was shut down.
> 4. NN2's ZKFC got the notification and tried to check the old active for fencing. (This took a little longer, again due to the slow network.)
> 5. Meanwhile, NN1 was restarted by our automatic monitoring, and its ZKFC made it Active.
> 6. Now NN2's ZKFC found the old active to be NN1 and gracefully fenced NN1 to STANDBY.
> 7. Before writing the ActiveBreadCrumb to ZK, NN2's ZKFC hit a session timeout and shut down before making NN2 Active.
> *Now the cluster has both NameNodes in STANDBY.*
> NN1's ZKFC still thinks that its NameNode is in the Active state.
> NN2's ZKFC is waiting for election.
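As a rough illustration of the interleaving in steps 1 to 7, the toy model below replays the scenario. It is a minimal sketch under stated assumptions and is not Hadoop's ZKFC code. The `Cluster` class, the `become_active` generator, the `replay` driver and the `DOWN` state are all hypothetical. The failover path of NN2's ZKFC is reduced to three steps:

1. Read the ActiveBreadCrumb.
2. Gracefully fence the old active if it is Active.
3. Write the breadcrumb and make its own NameNode Active, but only if its ZK session is still alive.

Each `yield` stands in for the slow network. It lets NN1's restart happen between the ZKFC's steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, Optional


@dataclass
class Cluster:
    """Hypothetical toy model of an HA pair (not Hadoop's classes)."""
    states: Dict[str, str]                 # NameNode -> "ACTIVE" / "STANDBY" / "DOWN"
    breadcrumb: Optional[str]              # NameNode last recorded in ActiveBreadCrumb
    zkfc_believes_active: Dict[str, bool]  # what each ZKFC thinks of its own NameNode


def become_active(c: Cluster, me: str,
                  session_alive: Callable[[], bool]) -> Iterator[str]:
    """Simplified failover path of one ZKFC. Each `yield` is a point where a
    slow network lets other events happen before the next step runs."""
    yield "checking old active"            # step 4: slow lookup of the old active
    old = c.breadcrumb                     # step 6: read ActiveBreadCrumb
    if old is not None and old != me and c.states.get(old) == "ACTIVE":
        # Graceful fencing; in this model the old ZKFC's belief is not updated.
        c.states[old] = "STANDBY"
    yield "fenced old active"
    if not session_alive():                # step 7: ZK session timed out
        return                             # give up before writing the breadcrumb
    c.breadcrumb = me
    c.states[me] = "ACTIVE"
    c.zkfc_believes_active[me] = True


def replay() -> Cluster:
    """Replays steps 1 to 7 in the order given in the issue description."""
    c = Cluster(states={"NN1": "ACTIVE", "NN2": "STANDBY"},  # step 1
                breadcrumb="NN1",
                zkfc_believes_active={"NN1": True, "NN2": False})
    session = {"NN2": True}

    c.states["NN1"] = "DOWN"               # step 3: NN1 shuts down
    c.zkfc_believes_active["NN1"] = False
    nn2_zkfc = become_active(c, "NN2", lambda: session["NN2"])
    next(nn2_zkfc)                         # step 4: NN2's ZKFC starts its slow check

    c.states["NN1"] = "ACTIVE"             # step 5: NN1 restarted, its ZKFC makes it Active
    c.breadcrumb = "NN1"
    c.zkfc_believes_active["NN1"] = True

    next(nn2_zkfc)                         # step 6: NN2's ZKFC fences NN1 to STANDBY
    session["NN2"] = False                 # step 7: NN2's ZKFC session times out
    for _ in nn2_zkfc:                     # resume; the generator returns early
        pass
    return c


if __name__ == "__main__":
    print(replay().states)                 # both NameNodes end in STANDBY
```

In this model, `replay()` ends with both NameNodes in STANDBY while NN1's ZKFC still records its NameNode as Active. This matches the state described above. The fencing of NN1 takes effect even though NN2's ZKFC never completes its own transition to Active.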
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)