Posted to common-issues@hadoop.apache.org by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org> on 2012/01/05 01:38:39 UTC

[jira] [Commented] (HADOOP-7924) FailoverController for client-based configuration

    [ https://issues.apache.org/jira/browse/HADOOP-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180045#comment-13180045 ] 

Todd Lipcon commented on HADOOP-7924:
-------------------------------------

In {{preFailoverChecks}}, I think it's clearer to structure the code like this:
{code}
HAServiceState toSvcState;
try {
  toSvcState = toSvc.getServiceState();
} catch (Exception e) {
  // throw the FailoverFailedException
}

// now check toSvcState.equals(STANDBY)
{code}
rather than trying to collapse both exceptions into one throw. We should also log the exception thrown by {{getServiceState}}.
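
A minimal, self-contained Java sketch of that structure follows. The types here ({{HAServiceState}}, {{HAServiceProtocol}}, {{FailoverFailedException}}) are simplified hypothetical stand-ins rather than the classes in the patch, and {{java.util.logging}} is used only so the example runs without extra dependencies. It shows the two failure modes handled by two separate throws, with the {{getServiceState}} exception logged and kept as the cause.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-ins for the real HA types; names and shapes are assumptions.
enum HAServiceState { ACTIVE, STANDBY }

interface HAServiceProtocol {
  HAServiceState getServiceState() throws Exception;
}

class FailoverFailedException extends Exception {
  FailoverFailedException(String message) {
    super(message);
  }

  FailoverFailedException(String message, Throwable cause) {
    super(message, cause);
  }
}

class PreFailoverCheckSketch {
  private static final Logger LOG =
      Logger.getLogger(PreFailoverCheckSketch.class.getName());

  static void preFailoverChecks(HAServiceProtocol toSvc)
      throws FailoverFailedException {
    HAServiceState toSvcState;
    try {
      toSvcState = toSvc.getServiceState();
    } catch (Exception e) {
      // First failure mode: the target could not be queried at all.
      // Log the underlying exception and keep it as the cause.
      LOG.log(Level.SEVERE, "Unable to get service state of target", e);
      throw new FailoverFailedException(
          "Unable to get service state of target", e);
    }

    // Second failure mode: the target answered, but it is not in standby.
    if (!HAServiceState.STANDBY.equals(toSvcState)) {
      throw new FailoverFailedException(
          "Cannot fail over to a service in state " + toSvcState);
    }
  }
}
```

Because the catch block always throws, {{toSvcState}} is definitely assigned by the time the state check runs, so no dummy initializer is needed.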

----

In {{failover()}}, I think you probably want to catch all Throwables in another catch clause. For example, what if the service is in some bad state and your failover attempt causes it to crash? Your IPC call would then get a SocketTimeoutException.
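
One possible shape for that extra catch clause, as a hedged sketch: the {{HAService}} interface, its {{transitionToStandby}}/{{transitionToActive}} methods, and {{ServiceFailedException}} are assumptions made for illustration, and the real {{failover()}} in the patch may be organized differently. The point is only that an unexpected failure, such as a timeout from a crashed peer, is surfaced as a {{FailoverFailedException}} with its cause preserved.

```java
// Hypothetical stand-ins; the real interfaces and exceptions may differ.
interface HAService {
  void transitionToStandby() throws Exception;
  void transitionToActive() throws Exception;
}

class ServiceFailedException extends Exception {
  ServiceFailedException(String message) {
    super(message);
  }
}

class FailoverFailedException extends Exception {
  FailoverFailedException(String message, Throwable cause) {
    super(message, cause);
  }
}

class FailoverSketch {
  static void failover(HAService fromSvc, HAService toSvc)
      throws FailoverFailedException {
    try {
      fromSvc.transitionToStandby();
    } catch (ServiceFailedException sfe) {
      // Expected failure: the service reported that it could not transition.
      throw new FailoverFailedException(
          "Unable to transition source to standby", sfe);
    } catch (Throwable t) {
      // Anything else, e.g. a SocketTimeoutException because the call
      // made the remote daemon crash or hang.
      throw new FailoverFailedException(
          "Unexpected error transitioning source to standby", t);
    }

    try {
      toSvc.transitionToActive();
    } catch (ServiceFailedException sfe) {
      throw new FailoverFailedException(
          "Unable to transition target to active", sfe);
    } catch (Throwable t) {
      throw new FailoverFailedException(
          "Unexpected error transitioning target to active", t);
    }
  }
}
```

Catching {{Throwable}} also catches {{Error}}s; whether those should be wrapped or rethrown is a design choice this sketch does not settle.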

----
{code}
+  public FailoverFailedException(String message, Throwable cause) {
+      super(message, cause);
{code}
The indentation is off here.

----
{code}
+        new UsageInfo("<host:port> <host:port>",
+            "Failover from the 1st daemon to the 2nd"))
{code}
I think it's better not to abbreviate "first" and "second".

----
- Can you add some javadoc to {{testManualFailoverCanResultInTwoActives}}? It's strange that this is a test case: it's more like you're showing that a particular user error can cause a problem, rather than showing something about the bug, right? Or else maybe it should be a test case that fails, with an @Ignore explaining why it fails.

- Just to confirm, the manual test you mentioned was done with two NNs in a running HA cluster?
                
> FailoverController for client-based configuration
> --------------------------------------------------
>
>                 Key: HADOOP-7924
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7924
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: HA Branch (HDFS-1623)
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>         Attachments: hadoop-7924.txt, hadoop-7924.txt
>
>
> Basic FailoverController to coordinate fail-over using a client-based config (i.e. fail-over from NameNode x to NameNode y).
