Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (Commented) (JIRA)" <ji...@apache.org> on 2012/01/03 22:22:39 UTC

[jira] [Commented] (MAPREDUCE-3360) Provide information about lost nodes in the UI.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179027#comment-13179027 ] 

Jason Lowe commented on MAPREDUCE-3360:
---------------------------------------

Thanks for the updates.  A couple of things about the handling of UNHEALTHY nodes:

- They are not removed from the list of nodes being tracked in the context ( {{rmNode.context.getRMNodes()}} ), so I don't think we want to add them to the list of inactive nodes.  Otherwise the node would be in both node lists simultaneously, which is probably not desirable.  Specifically, we'd want to remove this code insertion from the patch:

{code}
@@ -394,6 +411,8 @@ public class RMNodeImpl implements RMNode, EventHandler<RMNodeEvent> {
         // Inform the scheduler
         rmNode.context.getDispatcher().getEventHandler().handle(
             new NodeRemovedSchedulerEvent(rmNode));
+        rmNode.context.getInactiveRMNodes()
+            .put(rmNode.nodeId.getHost(), rmNode);
         ClusterMetrics.getMetrics().incrNumUnhealthyNMs();
         return RMNodeState.UNHEALTHY;
       }
{code}

- A node that's marked UNHEALTHY could still have a working nodemanager web page, so we don't want to remove the link to it on the status page.  Since the UNHEALTHY nodes are tracked in the normal node list, it's simplest to remove the UNHEALTHY case from the switch statement in NodesPage.java.
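
To illustrate the two points above, here is a minimal, self-contained sketch of the intended bookkeeping. It is not the actual RMNodeImpl or NodesPage code: the class {{NodeTrackingSketch}}, its maps, and its method names are hypothetical stand-ins. The sketch assumes active nodes are keyed by a "host:port" string and inactive nodes by host. An UNHEALTHY node stays in the active map and keeps its nodemanager link, while only a LOST node moves to the inactive map and loses the link.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the resource manager's node bookkeeping.
// None of these names are the real YARN classes or methods.
final class NodeTrackingSketch {
    enum State { RUNNING, UNHEALTHY, LOST }

    // Assumption: active nodes are keyed by "host:port", inactive nodes by host.
    final Map<String, State> activeNodes = new ConcurrentHashMap<>();
    final Map<String, State> inactiveNodes = new ConcurrentHashMap<>();

    void register(String nodeId) {
        activeNodes.put(nodeId, State.RUNNING);
    }

    // UNHEALTHY nodes stay in the active map; only their state changes.
    void markUnhealthy(String nodeId) {
        activeNodes.replace(nodeId, State.UNHEALTHY);
    }

    // LOST nodes leave the active map and are recorded in the inactive map,
    // so a node is never present in both maps at once.
    void markLost(String nodeId) {
        if (activeNodes.remove(nodeId) != null) {
            inactiveNodes.put(hostOf(nodeId), State.LOST);
        }
    }

    static String hostOf(String nodeId) {
        int colon = nodeId.indexOf(':');
        return colon < 0 ? nodeId : nodeId.substring(0, colon);
    }

    // Only LOST nodes lose the link to their nodemanager web page.
    static boolean showsNodeManagerLink(State state) {
        switch (state) {
            case LOST:
                return false;
            default:
                return true; // RUNNING and UNHEALTHY keep the link
        }
    }
}
```

With this shape, the page code needs no special case for UNHEALTHY at all, which matches the suggestion to drop that case from the switch statement.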


At some point, unit tests need to be added or updated for this change (e.g., updating TestNodesPage.java to verify that nodes transitioning into the LOST state appear on the LOST page).
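
As a rough idea of what such a check might assert, here is a plain-Java sketch that does not use the real TestNodesPage or any test framework. The class {{LostPageCheckSketch}}, the {{rowsFor}} helper, and the host names are hypothetical; the page is modeled simply as a filter over a host-to-state map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a nodes page filtered by state.
// This is not the real NodesPage or TestNodesPage.
final class LostPageCheckSketch {
    enum State { RUNNING, UNHEALTHY, LOST }

    // Returns the hosts that a page filtered on `wanted` would list.
    static List<String> rowsFor(Map<String, State> nodes, State wanted) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, State> entry : nodes.entrySet()) {
            if (entry.getValue() == wanted) {
                rows.add(entry.getKey());
            }
        }
        return rows;
    }

    // Throws if a node that transitioned to LOST is missing from the LOST page.
    static void verifyLostNodeIsListed() {
        Map<String, State> nodes = new LinkedHashMap<>();
        nodes.put("host1", State.RUNNING);
        nodes.put("host2", State.RUNNING);

        // Simulate host2 transitioning into the LOST state.
        nodes.put("host2", State.LOST);

        List<String> lostRows = rowsFor(nodes, State.LOST);
        if (!lostRows.equals(Collections.singletonList("host2"))) {
            throw new IllegalStateException("LOST page rows were " + lostRows);
        }
    }
}
```

A real test would drive the actual page rendering instead of this filter, but the assertion would have the same shape: transition a node to LOST, then check that the LOST view lists it.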

                
> Provide information about lost nodes in the UI.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-3360
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3360
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 0.23.0
>         Environment: NA
>            Reporter: Bhallamudi Venkata Siva Kamesh
>            Priority: Critical
>         Attachments: LostNodes.png, MAPREDUCE-3360-1.patch, MAPREDUCE-3360-2.patch, MAPREDUCE-3360.patch, lostNodes.png
>
>
> Currently there is no information provided about *lost nodes*. Provide information in the UI. 
