
Posted to yarn-issues@hadoop.apache.org by "Junping Du (JIRA)" <ji...@apache.org> on 2015/02/09 14:38:35 UTC

[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

    [ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312225#comment-14312225 ] 

Junping Du commented on YARN-41:
--------------------------------

I think I misunderstood a little earlier. After checking again, your patch actually handles decommissioning a node, not "shutdown" (to avoid any confusion, let's define "shutdown" as calling yarn daemon stop or kill -9 on the NodeManager). So the patch here shouldn't affect the work on YARN-1336 (containers can still be running after the NM is "shutdown", which is different from decommission).
From my understanding, the new flow in your current patch is: when a user decommissions a node, the RM replies to the NM's heartbeat with a SHUTDOWN message; the NM prepares to stop its services and sends an unRegister message to the RM (via an RPC call) before killing itself; and the RM (ResourceTrackerService) then tries to do some cleanup work.
IMO, there are two concerns with this approach:
1. Another round of RPC between the NM and the RM is unnecessary; the RM could do the same thing (the code in unRegisterNodeManager()) when it sends the SHUTDOWN message back.
2. Some of the work (like sending the DECOMMISSION event to the RMNode) is already covered by NodeListManager during the decommission (refresh) node operation. It seems the only new work in unRegisterNodeManager() is unregistering the node from NMLivenessMonitor.
Am I missing anything?
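
To make concern 1 concrete, here is a rough sketch of the two flows as I understand them. It is only a toy model, not code from the patch: DecommissionSketch, ResourceManagerSketch, NodeManagerSketch, nodeHeartbeat() and rpcsToDecommission() are hypothetical names made up for illustration, and the "liveness monitor" is just a set standing in for NMLivenessMonitor. Only SHUTDOWN and unRegisterNodeManager() come from the discussion above.

```java
import java.util.HashSet;
import java.util.Set;

public class DecommissionSketch {
    // Hypothetical action the RM returns in a heartbeat response.
    enum NodeAction { NORMAL, SHUTDOWN }

    // Toy stand-in for the RM side (ResourceTrackerService plus liveness tracking).
    static class ResourceManagerSketch {
        final Set<String> decommissioned = new HashSet<>();
        final Set<String> livenessMonitor = new HashSet<>();
        final boolean cleanUpWhenReplyingShutdown;
        int rpcCount = 0; // counts NM -> RM calls made after registration

        ResourceManagerSketch(boolean cleanUpWhenReplyingShutdown) {
            this.cleanUpWhenReplyingShutdown = cleanUpWhenReplyingShutdown;
        }

        void register(String nodeId) { livenessMonitor.add(nodeId); }

        void decommission(String nodeId) { decommissioned.add(nodeId); }

        // Heartbeat handler: a decommissioned node is told to shut down.
        NodeAction nodeHeartbeat(String nodeId) {
            rpcCount++;
            if (!decommissioned.contains(nodeId)) {
                return NodeAction.NORMAL;
            }
            if (cleanUpWhenReplyingShutdown) {
                // Suggested alternative: clean up here, no second RPC needed.
                livenessMonitor.remove(nodeId);
            }
            return NodeAction.SHUTDOWN;
        }

        // Extra RPC used by the flow described above for the patch.
        void unRegisterNodeManager(String nodeId) {
            rpcCount++;
            livenessMonitor.remove(nodeId);
        }
    }

    // Toy stand-in for the NM side.
    static class NodeManagerSketch {
        final String nodeId;
        final ResourceManagerSketch rm;
        final boolean unregisterBeforeStop;
        boolean running = true;

        NodeManagerSketch(String nodeId, ResourceManagerSketch rm, boolean unregisterBeforeStop) {
            this.nodeId = nodeId;
            this.rm = rm;
            this.unregisterBeforeStop = unregisterBeforeStop;
        }

        void heartbeat() {
            if (rm.nodeHeartbeat(nodeId) == NodeAction.SHUTDOWN) {
                if (unregisterBeforeStop) {
                    rm.unRegisterNodeManager(nodeId);
                }
                running = false;
            }
        }
    }

    // Runs one decommission and returns how many NM -> RM calls it took.
    public static int rpcsToDecommission(boolean extraUnregisterRpc) {
        ResourceManagerSketch rm = new ResourceManagerSketch(!extraUnregisterRpc);
        rm.register("nm-1");
        NodeManagerSketch nm = new NodeManagerSketch("nm-1", rm, extraUnregisterRpc);
        rm.decommission("nm-1");
        nm.heartbeat();
        if (nm.running || rm.livenessMonitor.contains("nm-1")) {
            throw new IllegalStateException("node was not fully cleaned up");
        }
        return rm.rpcCount;
    }

    public static void main(String[] args) {
        System.out.println("with unRegister RPC: " + rpcsToDecommission(true));
        System.out.println("cleanup on SHUTDOWN reply: " + rpcsToDecommission(false));
    }
}
```

In this model both variants leave the RM in the same state; the only difference is the extra unRegister call, which is the point of concern 1. Whether the real RM has everything it needs at heartbeat time is exactly what I am asking about.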


> The RM should handle the graceful shutdown of the NM.
> -----------------------------------------------------
>
>                 Key: YARN-41
>                 URL: https://issues.apache.org/jira/browse/YARN-41
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Ravi Teja Ch N V
>            Assignee: Devaraj K
>         Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41.patch
>
>
> Instead of waiting for the NM to expire, the RM should remove and handle an NM that is shut down gracefully.


