Posted to yarn-issues@hadoop.apache.org by "Ming Ma (JIRA)" <ji...@apache.org> on 2014/01/13 23:41:59 UTC

[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager

    [ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870080#comment-13870080 ] 

Ming Ma commented on YARN-914:
------------------------------

Junping/Luke, have you looked into the checkpointing framework being done to support preemption? One possible design to support this scenario could be something like:

1. Drain the NM with a timeout. While the NM is draining, no new tasks are assigned to the node.
2. After the timeout expires, checkpointing is triggered along the RM -> AM -> task path. Task state and application-level state such as map outputs are preserved, and the tasks are rescheduled on other nodes.
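
To make the two steps concrete, here is a rough sketch of the state machine they imply, written as a small standalone simulation. Everything in it (GracefulDecommissionSketch, NodeState, Task, startDrain, tick) is a hypothetical name chosen for illustration, not an existing YARN class or API, and "checkpointing" is reduced to setting a flag. Time is passed in explicitly so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposal; none of these names exist in YARN.
class GracefulDecommissionSketch {
    enum NodeState { RUNNING, DRAINING, DECOMMISSIONED }

    static final class Task {
        final String id;
        final long finishTimeMs;      // simulated time at which the task completes on its own
        boolean checkpointed = false; // stands in for saved task and application-level state

        Task(String id, long finishTimeMs) {
            this.id = id;
            this.finishTimeMs = finishTimeMs;
        }
    }

    private NodeState state = NodeState.RUNNING;
    private final List<Task> running = new ArrayList<>();
    private long drainDeadlineMs = -1;

    // Step 1: a draining (or decommissioned) node accepts no new tasks.
    boolean assign(Task task) {
        if (state != NodeState.RUNNING) {
            return false;
        }
        running.add(task);
        return true;
    }

    // Begin draining; tasks already running get until the deadline to finish.
    void startDrain(long nowMs, long timeoutMs) {
        state = NodeState.DRAINING;
        drainDeadlineMs = nowMs + timeoutMs;
    }

    // Advance simulated time. Returns the tasks that were checkpointed and
    // now need to be rescheduled on other nodes (step 2).
    List<Task> tick(long nowMs) {
        // Tasks whose finish time has passed completed normally.
        running.removeIf(t -> t.finishTimeMs <= nowMs);

        List<Task> toReschedule = new ArrayList<>();
        boolean drained = running.isEmpty();
        boolean timedOut = nowMs >= drainDeadlineMs;
        if (state == NodeState.DRAINING && (drained || timedOut)) {
            for (Task t : running) {
                t.checkpointed = true; // placeholder for the RM -> AM -> task checkpoint request
                toReschedule.add(t);
            }
            running.clear();
            state = NodeState.DECOMMISSIONED;
        }
        return toReschedule;
    }

    NodeState state() {
        return state;
    }
}
```

In this sketch a node that empties before the deadline is decommissioned immediately, so the timeout only bounds the worst case. Whether a real implementation should do the same is an open design choice.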

> Support graceful decommission of nodemanager
> --------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact on running applications.
> Currently, if an NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map outputs have not been fetched by the reducers of the job, these map tasks will need to be rerun as well.
> We propose to introduce an optional mechanism to gracefully decommission a node manager.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)