Posted to yarn-issues@hadoop.apache.org by "Junping Du (JIRA)" <ji...@apache.org> on 2014/08/12 04:23:12 UTC

[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

    [ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093655#comment-14093655 ] 

Junping Du commented on YARN-2331:
----------------------------------

[~jlowe], for a rolling upgrade when the NM is not supervised, I think another option is to add an RM Admin command that brings down a specific NM without killing its containers (by notifying the RMNode and passing the request back to the NM in the heartbeat), given that the NM has no admin port so far. An unsupervised NM shutdown for any other reason (whether decommission or an occasional failure) would not go through this CLI, so it would not preserve running containers. Thoughts?
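
To make the proposal above concrete, here is a rough, self-contained sketch of the flow: an admin command marks a node on the RM side, and the request reaches the NM in its next heartbeat response. Every class, method, and enum name here is hypothetical and made up for illustration; none of them are existing YARN APIs, and a real change would have to go through the actual RM admin and heartbeat protocols.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: these names are illustrative, not YARN APIs.
public class GracefulShutdownSketch {

    /** Action the RM hands back to an NM in a heartbeat response (assumed). */
    public enum HeartbeatAction { NORMAL, SHUTDOWN_PRESERVE_CONTAINERS }

    /** Minimal stand-in for the RM-side bookkeeping. */
    public static class ResourceManagerStub {
        // Nodes an admin has asked to bring down for a rolling upgrade.
        private final Set<String> pendingUpgradeShutdown = ConcurrentHashMap.newKeySet();

        /** Stand-in for the proposed RM Admin command. */
        public void adminShutdownForUpgrade(String nodeId) {
            pendingUpgradeShutdown.add(nodeId);
        }

        /** Called for each NM heartbeat; a pending request is delivered once. */
        public HeartbeatAction onHeartbeat(String nodeId) {
            return pendingUpgradeShutdown.remove(nodeId)
                    ? HeartbeatAction.SHUTDOWN_PRESERVE_CONTAINERS
                    : HeartbeatAction.NORMAL;
        }
    }

    /** NM side: preserve containers only when the shutdown arrived via the heartbeat. */
    public static boolean preserveContainers(HeartbeatAction lastAction) {
        return lastAction == HeartbeatAction.SHUTDOWN_PRESERVE_CONTAINERS;
    }
}
```

In this sketch, a shutdown that does not originate from the admin command never sets the heartbeat action, so the NM falls back to killing its containers.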

> Distinguish shutdown during supervision vs. shutdown for rolling upgrade
> ------------------------------------------------------------------------
>
>                 Key: YARN-2331
>                 URL: https://issues.apache.org/jira/browse/YARN-2331
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>
> When the NM is shutting down with restart support enabled there are scenarios we'd like to distinguish and behave accordingly:
> # The NM is running under supervision.  In that case containers should be preserved so the automatic restart can recover them.
> # The NM is not running under supervision and a rolling upgrade is not being performed.  In that case the shutdown should kill all containers since it is unlikely the NM will be restarted in a timely manner to recover them.
> # The NM is not running under supervision and a rolling upgrade is being performed.  In that case the shutdown should not kill all containers since a restart is imminent due to the rolling upgrade and the containers will be recovered.
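
The three scenarios in the description reduce to a small decision table. A minimal sketch, assuming restart support is already enabled and that the NM can somehow learn the two inputs (how it learns them is exactly what this issue is about); the class, enum, and parameter names are hypothetical and not taken from the YARN codebase:

```java
// Hypothetical sketch only: names are illustrative, not from the YARN codebase.
public class ShutdownPolicy {

    /** What the NM does with running containers when it shuts down. */
    public enum ContainerAction { PRESERVE, KILL }

    /**
     * Decide the container action on NM shutdown, assuming restart support is enabled.
     *
     * @param supervised     true if the NM runs under supervision (automatic restart expected)
     * @param rollingUpgrade true if the shutdown is part of a rolling upgrade
     */
    public static ContainerAction onShutdown(boolean supervised, boolean rollingUpgrade) {
        if (supervised) {
            // Scenario 1: the automatic restart can recover the containers.
            return ContainerAction.PRESERVE;
        }
        if (rollingUpgrade) {
            // Scenario 3: a restart is imminent, so the containers will be recovered.
            return ContainerAction.PRESERVE;
        }
        // Scenario 2: a timely restart is unlikely, so kill all containers.
        return ContainerAction.KILL;
    }
}
```

For example, `ShutdownPolicy.onShutdown(false, false)` yields `KILL`, while either flag being true yields `PRESERVE`.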


