Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2014/07/22 17:16:39 UTC

[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

    [ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070372#comment-14070372 ] 

Jason Lowe commented on YARN-2331:
----------------------------------

We can distinguish between supervised and unsupervised operation via a config property.  Determining whether an unsupervised shutdown is due to a rolling upgrade is a bit trickier.  Some of the options there include:

- Add an admin port to NMs and a corresponding CLI command to send commands to the port.  There's a lot of boilerplate that goes along with this, but it is the most flexible option if we ever want to add other admin commands to an NM.
- Add a REST API to do this (with appropriate authentication to make sure not just anyone can cause an NM shutdown)
- Register a handler for a second signal, such as SIGINT, that works like today's SIGTERM handler for a normal shutdown but indicates a rolling upgrade shutdown.  The shell scripts could add a new command that performs the rolling upgrade shutdown by sending the new signal rather than SIGTERM.  This would be relatively simple to implement on POSIX platforms like Linux but raises portability problems on non-POSIX platforms like Windows.
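
To make the signal-based option concrete, here is a minimal sketch of how a received signal could be mapped to a shutdown mode. Everything in it is an assumption for illustration: the class `ShutdownSignals`, the enum `ShutdownMode`, and the method `modeForSignal` are hypothetical names that do not exist in YARN, and SIGINT is just the example signal mentioned above. Registering the actual handler is platform-specific and is left out.

```java
import java.util.Locale;

/** Hypothetical sketch; none of these names exist in YARN. */
final class ShutdownSignals {

    /** The two kinds of unsupervised shutdown that need to be told apart. */
    enum ShutdownMode { NORMAL, ROLLING_UPGRADE }

    /**
     * Maps a signal name to a shutdown mode.
     * Assumption: SIGTERM keeps its current meaning (normal shutdown) and
     * SIGINT is the extra signal chosen for a rolling upgrade shutdown.
     * Accepts names with or without the "SIG" prefix.
     */
    static ShutdownMode modeForSignal(String signalName) {
        String name = signalName.toUpperCase(Locale.ROOT);
        if (name.startsWith("SIG")) {
            name = name.substring(3);
        }
        switch (name) {
            case "TERM":
                return ShutdownMode.NORMAL;
            case "INT":
                return ShutdownMode.ROLLING_UPGRADE;
            default:
                throw new IllegalArgumentException("Unhandled signal: " + signalName);
        }
    }
}
```

A handler installed for each signal would call `modeForSignal` and pass the result on to the shutdown path. On platforms without POSIX signals this mapping has no direct equivalent, which is the portability concern noted above.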

> Distinguish shutdown during supervision vs. shutdown for rolling upgrade
> ------------------------------------------------------------------------
>
>                 Key: YARN-2331
>                 URL: https://issues.apache.org/jira/browse/YARN-2331
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>
> When the NM is shutting down with restart support enabled there are scenarios we'd like to distinguish and behave accordingly:
> # The NM is running under supervision.  In that case containers should be preserved so the automatic restart can recover them.
> # The NM is not running under supervision and a rolling upgrade is not being performed.  In that case the shutdown should kill all containers since it is unlikely the NM will be restarted in a timely manner to recover them.
> # The NM is not running under supervision and a rolling upgrade is being performed.  In that case the shutdown should not kill all containers since a restart is imminent due to the rolling upgrade and the containers will be recovered.
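
The three scenarios in the issue description reduce to a small decision rule, sketched here in Java. The class `NmShutdownPolicy` and the method `shouldKillContainers` are hypothetical names, not NodeManager code. The description does not cover a supervised NM during a rolling upgrade; the sketch assumes containers are preserved in that case as well.

```java
/** Hypothetical sketch of the shutdown decision; not actual NodeManager code. */
final class NmShutdownPolicy {

    /**
     * Decides whether the NM should kill its containers on shutdown
     * when restart support is enabled.
     *
     * @param supervised     true if the NM runs under supervision
     * @param rollingUpgrade true if the shutdown is part of a rolling upgrade
     */
    static boolean shouldKillContainers(boolean supervised, boolean rollingUpgrade) {
        if (supervised) {
            // Scenario 1: the automatic restart is expected to recover containers.
            // Assumption: this also holds when a rolling upgrade is in progress.
            return false;
        }
        // Scenario 2: no supervision and no rolling upgrade, so a timely
        // restart is unlikely and containers are killed.
        // Scenario 3: no supervision but a rolling upgrade, so a restart is
        // imminent and containers are preserved.
        return !rollingUpgrade;
    }
}
```

Only the unsupervised case without a rolling upgrade returns true, so the open question is how the NM learns the value of `rollingUpgrade` at shutdown time.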



--
This message was sent by Atlassian JIRA
(v6.2#6252)