
Posted to yarn-issues@hadoop.apache.org by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org> on 2013/05/12 02:11:16 UTC

[jira] [Resolved] (YARN-73) nodemanager should cleanup running containers when it starts

     [ https://issues.apache.org/jira/browse/YARN-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli resolved YARN-73.
-----------------------------------------

    Resolution: Duplicate

With YARN-495 in, we changed the NM reboot behaviour to a simple resync: kill all containers and re-register with the RM.
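A minimal sketch of the resync ordering described above, assuming the order in the comment (kill first, then re-register). The `ContainerKiller` and `RmRegistrar` interfaces are hypothetical stand-ins, not NodeManager classes:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: models the resync sequence, not the actual NodeManager code. */
public class ResyncSketch {
    /** Hypothetical hook that stops every container the NM is tracking. */
    interface ContainerKiller { void killAll(); }

    /** Hypothetical hook that re-registers this NM with the RM. */
    interface RmRegistrar { void register(); }

    static void resync(ContainerKiller killer, RmRegistrar registrar) {
        // Kill first, so no container outlives the state the NM is about to drop...
        killer.killAll();
        // ...then re-register, starting from an empty container list.
        registrar.register();
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        resync(() -> calls.add("killAll"), () -> calls.add("register"));
        System.out.println(calls); // [killAll, register]
    }
}
```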

So in sum, YARN-72 cleans up containers on shutdown, YARN-495 does so on resync. 

There is still a case where an operator issues a shutdown but NM_SLEEP_DELAY_BEFORE_SIGKILL_MS + NM_PROCESS_KILL_WAIT_MS + SHUTDOWN_CLEANUP_SLOP_MS is not long enough to clean up all containers. We can make the latter configurable, or require operators to kill containers explicitly in that case.
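To make the timing argument concrete, here is a small sketch that adds up the three delays and counts containers that would outlive the budget. The millisecond values are placeholders chosen for illustration, not authoritative Hadoop defaults (check YarnConfiguration and ContainerManagerImpl in your version), and it assumes all containers are signalled at the same moment:

```java
import java.util.List;

/** Sketch only: shutdown cleanup budget as the sum of three delays. */
public class ShutdownBudget {
    // Placeholder values in milliseconds (assumed, not real defaults).
    static final long NM_SLEEP_DELAY_BEFORE_SIGKILL_MS = 250;
    static final long NM_PROCESS_KILL_WAIT_MS = 2000;
    static final long SHUTDOWN_CLEANUP_SLOP_MS = 1000;

    /** Total time the NM is assumed to wait for container cleanup on shutdown. */
    static long budgetMs() {
        return NM_SLEEP_DELAY_BEFORE_SIGKILL_MS
                + NM_PROCESS_KILL_WAIT_MS
                + SHUTDOWN_CLEANUP_SLOP_MS;
    }

    /** Counts containers whose (hypothetical) time-to-exit exceeds the budget. */
    static long survivors(List<Long> exitTimesMs) {
        long budget = budgetMs();
        return exitTimesMs.stream().filter(t -> t > budget).count();
    }

    public static void main(String[] args) {
        System.out.println("budget = " + budgetMs() + " ms"); // budget = 3250 ms
        System.out.println("left behind = " + survivors(List.of(100L, 3000L, 5000L))); // left behind = 1
    }
}
```

Any container slower than the budget is left running after the NM exits, which is the gap the comment proposes closing with a configurable value or explicit operator action.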

Closing this as a duplicate.
                
> nodemanager should cleanup running containers when it starts
> ------------------------------------------------------------
>
>                 Key: YARN-73
>                 URL: https://issues.apache.org/jira/browse/YARN-73
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>
> Currently the nodemanager doesn't clean up running containers when it gets restarted. This can cause containers to get lost and stick around forever. We've seen this happen multiple times when the RM is restarted. When the RM is brought back up, it doesn't know what was running on the cluster, so it tells the NMs to reboot, and when an NM reboots it loses track of what it had running. If any containers are misbehaving, nothing is left that knows about them and can kill them.
> We should kill any running containers when the nodemanager is being started. Note that when the NM is being brought up, it needs to somehow figure out which containers were running and make sure it doesn't kill anything it shouldn't.
> Note that we should also try to kill any running containers when the nodemanager is shutting down (jira 4213 was filed for this).
> This might change a bit when RM restart is implemented, if tasks can actually survive an RM/NM reboot, but that can be addressed at that point.
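One way to read "be sure it doesn't kill anything it shouldn't" is to kill only processes tagged with a container the NM itself recorded before going down. The sketch below assumes such a persisted record and a PID-to-container mapping exist; both, and the container ID strings, are hypothetical and not taken from the NodeManager:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch only: selects PIDs that are safe to kill on NM startup. */
public class StartupCleanupSketch {
    /**
     * @param recordedContainerIds container IDs the NM persisted before it went down (hypothetical record)
     * @param liveProcesses        live PIDs mapped to the container ID each one is tagged with
     * @return PIDs belonging to containers the NM itself launched; everything else is left alone
     */
    static Set<Long> pidsToKill(Set<String> recordedContainerIds, Map<Long, String> liveProcesses) {
        Set<Long> result = new HashSet<>();
        for (Map.Entry<Long, String> e : liveProcesses.entrySet()) {
            if (recordedContainerIds.contains(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> recorded = Set.of("container_A", "container_B");
        Map<Long, String> live = Map.of(101L, "container_A", 202L, "unrelated");
        System.out.println(pidsToKill(recorded, live)); // [101]
    }
}
```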
