Posted to yarn-dev@hadoop.apache.org by "Varun Saxena (JIRA)" <ji...@apache.org> on 2015/10/16 13:51:05 UTC
[jira] [Created] (YARN-4273) Containers can be leaked due to race
between application being killed and NM registering back after recovery
Varun Saxena created YARN-4273:
----------------------------------
Summary: Containers can be leaked due to race between application being killed and NM registering back after recovery
Key: YARN-4273
URL: https://issues.apache.org/jira/browse/YARN-4273
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Varun Saxena
Assignee: Varun Saxena
This issue is based on the discussion in YARN-4000.
Consider this scenario:
1) The application is recovered and added to the scheduler, but some slow NM has not yet re-registered, so its containers are not yet recovered.
2) The user kills this application.
3) CapacityScheduler#doneApplicationAttempt is called, and the containers tracked by the RM so far are killed. Note that CapacityScheduler#doneApplication is not called, so the scheduler still has the SchedulerApplication in memory.
4) The slow NM now re-registers and tries to recover its containers. If the application is set to keep containers across attempts, these containers are recovered even though the application is in the process of being killed. Nothing kills them later on, so they are leaked.
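
The ordering in steps 1-4 can be sketched with a toy model. This is not YARN code: ToyScheduler, its methods, and the application and container IDs are hypothetical stand-ins for the scheduler's bookkeeping. The model assumes that a late container is rejected when the application does not keep containers across attempts. It only illustrates why a container reported after the attempt is done can remain tracked with nothing left to kill it.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the race described above. All names are hypothetical;
// none of this is the real CapacityScheduler API.
class RecoveryRaceSketch {

    static class ToyScheduler {
        // appId -> containers the scheduler currently tracks as live
        private final Map<String, Set<String>> apps = new HashMap<>();
        // appIds whose current attempt has already been marked done
        private final Set<String> attemptDone = new LinkedHashSet<>();
        private final boolean keepContainersAcrossAttempts;

        ToyScheduler(boolean keepContainersAcrossAttempts) {
            this.keepContainersAcrossAttempts = keepContainersAcrossAttempts;
        }

        // Step 1: the recovered application is added to the scheduler.
        void addRecoveredApplication(String appId) {
            apps.put(appId, new LinkedHashSet<>());
        }

        // Called when an NM re-registers and reports a running container.
        // Returns true if the container was recovered into the scheduler.
        boolean recoverContainer(String appId, String containerId) {
            Set<String> live = apps.get(appId);
            if (live == null) {
                return false; // application unknown to the scheduler
            }
            // Assumption of this sketch: without keep-containers, a container
            // arriving after the attempt is done is rejected.
            if (attemptDone.contains(appId) && !keepContainersAcrossAttempts) {
                return false;
            }
            live.add(containerId); // with keep-containers, it is recovered
            return true;
        }

        // Step 3: kills only the containers tracked so far. The application
        // entry itself stays in memory (no "doneApplication" equivalent runs).
        Set<String> doneApplicationAttempt(String appId) {
            Set<String> live = apps.get(appId);
            Set<String> killed = new LinkedHashSet<>();
            if (live != null) {
                killed.addAll(live);
                live.clear();
            }
            attemptDone.add(appId);
            return killed;
        }

        // Snapshot of the containers still tracked as live for the application.
        Set<String> liveContainers(String appId) {
            Set<String> live = apps.get(appId);
            return live == null ? new LinkedHashSet<>() : new LinkedHashSet<>(live);
        }
    }

    // Replays steps 1-4 and returns the scheduler in its final state.
    static ToyScheduler runScenario(boolean keepContainersAcrossAttempts) {
        ToyScheduler scheduler = new ToyScheduler(keepContainersAcrossAttempts);
        scheduler.addRecoveredApplication("app_1");                 // step 1
        scheduler.recoverContainer("app_1", "container_fast_nm");   // fast NM is back
        scheduler.doneApplicationAttempt("app_1");                  // steps 2 and 3
        scheduler.recoverContainer("app_1", "container_slow_nm");   // step 4
        return scheduler;
    }

    public static void main(String[] args) {
        System.out.println("keep=true,  still tracked: "
                + runScenario(true).liveContainers("app_1"));
        System.out.println("keep=false, still tracked: "
                + runScenario(false).liveContainers("app_1"));
    }
}
```

In this model, with keep-containers enabled, the slow NM's container is the only one still tracked after the kill, which corresponds to the leaked container in step 4.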