Posted to yarn-issues@hadoop.apache.org by "Yuqi Wang (JIRA)" <ji...@apache.org> on 2018/03/08 06:14:00 UTC

[jira] [Comment Edited] (YARN-8012) Support Unmanaged Container Cleanup

    [ https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390782#comment-16390782 ] 

Yuqi Wang edited comment on YARN-8012 at 3/8/18 6:13 AM:
---------------------------------------------------------

*Initial patch for review:*

In the initial patch, the unmanaged container cleanup feature on Windows can only clean up the container job object of the unmanaged container. {color:#f79232}{color:#59afe1}Cleanup of more container resources will be supported later, and UTs will be added if the design is agreed upon.{color}{color}

The current container is considered unmanaged when:
 # NM is dead:
 ** The check of whether the container is managed by NM fails to complete within the timeout.
 # NM is alive but the container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or not found:
 ** The container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or is not found in the NM container list.
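
The two conditions above can be read as a single decision function. The following is only a minimal sketch under stated assumptions, not the code in the attached patch: the class UnmanagedCheck, the State enum (a stand-in for ContainerState), the nmQuery callback and the timeoutMs parameter are all hypothetical names introduced for illustration. An empty Optional is assumed to mean "not found in the NM container list".

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class UnmanagedCheck {
    // Hypothetical stand-in for org.apache.hadoop.yarn.api.records.ContainerState.
    enum State { NEW, RUNNING, COMPLETE }

    /**
     * Returns true when the container should be treated as unmanaged.
     * nmQuery is a hypothetical lookup that asks the NM for the container state;
     * an empty Optional means the container is not in the NM container list.
     */
    static boolean isUnmanaged(Callable<Optional<State>> nmQuery, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Optional<State>> pending = pool.submit(nmQuery);
            Optional<State> state = pending.get(timeoutMs, TimeUnit.MILLISECONDS);
            // Case 2: NM answered, but the container is COMPLETE or not found.
            return !state.isPresent() || state.get() == State.COMPLETE;
        } catch (TimeoutException | ExecutionException e) {
            // Case 1: no usable answer within the timeout, so NM is assumed dead.
            return true;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag and err on the side of "unmanaged".
            Thread.currentThread().interrupt();
            return true;
        } finally {
            // Cancel a query that is still running so the worker thread exits.
            pool.shutdownNow();
        }
    }
}
```

In this sketch a failed query is treated the same as a timed-out one; whether the real check should distinguish the two is a design choice left open here.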


was (Author: yqwang):
*Initial patch for review:*

In the initial patch, the unmanaged container cleanup feature on Windows can only clean up the container job object of the unmanaged container. {color:#59afe1}Cleanup of more container resources will be supported later, and UTs will be added if the design is agreed upon.{color}

The current container is considered unmanaged when:
 # NM is dead:
 ** The check of whether the container is managed by NM fails to complete within the timeout.
 # NM is alive but the container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or not found:
 ** The container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or is not found in the NM container list.

> Support Unmanaged Container Cleanup
> -----------------------------------
>
>                 Key: YARN-8012
>                 URL: https://issues.apache.org/jira/browse/YARN-8012
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: Yuqi Wang
>            Assignee: Yuqi Wang
>            Priority: Major
>             Fix For: 2.7.1
>
>         Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer managed by NM. Thus, it can no longer be managed by YARN either, i.e. it is leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, e.g. because the configuration or resources it depends on cannot be made ready.
>  * NM local leveldb store is corrupted or lost, e.g. due to bad disk sectors.
>  * NM has bugs, such as wrongly marking a live container as complete.
> Things become worse if work-preserving NM restart is enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources on the node cannot be managed by YARN:
>  ** Causes a YARN resource leak on the node
>  ** The container cannot be killed to release YARN resources on the node
>  # Container and App killing is not eventually consistent for the App user:
>  ** A buggy App can still have bad external impacts long after the App has been killed



