Posted to yarn-issues@hadoop.apache.org by "Yuqi Wang (JIRA)" <ji...@apache.org> on 2018/03/08 05:51:00 UTC

[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

     [ https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuqi Wang updated YARN-8012:
----------------------------
    Attachment: YARN-8012-branch-2.7.1.001.patch

> Support Unmanaged Container Cleanup
> -----------------------------------
>
>                 Key: YARN-8012
>                 URL: https://issues.apache.org/jira/browse/YARN-8012
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: Yuqi Wang
>            Assignee: Yuqi Wang
>            Priority: Major
>             Fix For: 2.7.1
>
>         Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container* is a container that is no longer managed by the NM. Consequently, it can no longer be managed by YARN either.
> *A YARN-managed container can become unmanaged in many cases, such as:*
>  # For container resources managed by YARN, such as the container job object and disk data:
>  ** The NM service is disabled or removed on the node.
>  ** The NM is unable to start up again on the node, for example because a configuration or resource it depends on is not ready.
>  ** The NM local leveldb store is corrupted or lost, for example due to bad disk sectors.
>  ** The NM has bugs, such as wrongly marking a live container as complete.
>  # For container resources not managed by YARN:
>  ** User processes break away from the container job object.
>  ** The user creates VMs from the container job object.
>  ** The user acquires other resources on the machine that are not managed by YARN, such as by producing data outside the container folder.
> *Negative impacts of unmanaged containers include:*
>  # Resources cannot be managed by YARN or the node:
>  ** YARN and node resources leak.
>  ** The container cannot be killed to release YARN resources on the node.
>  # Container and App killing is not eventually consistent from the user's point of view:
>  ** A buggy App can still have negative effects on the outside world long after the App has been killed.
> *Initial patch for review:*
> In the initial patch, the unmanaged container cleanup feature is for Windows and can only clean up the container job object of the unmanaged container. Cleanup of more container resources will be supported later, and unit tests (UT) will be added if the design is agreed upon.
> The current container will be considered unmanaged when:
>  # The NM is dead:
>  ** The check of whether the container is managed by the NM fails to complete within the timeout.
>  # The NM is alive, but the container is complete or not found:
>  ** The container is in state org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or is not found in the NM container list.
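
The two conditions listed above can be sketched roughly as follows. This is only an illustration of the decision rule as described in the issue, not code from the attached patch. The class UnmanagedContainerCheck, the NmClient interface, its queryState method, and the local ContainerState enum are all hypothetical stand-ins (the real enum is org.apache.hadoop.yarn.api.records.ContainerState). Treating any query failure as "NM is dead" and treating an interrupted check as inconclusive are assumptions made for this sketch.

```java
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class UnmanagedContainerCheck {

    // Hypothetical stand-in for org.apache.hadoop.yarn.api.records.ContainerState.
    public enum ContainerState { NEW, RUNNING, COMPLETE }

    // Hypothetical NM query: returns the state the NM reports for the container,
    // or Optional.empty() if the container is not in the NM container list.
    public interface NmClient {
        Optional<ContainerState> queryState(String containerId) throws Exception;
    }

    // Returns true if the container should be considered unmanaged.
    public static boolean isUnmanaged(NmClient nm, String containerId, long timeoutMs) {
        // Daemon thread, so a hung NM query cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "nm-check");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<Optional<ContainerState>> result =
                executor.submit(() -> nm.queryState(containerId));
            Optional<ContainerState> state = result.get(timeoutMs, TimeUnit.MILLISECONDS);
            // Case 2: NM is alive, but the container is COMPLETE or not found.
            return !state.isPresent() || state.get() == ContainerState.COMPLETE;
        } catch (TimeoutException | ExecutionException e) {
            // Case 1: the check failed or did not finish within the timeout,
            // so the NM is treated as dead (assumption: any failure counts).
            return true;
        } catch (InterruptedException e) {
            // Assumption: an interrupted check is inconclusive, so do not
            // declare the container unmanaged on that basis alone.
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        NmClient running = id -> Optional.of(ContainerState.RUNNING);
        NmClient complete = id -> Optional.of(ContainerState.COMPLETE);
        System.out.println(isUnmanaged(running, "container_1", 1000));
        System.out.println(isUnmanaged(complete, "container_1", 1000));
    }
}
```

A cleanup process using such a check would presumably only release the container job object when the method returns true. How the check is actually performed against the NM, and with what timeout, is determined by the patch and is not specified here.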



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
