Posted to common-issues@hadoop.apache.org by "Sejin Hwang (Jira)" <ji...@apache.org> on 2023/02/22 13:47:00 UTC

[jira] [Updated] (HADOOP-18639) DockerContainerDeletionTask is not removed from the Nodemanager's statestore when the task is completed.

     [ https://issues.apache.org/jira/browse/HADOOP-18639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sejin Hwang updated HADOOP-18639:
---------------------------------
    Description: 
YARN NodeManager has two types of deletion tasks: the FileDeletionTask for deleting log files and the DockerContainerDeletionTask for deleting Docker containers.
 
A FileDeletionTask is removed from the statestore when it completes, but a DockerContainerDeletionTask is not.
Therefore, completed DockerContainerDeletionTask records keep accumulating in the statestore.
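Below is a minimal, language-neutral model of the behavior described above, written in Python for brevity. StateStore, DeletionTask, FileTask, DockerTask, finished() and simulate() are hypothetical names, not the actual YARN classes or the NMStateStoreService API. The only thing modeled is that one task type removes its record on completion and the other does not.

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Hypothetical stand-in for the NM statestore: task id -> task type."""
    tasks: dict = field(default_factory=dict)

    def store(self, task_id, kind):
        self.tasks[task_id] = kind

    def remove(self, task_id):
        self.tasks.pop(task_id, None)


class DeletionTask:
    kind = "abstract"

    def __init__(self, task_id, store):
        self.task_id = task_id
        self.store = store
        store.store(task_id, self.kind)  # record persisted when the task is scheduled

    def finished(self):
        # Hypothetical completion hook: drop this task's record from the store.
        self.store.remove(self.task_id)


class FileTask(DeletionTask):
    kind = "file"

    def run(self):
        # ... delete the files ...
        self.finished()  # record removed on completion


class DockerTask(DeletionTask):
    kind = "docker"

    def run(self):
        # ... remove the Docker container ...
        # Modeled bug: finished() is never called, so the record remains.
        pass


def simulate(containers):
    """Schedule and run one task of each type per released container."""
    store = StateStore()
    next_id = 0
    for _ in range(containers):
        FileTask(next_id, store).run()
        DockerTask(next_id + 1, store).run()
        next_id += 2
    return store


store = simulate(1000)
print(len(store.tasks))  # 1000: every DockerTask record is left behind
```

In this model every FileTask record is cleared, but every DockerTask record survives. The store therefore grows linearly with the number of released containers.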
 
When the NodeManager restarts, its deletion service recovers every accumulated DockerContainerDeletionTask from the statestore and runs it again.

While the service works through this backlog, new FileDeletionTasks and DockerContainerDeletionTasks are delayed unnecessarily. In environments where large numbers of containers are allocated and released, this delay can let disks fill up.
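Here is a rough estimate of the restart cost. It rests on simplified assumptions: recovered tasks are queued ahead of new ones in FIFO order, a fixed number of identical workers drain the queue, and every task takes the same time. These assumptions are not confirmed to match the real DeletionService. The backlog size, worker count and per-task duration are made-up illustrative numbers, not measurements, and start_delay() is a hypothetical helper.

```python
from collections import deque


def start_delay(position, workers, seconds_per_task):
    """Seconds before the task at 0-based queue `position` starts, assuming
    `workers` identical workers drain the queue in order and every task
    takes `seconds_per_task` (hypothetical simplifications)."""
    return (position // workers) * seconds_per_task


# Hypothetical backlog of stale Docker records recovered after a restart.
stale_docker_records = 10_000
queue = deque(f"docker-{i}" for i in range(stale_docker_records))
queue.append("file-new")  # a new file deletion queued behind the backlog

pos = queue.index("file-new")
print(pos)  # 10000
print(start_delay(pos, workers=4, seconds_per_task=0.5))  # 1250.0
```

Under these assumptions, the delay grows in proportion to the number of stale records and shrinks only with more workers. Removing completed records from the store shrinks the backlog itself.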

I will attach a patch soon.

  was:
YARN NodeManager has two types of deletion tasks: the FileDeletionTask for deleting log files and the DockerContainerDeletionTask for deleting Docker containers.
 
A FileDeletionTask is removed from the statestore when it completes, but a DockerContainerDeletionTask is not.
Therefore, completed DockerContainerDeletionTask records keep accumulating in the statestore.
 
When the NodeManager restarts, its deletion service recovers every accumulated DockerContainerDeletionTask from the statestore and runs it again.

While the service works through this backlog, new FileDeletionTasks and DockerContainerDeletionTasks are delayed unnecessarily. In environments where large numbers of containers are allocated and released, this delay can let disks fill up.


> DockerContainerDeletionTask is not removed from the Nodemanager's statestore when the task is completed.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18639
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18639
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Sejin Hwang
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
