Posted to yarn-issues@hadoop.apache.org by "lujie (JIRA)" <ji...@apache.org> on 2018/08/31 10:06:01 UTC

[jira] [Comment Edited] (YARN-8703) Localized resource may leak on disk if container is killed while localizing

    [ https://issues.apache.org/jira/browse/YARN-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598533#comment-16598533 ] 

lujie edited comment on YARN-8703 at 8/31/18 10:05 AM:
-------------------------------------------------------

[~jlowe]

Thanks for the explanation. I added a sleep and a kill-job command in the constructor of ResourceLocalizedEvent, and I also added a log statement that prints the suspicious file path. With these changes I was able to trigger the warning log statement. The log shows that the suspicious file is deleted when the application finishes:
{code:java}
2018-08-31 16:43:23,357 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Received LOCALIZED event for request { hdfs://hadoop11:29000/tmp/hadoop-yarn/staging/hires/.staging/job_1535704989367_0001/job.splitmetainfo, 1535705000427, FILE, null } but localized resource is missing
2018-08-31 16:43:23,358 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: start to delete file:/home/hires/cloudraid/hadoop/hadoop-3.2.0-SNAPSHOT/tmp/nm-local-dir/usercache/hires/appcache/application_1535704989367_0001/filecache/10
2018-08-31 16:43:23,381 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Resource { hdfs://hadoop11:29000/tmp/hadoop-yarn/staging/hires/.staging/job_1535704989367_0001/job.splitmetainfo, 1535705000427, FILE, null } has been removed and will no longer be localized


2018-08-31 16:43:23,395 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /home/hires/cloudraid/hadoop/hadoop-3.2.0-SNAPSHOT/tmp/nm-local-dir/usercache/hires/appcache/application_1535704989367_0001/filecache/10
{code}
 


was (Author: xiaoheipangzi):
[~jlowe]

Thanks for the explanation. I added a sleep and a kill-job command in the constructor of ResourceLocalizedEvent, and I also added a log statement that prints the suspicious file path. With these changes I was able to trigger the warning log statement. The log shows that the suspicious file is deleted when the application finishes.

> Localized resource may leak on disk if container is killed while localizing
> ---------------------------------------------------------------------------
>
>                 Key: YARN-8703
>                 URL: https://issues.apache.org/jira/browse/YARN-8703
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Jason Lowe
>            Priority: Major
>
> If a container is killed while localizing, it releases all of its resources.  If the resource count goes to zero while the resource is in the DOWNLOADING state, the resource bookkeeping is removed from the resource tracker.  Shortly afterwards, the localizer could heartbeat in and report the successful localization of the resource that was just removed.  When the LocalResourcesTrackerImpl receives the LOCALIZED event but does not find the corresponding LocalResource for the event, it simply logs a "localized without a location" warning.  At that point I think the localized resource has been leaked on disk, since the NM has removed the bookkeeping for the resource without removing the resource itself from disk.
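
The sequence described above can be modelled with a small toy tracker. This is only a sketch of the race under simplifying assumptions, not the NodeManager code: the class LocalizationRaceSketch, its Tracker and Entry types, the deleteOrphans flag, and the example key and path are all hypothetical names made up for illustration. It shows that once the bookkeeping entry has been dropped, a late LOCALIZED report has nothing to attach to, so the file stays on disk unless the handler explicitly schedules it for deletion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the race in this issue; NOT the real LocalResourcesTrackerImpl. */
public class LocalizationRaceSketch {

    enum State { DOWNLOADING, LOCALIZED }

    /** Hypothetical stand-in for the per-resource bookkeeping entry. */
    static final class Entry {
        State state = State.DOWNLOADING;
        int refCount = 0;
    }

    /** Hypothetical tracker mapping a resource key to its bookkeeping. */
    static final class Tracker {
        final Map<String, Entry> entries = new HashMap<>();
        // Paths this model would hand to a deletion service.
        final List<String> deletedPaths = new ArrayList<>();
        // Assumption: when true, orphaned files are cleaned up on a late event.
        final boolean deleteOrphans;

        Tracker(boolean deleteOrphans) {
            this.deleteOrphans = deleteOrphans;
        }

        /** A container asks for the resource; the download is still in flight. */
        void request(String key) {
            entries.computeIfAbsent(key, k -> new Entry()).refCount++;
        }

        /** A container (for example, a killed one) releases the resource. */
        void release(String key) {
            Entry e = entries.get(key);
            if (e == null) {
                return;
            }
            e.refCount--;
            // Bookkeeping is dropped if nobody wants a still-downloading resource.
            if (e.refCount <= 0 && e.state == State.DOWNLOADING) {
                entries.remove(key);
            }
        }

        /** The localizer reports success. Returns false if the entry is gone. */
        boolean localized(String key, String localPath) {
            Entry e = entries.get(key);
            if (e == null) {
                // Late report for a removed resource: without cleanup the file leaks.
                if (deleteOrphans) {
                    deletedPaths.add(localPath);
                }
                return false;
            }
            e.state = State.LOCALIZED;
            return true;
        }
    }

    public static void main(String[] args) {
        String key = "job.splitmetainfo";           // made-up resource key
        String path = "/tmp/example/filecache/10";  // made-up local path

        for (boolean cleanup : new boolean[] {false, true}) {
            Tracker tracker = new Tracker(cleanup);
            tracker.request(key);   // container starts localizing
            tracker.release(key);   // container is killed mid-download
            boolean tracked = tracker.localized(key, path); // late heartbeat
            System.out.println("cleanup=" + cleanup + " tracked=" + tracked
                    + " scheduledForDeletion=" + tracker.deletedPaths);
        }
    }
}
```

With the flag off, the late event is ignored and nothing is scheduled for deletion, which corresponds to the suspected leak. With the flag on, the orphaned path is queued for removal. Whether the real NodeManager needs such a step, or already cleans the path up when the application finishes, is exactly what this issue is trying to establish.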


