Posted to yarn-issues@hadoop.apache.org by "wangchengwei (JIRA)" <ji...@apache.org> on 2019/08/05 07:39:00 UTC

[jira] [Comment Edited] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

    [ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899837#comment-16899837 ] 

wangchengwei edited comment on YARN-9616 at 8/5/19 7:38 AM:
------------------------------------------------------------

Hi [~wzzdreamer], I have figured out a solution to this issue.

What causes this issue is that packed archive files are unpacked by the NM automatically after localization, so the _SharedCacheUploader_ couldn't find the original archive files and threw _FileNotFoundException_. As a result, the packed archive files would never be uploaded to the shared cache, and would be uploaded and localized again and again.

All original resource files would be uploaded to an HDFS path (the staging directory, or one specified by the user) before job submission, so all resource files can be found on HDFS. Since the original files of packed archives cannot be found on the NM, we can get these files from their HDFS path rather than the NM local path. So the solution to this issue is:
 # *check whether the resource is a packed archive before upload*
 # *if not, upload it from the NM local path*
 # *if yes, copy the original file in HDFS to the shared cache path*

This solution resolved the issue in my tests. If needed, I could submit the patch here.
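The decision in the three steps above can be sketched as follows. This is a minimal illustration with hypothetical names (`UploadSourceChooser`, `chooseUploadSource`); the actual patch would hook into _SharedCacheUploader_ and use Hadoop `Path`/`FileSystem` objects rather than plain strings.

```java
// Hypothetical sketch of the proposed upload decision; names are
// illustrative and not part of the actual YARN code base.
public class UploadSourceChooser {

    // Resource types the NodeManager unpacks during localization.
    enum ResourceType { FILE, ARCHIVE, PATTERN }

    /**
     * Returns the path the uploader should read from: the NM-local copy
     * for plain files, or the original HDFS copy for packed archives,
     * because the local archive file is gone after it is unpacked.
     */
    static String chooseUploadSource(ResourceType type,
                                     String nmLocalPath,
                                     String hdfsOriginPath) {
        if (type == ResourceType.ARCHIVE || type == ResourceType.PATTERN) {
            // Step 3: the packed file no longer exists locally, so copy
            // the original file from HDFS into the shared cache instead.
            return hdfsOriginPath;
        }
        // Step 2: regular files survive localization; upload the local copy.
        return nmLocalPath;
    }

    public static void main(String[] args) {
        // An archive resource falls back to its HDFS origin.
        System.out.println(chooseUploadSource(ResourceType.ARCHIVE,
            "/disk3/yarn/local/filecache/352/one.zip",
            "hdfs://nn/user/staging/one.zip"));
        // A plain file is uploaded from the NM local path as before.
        System.out.println(chooseUploadSource(ResourceType.FILE,
            "/disk3/yarn/local/filecache/353/job.jar",
            "hdfs://nn/user/staging/job.jar"));
    }
}
```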

 


was (Author: smarthan):
Hi [~wzzdreamer], I have figured out a solution to this issue.

What causes this issue is that packed archive files are unpacked by the NM automatically after localization, so the _SharedCacheUploader_ couldn't find the original archive files and threw _FileNotFoundException_. As a result, the packed archive files would never be uploaded to the shared cache, and would be uploaded and localized again and again.

All original resource files would be uploaded to an HDFS path (the staging directory, or one specified by the user) before job submission, so all resource files can be found on HDFS. Since the original files of packed archives cannot be found on the NM, we can get these files from their HDFS path rather than the NM local path. So the solution to this issue is:
 # *check whether the resource is a packed archive before upload*
 # *if not, upload it from the NM local path*
 # *if yes, copy the original file in HDFS to the shared cache path*

This solution resolved the issue in my tests. I have submitted the patch here; please review it if possible.

 

> Shared Cache Manager Failed To Upload Unpacked Resources
> --------------------------------------------------------
>
>                 Key: YARN-9616
>                 URL: https://issues.apache.org/jira/browse/YARN-9616
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.3, 2.9.2, 2.8.5
>            Reporter: zhenzhao wang
>            Assignee: zhenzhao wang
>            Priority: Major
>
> Yarn will unpack archive files and some other files based on the file type and configuration. E.g.
>  If I start an MR job with -archive one.zip, then one.zip will be unpacked during download. Let's say there are file1 and file2 inside one.zip. Then the files kept on local disk will be /disk3/yarn/local/filecache/352/one.zip/file1 and /disk3/yarn/local/filecache/352/one.zip/file2. So the shared cache uploader couldn't upload one.zip to the shared cache, as it was removed during localization. The following errors will be thrown.
> {code:java}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader: Exception while uploading the file dict.zip
> java.io.FileNotFoundException: File /disk3/yarn/local/filecache/352/one.zip/one.zip does not exist
>         at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:631)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:857)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:621)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
>         at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:926)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.computeChecksum(SharedCacheUploader.java:257)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:128)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:55)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
