You are viewing a plain text version of this content. The canonical link for it is here.

Posted to mapreduce-issues@hadoop.apache.org by "Devaraj K (JIRA)" <ji...@apache.org> on 2012/06/19 15:48:43 UTC

[jira] [Created] (MAPREDUCE-4352) Jobs fail during resource localization when directories in file cache reaches to unix directory limit

Devaraj K created MAPREDUCE-4352:
------------------------------------

             Summary: Jobs fail during resource localization when directories in file cache reaches to unix directory limit
                 Key: MAPREDUCE-4352
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4352
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 2.0.1-alpha, 3.0.0
            Reporter: Devaraj K
            Assignee: Devaraj K


If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache. The jobs start failing with the below exception.


{code:xml}
java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
	at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{code}

We should have a mechanism to clean the cache files if it crosses specified number of directories like cache size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4352) Jobs fail during resource localization when directories in file cache reaches to unix directory limit

Posted by "Devaraj K (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396779#comment-13396779 ] 

Devaraj K commented on MAPREDUCE-4352:
--------------------------------------

@Robert : In hadoop-1, it is controlled with the config 'mapreduce.tasktracker.local.cache.numberdirectories' but I don't see any configuration exist or implementation present in 2.0. I see only cache size. We need to implement it I think.

@Koji: Thanks a lot for giving the related jira id.
                
> Jobs fail during resource localization when directories in file cache reaches to unix directory limit
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4352
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4352
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.1-alpha, 3.0.0
>            Reporter: Devaraj K
>            Assignee: Devaraj K
>
> If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache. The jobs start failing with the below exception.
> {code:xml}
> java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
> 	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
> 	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
> 	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
> 	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
> 	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
> 	at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
> 	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
> 	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
> 	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}
> We should have a mechanism to clean the cache files if it crosses specified number of directories like cache size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4352) Jobs fail during resource localization when directories in file cache reaches to unix directory limit

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396765#comment-13396765 ] 

Robert Joseph Evans commented on MAPREDUCE-4352:
------------------------------------------------

There is a config in 1.0 for this type of thing.  My guess is that it either was not implemented in 2.0 or the default is too large. 
                
> Jobs fail during resource localization when directories in file cache reaches to unix directory limit
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4352
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4352
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.1-alpha, 3.0.0
>            Reporter: Devaraj K
>            Assignee: Devaraj K
>
> If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache. The jobs start failing with the below exception.
> {code:xml}
> java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
> 	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
> 	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
> 	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
> 	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
> 	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
> 	at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
> 	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
> 	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
> 	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}
> We should have a mechanism to clean the cache files if it crosses specified number of directories like cache size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4352) Jobs fail during resource localization when directories in file cache reaches to unix directory limit

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396773#comment-13396773 ] 

Koji Noguchi commented on MAPREDUCE-4352:
-----------------------------------------

Sounds similar to MAPREDUCE-1538 (pre-2.0).
                
> Jobs fail during resource localization when directories in file cache reaches to unix directory limit
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4352
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4352
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.1-alpha, 3.0.0
>            Reporter: Devaraj K
>            Assignee: Devaraj K
>
> If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache. The jobs start failing with the below exception.
> {code:xml}
> java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
> 	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
> 	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
> 	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
> 	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
> 	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
> 	at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
> 	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
> 	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
> 	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}
> We should have a mechanism to clean the cache files if it crosses specified number of directories like cache size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira