Posted to hdfs-user@hadoop.apache.org by rakesh kothari <ra...@live.com> on 2012/12/09 02:13:26 UTC

Hadoop jobs failing with FileNotFound Exception while the job is still running

Hi,

We are having a strange issue in our Hadoop cluster. We have noticed that some of our jobs fail with a file not found exception [see below]. Basically, the files in the "attempt_*" directory, and the directory itself, are getting deleted while the task is still running on the host. Looking through some of the Hadoop documentation I see that the job directory gets wiped out when the TaskTracker receives a KillJobAction, but I am not sure why it gets wiped out while the job is still running.


My question is: what could be deleting these files while the job is still running?
Any thoughts or pointers on how to debug this would be helpful.

Thanks!

java.io.FileNotFoundException: /hadoop/mapred/local_data/taskTracker//jobcache/job_201211030344_15383/attempt_201211030344_15383_m_000169_0/output/spill29.out (Permission denied)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:120)
    at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
    at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:400)
    at org.apache.hadoop.mapred.Merger$Segment.init(Merger.java:205)
    at org.apache.hadoop.mapred.Merger$Segment.access$100(Merger.java:165)
    at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:418)
    at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
    at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1692)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1322)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:253)
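
The trace shows the map-side spill merge (MapTask$MapOutputBuffer.mergeParts going through Merger) failing to open spill29.out, and the operating system's reason is "Permission denied" rather than "No such file or directory". A minimal sketch of retrying that open outside the job, through the same local filesystem layer the merge uses, is below; the class name is made up for illustration, the path is copied from the trace, and it would need to be run on the failing node as the same OS user that runs the TaskTracker's child JVMs:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SpillOpenCheck {
        public static void main(String[] args) throws Exception {
            // Path copied from the stack trace; pass a different one as the first argument.
            String file = args.length > 0 ? args[0]
                    : "/hadoop/mapred/local_data/taskTracker//jobcache/"
                    + "job_201211030344_15383/attempt_201211030344_15383_m_000169_0/"
                    + "output/spill29.out";

            // Open the file through Hadoop's local filesystem, as the merge does.
            FileSystem localFs = FileSystem.getLocal(new Configuration());
            FSDataInputStream in = localFs.open(new Path(file));
            try {
                System.out.println("open succeeded, first byte: " + in.read());
            } finally {
                in.close();
            }
        }
    }

If the open fails the same way from a plain JVM, the problem is in the local directory permissions or ownership rather than in anything the job framework is doing.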

Re: Hadoop jobs failing with FileNotFound Exception while the job is still running

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hi Rakesh,

It's a bit hard to spot it, but I see "permission denied" in the stack
trace.  Perhaps reviewing local file system permissions for Hadoop's
working directories would reveal something.  It might not be the case that
something is deleting the file.

Hope this helps,
--Chris
 On Dec 8, 2012 9:57 PM, "rakesh kothari" <ra...@live.com> wrote:

>  Hi,
>
> We are having a strange issue in our Hadoop cluster. We have noticed that
> some of our jobs fail with a file not found exception [see below].
> Basically the files in the "attempt_*" directory and the directory itself
> are getting deleted while the task is still being run on the host. Looking
> through some of the hadoop documentation I see that the job directory gets
> wiped out when it gets a KillJobAction however I am not sure why it gets
> wiped out while the job is still running.
> My question is what could be deleting it while the job is running? Any
> thoughts or pointers on how to debug this would be helpful.
>
> Thanks!
>
>
> java.io.FileNotFoundException: /hadoop/mapred/local_data/taskTracker//jobcache/job_201211030344_15383/attempt_201211030344_15383_m_000169_0/output/spill29.out (Permission denied)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(FileInputStream.java:120)
>     at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
>     at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:400)
>     at org.apache.hadoop.mapred.Merger$Segment.init(Merger.java:205)
>     at org.apache.hadoop.mapred.Merger$Segment.access$100(Merger.java:165)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:418)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
>     at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1692)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1322)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>
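
Following up on the suggestion to review local filesystem permissions: below is a minimal sketch that walks each component of the TaskTracker's local directory path and prints its owner, group and permission bits. It assumes Java 7's java.nio.file API and a POSIX filesystem; the path and class name are illustrative, not from the thread, and the path should be replaced with the mapred.local.dir configured on the failing node. A parent directory that the task's user cannot traverse (missing execute bit) will also surface as "Permission denied" when opening files beneath it.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.attribute.PosixFileAttributes;
    import java.nio.file.attribute.PosixFilePermissions;

    public class LocalDirOwnershipCheck {
        public static void main(String[] args) throws Exception {
            // Illustrative path; substitute the mapred.local.dir of the failing node.
            Path path = Paths.get("/hadoop/mapred/local_data/taskTracker");

            // Walk each component from the root down and print owner, group and mode.
            Path current = path.getRoot();
            for (Path part : path) {
                current = current.resolve(part);
                PosixFileAttributes attrs =
                        Files.readAttributes(current, PosixFileAttributes.class);
                System.out.printf("%-45s %s %s %s%n",
                        current,
                        attrs.owner().getName(),
                        attrs.group().getName(),
                        PosixFilePermissions.toString(attrs.permissions()));
            }
        }
    }

Comparing that output against the user the TaskTracker launches child JVMs as (for example, the mapred user, or the job submitter when the LinuxTaskController is in use) should show whether any component of the path is unreadable or non-traversable for that user.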
