Posted to user@nutch.apache.org by John Lafitte <jl...@brandextract.com> on 2014/06/24 09:30:40 UTC

File not found error

Using Nutch 1.7

Out of the blue, all of my crawl jobs started failing a few days ago.  I
checked the user logs: nobody logged into the server, and there were no
reboots or any other obvious issues.  There is plenty of disk space.  Here
is the error I'm getting; any help is appreciated:

Injector: starting at 2014-06-24 07:26:54
Injector: crawlDb: di/crawl/crawldb
Injector: urlDir: di/urls
Injector: Converting injected urls to crawl db entries.
Injector: ENOENT: No such file or directory
    at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
    at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:701)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:656)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:281)
    at org.apache.nutch.crawl.Injector.run(Injector.java:318)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Injector.main(Injector.java:308)

Re: File not found error

Posted by John Lafitte <jl...@brandextract.com>.
Okay, I got it working again.  Not sure exactly what happened, but fsck
didn't help.  I noticed the last line of the trace said "Native Method", so
I moved the native binaries out of the /lib folder.  Lo and behold, the next
time I ran it, it fell back to the Java libs and showed the path it was
actually having trouble with:
/tmp/hadoop-root/mapred/staging/root850517656/.staging.  Given that, I just
moved the /tmp/hadoop-root directory aside and it started working again.
Permissions looked fine, so the directory may simply have been corrupt.
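
For anyone else who hits this, here is roughly what the workaround amounted
to.  This is only a sketch against my setup: the lib/native location is just
where my Hadoop native binaries happened to live, and the di/* paths come
from the log above, so adjust both for your own install:

  # force Hadoop to fall back to the pure-Java file utilities, which report
  # the actual path instead of a bare ENOENT
  mv lib/native lib/native.disabled

  # re-run the inject step; it should now print the offending path, e.g.
  # /tmp/hadoop-root/mapred/staging/root850517656/.staging
  bin/nutch inject di/crawl/crawldb di/urls

  # move the suspect local staging area aside and let Hadoop recreate it
  mv /tmp/hadoop-root /tmp/hadoop-root.bak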

Thanks for the help!



Re: File not found error

Posted by John Lafitte <jl...@brandextract.com>.
Well, I'm just using Nutch in local mode, no HDFS (as far as I know).  What
I'm trying to determine now is whether there is a filesystem issue.  It's
not really clear which file is not found.  I have about 10 different
configs; this is just one of them, and they all have the urls folder.  The
script worked for quite a while before this started happening on its own,
which is why I suspect a filesystem error.
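
For what it's worth, this is roughly what I am checking so far (assuming
Hadoop's default hadoop.tmp.dir of /tmp/hadoop-<user> for local mode;
di/urls is the seed dir from the log above):

  # seed directory for this config
  ls -ld di/urls

  # local job staging area that Hadoop writes to in local mode
  ls -ld /tmp/hadoop-$USER/mapred/staging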



Re: File not found error

Posted by kaveh minooie <ka...@plutoz.com>.
You might want to check whether

 > Injector: urlDir: di/urls

still exists in your HDFS.
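
Something along these lines, depending on whether you are running against
HDFS or the local filesystem (di/urls is the path from your log; adjust if
your crawl dirs live elsewhere):

  # if the job reads from HDFS
  hadoop fs -ls di/urls

  # if you are running in local mode
  ls -ld di/urls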




-- 
Kaveh Minooie