Posted to user@nutch.apache.org by Lukas Vlcek <lu...@gmail.com> on 2007/01/10 17:29:30 UTC

nutch-0.9 trunk is failing in Indexer

Hi,

I am using the Nutch trunk version (revision 493556) and it is failing in
the Indexer.

java.io.IOException: Not a file: /nutch/nutchcrawl/segments/20070110171621/crawl_fetch/part-00000/data
        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:125)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:93)
Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
        at org.apache.nutch.indexer.Indexer.run(Indexer.java:319)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:302)

Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
        at org.apache.nutch.indexer.Indexer.run(Indexer.java:319)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:302)

I also noticed some issues during parsing (which runs prior to indexing).
The following is what I got when I enabled finer logging:

Moving bad file /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data to /bad_files/data.1751375967
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/content/part-00000/data error : Checksum error: /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data at 0
 map 100% reduce 0%
Moving bad file /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/part-00000 to /bad_files/part-00000.377330604
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/crawl_parse/part-00000 error : Checksum error: /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/part-00000 at 0
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/content/part-00000/index error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/index: No such file or directory
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/parse_data/part-00000/index error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/index: No such file or directory
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/crawl_fetch/part-00000/data error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data: No such file or directory
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/parse_text/part-00000/index error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/index: No such file or directory
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/parse_text/part-00000/data error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data: No such file or directory
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/parse_data/part-00000/data error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data: No such file or directory
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/crawl_generate/part-00000 error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/part-00000: No such file or directory

I am not sure whether these errors could cause the IOException shown above.
Does anybody know what I am doing wrong?

Regards,
Lukas