Posted to dev@nutch.apache.org by Eric Benavente <er...@semgine.com> on 2007/08/02 10:20:47 UTC
nutch stops randomly while crawling
OS: Linux
Nutch version: 0.9
Running Nutch from Eclipse
Running org.apache.nutch.fetcher.Fetcher2 with program arguments
"crawl/segments/20070801150452 -threads 10" and VM arguments
"-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log -Xmx1024M"
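In case it helps to reproduce, this should be roughly the equivalent invocation from the shell (assuming a standard Nutch 0.9 layout run from the Nutch home directory, where bin/nutch falls through to running a fully qualified class name, and NUTCH_OPTS carries extra JVM options):

```shell
# Hypothetical shell equivalent of the Eclipse run configuration above.
# Assumptions: current directory is the Nutch home, bin/nutch accepts a
# class name, and NUTCH_OPTS is honored for extra JVM flags.
NUTCH_OPTS="-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log -Xmx1024M"
CMD="bin/nutch org.apache.nutch.fetcher.Fetcher2 crawl/segments/20070801150452 -threads 10"
echo "NUTCH_OPTS='$NUTCH_OPTS' $CMD"
```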
Nutch stops crawling after a few hours of successful crawling.
--schnipp---
2007-08-02 04:04:46,937 WARN fs.FileSystem - Moving bad file /tmp/hadoop-eb/mapred/local/reduce_areo58/map_0.out to /tmp/bad_files/map_0.out.-783779377
2007-08-02 04:04:46,940 INFO fs.FSInputChecker - Found checksum error:
org.apache.hadoop.fs.ChecksumException: Checksum error: /tmp/hadoop-eb/mapred/local/reduce_areo58/map_0.out at 212930560
        at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.verifySum(ChecksumFileSystem.java:254)
        at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:211)
        at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
        at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
        at java.io.DataInputStream.readFully(DataInputStream.java:176)
        at java.io.DataInputStream.readFully(DataInputStream.java:152)
        at org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:427)
        at org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$700(SequenceFile.java:414)
        at org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:1665)
        at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:2579)
        at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2351)
        at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:180)
        at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:149)
        at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:41)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
2007-08-02 04:04:46,940 WARN mapred.LocalJobRunner - job_gk05gy
java.lang.NullPointerException
        at org.apache.hadoop.fs.FSDataInputStream$Buffer.seek(FSDataInputStream.java:74)
        at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:121)
        at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:221)
        at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
        at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
        at java.io.DataInputStream.readFully(DataInputStream.java:176)
        at java.io.DataInputStream.readFully(DataInputStream.java:152)
        at org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:427)
        at org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$700(SequenceFile.java:414)
        at org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:1665)
        at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:2579)
        at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2351)
        at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:180)
        at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:149)
        at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:41)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
--schnapp---
The number of crawled pages varies from run to run, sometimes 10,000, sometimes
40,000.
Any ideas?
Thanks,
Eric