Posted to dev@nutch.apache.org by Eric Benavente <er...@semgine.com> on 2007/08/02 10:20:47 UTC

Nutch stops randomly while crawling

OS: Linux
Nutch Version: 0.9
Running Nutch from Eclipse
Running org.apache.nutch.fetcher.Fetcher2 with the program arguments
crawl/segments/20070801150452 -threads 10 and the VM arguments
-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log -Xmx1024M
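
(For reference, and not part of the original launch configuration: assuming Fetcher2 is started through its main() method, as it is when run as a Java application in Eclipse, the setup above amounts to roughly the following small driver. The class name RunFetcher2 is made up; the segment path, thread count, and -D properties simply mirror the values listed above, and the heap size still has to be given as a JVM argument.)

public class RunFetcher2 {
    public static void main(String[] args) throws Exception {
        // VM arguments from the Eclipse launch configuration
        System.setProperty("hadoop.log.dir", "logs");
        System.setProperty("hadoop.log.file", "hadoop.log");
        // -Xmx1024M must still be passed on the JVM command line

        // Program arguments: the segment to fetch and the number of fetcher threads
        org.apache.nutch.fetcher.Fetcher2.main(new String[] {
            "crawl/segments/20070801150452", "-threads", "10"
        });
    }
}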


Nutch stops after a few hours of successful crawling.

--- snip ---

2007-08-02 04:04:46,937 WARN  fs.FileSystem - Moving bad file
/tmp/hadoop-eb/mapred/local/reduce_areo58/map_0.out to
/tmp/bad_files/map_0.out.-783779377
2007-08-02 04:04:46,940 INFO  fs.FSInputChecker - Found checksum error:
org.apache.hadoop.fs.ChecksumException: Checksum error:
/tmp/hadoop-eb/mapred/local/reduce_areo58/map_0.out at 212930560
        at
org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.verifySum(ChecksumFileSystem.java:254)
        at
org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:211)
        at
org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
        at
org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
        at java.io.DataInputStream.readFully(DataInputStream.java:176)
        at java.io.DataInputStream.readFully(DataInputStream.java:152)
        at
org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:427)
        at
org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$700(SequenceFile.java:414)
        at
org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:1665)
        at
org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:2579)
        at
org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2351)
        at
org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:180)
        at
org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:149)
        at
org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:41)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)

2007-08-02 04:04:46,940 WARN  mapred.LocalJobRunner - job_gk05gy
java.lang.NullPointerException
        at
org.apache.hadoop.fs.FSDataInputStream$Buffer.seek(FSDataInputStream.java:74)
        at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:121)
        at
org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:221)
        at
org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
        at
org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
        at java.io.DataInputStream.readFully(DataInputStream.java:176)
        at java.io.DataInputStream.readFully(DataInputStream.java:152)
        at
org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:427)
        at
org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$700(SequenceFile.java:414)
        at
org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:1665)
        at
org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:2579)
        at
org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2351)
        at
org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:180)
        at
org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:149)
        at
org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:41)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)

--- end snip ---
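
(Some context on what the first exception means, sketched here rather than quoted from Hadoop's sources: the local file system stores a CRC checksum alongside each chunk of map output, recomputes it when the reducer reads the chunk back, and sets the file aside under /tmp/bad_files when the two disagree. The class and method names below are made up for illustration only.)

import java.util.zip.CRC32;

public class ChecksumSketch {

    // CRC32 over one chunk of data, roughly what gets stored next to the file
    static long checksumOf(byte[] chunk) {
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunk.length);
        return crc.getValue();
    }

    // Fails when the bytes read back no longer match the stored checksum,
    // e.g. after disk corruption or a truncated/overwritten spill file.
    static void verify(byte[] chunkReadBack, long storedChecksum, long offset) {
        if (checksumOf(chunkReadBack) != storedChecksum) {
            throw new RuntimeException("Checksum error at offset " + offset);
        }
    }

    public static void main(String[] args) {
        byte[] data = "map output bytes".getBytes();
        long stored = checksumOf(data);   // recorded at write time
        data[3] ^= 0x01;                  // simulate corruption on disk
        verify(data, stored, 0);          // throws, analogous to the log above
    }
}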

The number of crawled pages is always different; sometimes it is 10,000, sometimes
40,000.

Any ideas?

Thanks,

Eric