Posted to user@nutch.apache.org by Karsten Dello <de...@mi.fu-berlin.de> on 2006/12/06 02:24:57 UTC

Problem with fetching

Hello,

I have a strange problem generating fetchlists;
maybe someone can point me in the right direction?

I run a couple of inject/generate/fetch/update cycles to crawl a
defined subgraph.
In the last cycle approx. 600000 docs should have been fetched, but only
150000 were actually fetched.
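
A single generate/fetch/update cycle of this kind can be sketched roughly as
follows. This is only an illustration, assuming the command names from the
Nutch 0.8 tutorial; the paths, the NUTCH/CRAWLDB/SEGMENTS variables and the
run_cycle function are placeholders, not the exact setup used here. The
inject step is assumed to have been run once beforehand.

```shell
#!/bin/sh
# Sketch of one generate/fetch/update cycle (Nutch 0.8-style commands).
# All names and paths below are hypothetical placeholders.
NUTCH=${NUTCH:-bin/nutch}
CRAWLDB=${CRAWLDB:-crawl/crawldb}
SEGMENTS=${SEGMENTS:-crawl/segments}

run_cycle() {
    # Build a new fetchlist as a fresh segment under $SEGMENTS.
    "$NUTCH" generate "$CRAWLDB" "$SEGMENTS" || return 1
    # Segment directories are named by timestamp, so the newest sorts last.
    segment=$(ls -d "$SEGMENTS"/2* | tail -1)
    # Fetch the pages listed in that segment.
    "$NUTCH" fetch "$segment" || return 1
    # Merge the fetch results back into the crawl db.
    "$NUTCH" updatedb "$CRAWLDB" "$segment" || return 1
}
```

Calling run_cycle repeatedly gives the loop described above; each call stops
at the first step that exits with a non-zero status.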

The last thing I see in the log file is

2006-12-04 14:24:14,221 INFO  fetcher.Fetcher - fetch of http://www.microbes.info/forums/index.php?s=7179874ada709ad4d9874517f2790ef0& failed with: java.lang.NullPointerEx$
2006-12-04 14:24:14,238 FATAL fetcher.Fetcher - java.lang.NullPointerException
2006-12-04 14:24:14,241 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:198)
2006-12-04 14:24:14,241 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:189)
2006-12-04 14:24:14,241 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$2.collect(MapTask.java:91)
2006-12-04 14:24:14,241 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:314)
2006-12-04 14:24:14,241 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:232)
2006-12-04 14:24:14,241 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException


Then nothing happens; approx. 10 minutes later MapReduce comes up with:
2006-12-04 14:33:06,169 INFO  mapred.LocalJobRunner - reduce > sort
2006-12-04 14:33:07,586 INFO  mapred.JobClient -  map 100%  reduce 33%