Posted to user@nutch.apache.org by Eddie Drapkin <oo...@gmail.com> on 2010/02/25 22:18:50 UTC

Text.encode failing during de-duplication

Hello,

I'm trying to upgrade from Nutch 0.9 to Nutch 1.0, and I've solved all of the
issues I was having except for one.

When I run a web crawl, everything fetches fine until it gets to dedup, at
which point I get this stack trace:


2010-02-25 14:31:46,592 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.NullPointerException
        at org.apache.hadoop.io.Text.encode(Text.java:388)
        at org.apache.hadoop.io.Text.set(Text.java:178)
        at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:191)
        at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:157)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
2010-02-25 14:31:47,328 FATAL indexer.DeleteDuplicates - DeleteDuplicates: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1250)
        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:448)
        at org.apache.nutch.indexer.DeleteDuplicates.run(DeleteDuplicates.java:515)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:499)


I'm running on a 1.5 JVM and can't upgrade to 1.6.  I've tried with a
version of Hadoop that's old enough to run on 1.5 (0.18.3) and with a
version of Hadoop (0.20.2) that a co-worker modified to build and run on
1.5.  Is it possible that I can't upgrade Nutch until I can upgrade my JVM,
or is it something else?  If there's any more information you need, let me
know.

Thanks,
Eddie

PS. Sorry if this gets sent twice; I tried to send it before I subscribed to
this list.