Posted to user@nutch.apache.org by Dan Cox <da...@speakeasy.net> on 2012/01/18 21:25:46 UTC

nutch 1.4/hadoop 1.0 can't find class: org.apache.nutch.protocol.ProtocolStatus

Nutch Users,

I'm attempting to run a crawl using nutch 1.4 deployed to hadoop 1.0 (single
server for now).

I'm able to crawl using the local runtime, but when using the deploy
runtime, Nutch emits the following:

12/01/18 13:17:04 WARN mapred.LocalJobRunner: job_local_0005
java.lang.RuntimeException: problem advancing post rec#0
        at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1214)
        at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:249)
        at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:245)
        at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:40)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Caused by: java.io.IOException: can't find class: org.apache.nutch.protocol.ProtocolStatus because org.apache.nutch.protocol.ProtocolStatus
        at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:204)
        at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:146)
        at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:278)
        at org.apache.nutch.util.GenericWritableConfigurable.readFields(GenericWritableConfigurable.java:54)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1271)
        at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1211)
        ... 6 more
12/01/18 13:17:05 INFO mapred.JobClient:  map 100% reduce 0%
12/01/18 13:17:05 INFO mapred.JobClient: Job complete: job_local_0005
12/01/18 13:17:05 INFO mapred.JobClient: Counters: 22
12/01/18 13:17:05 INFO mapred.JobClient:   File Input Format Counters
12/01/18 13:17:05 INFO mapred.JobClient:     Bytes Read=257
12/01/18 13:17:05 INFO mapred.JobClient:   FileSystemCounters
12/01/18 13:17:05 INFO mapred.JobClient:     FILE_BYTES_READ=191387911
12/01/18 13:17:05 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=193193843
12/01/18 13:17:05 INFO mapred.JobClient:   FetcherStatus
12/01/18 13:17:05 INFO mapred.JobClient:     temp_moved=1
12/01/18 13:17:05 INFO mapred.JobClient:     success=1
12/01/18 13:17:05 INFO mapred.JobClient:   Map-Reduce Framework
12/01/18 13:17:05 INFO mapred.JobClient:     Map output materialized bytes=60191
12/01/18 13:17:05 INFO mapred.JobClient:     Map input records=2
12/01/18 13:17:05 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/01/18 13:17:05 INFO mapred.JobClient:     Spilled Records=5
12/01/18 13:17:05 INFO mapred.JobClient:     Map output bytes=60169
12/01/18 13:17:05 INFO mapred.JobClient:     Total committed heap usage (bytes)=756613120
12/01/18 13:17:05 INFO mapred.JobClient:     CPU time spent (ms)=0
12/01/18 13:17:05 INFO mapred.JobClient:     Map input bytes=159
12/01/18 13:17:05 INFO mapred.JobClient:     SPLIT_RAW_BYTES=134
12/01/18 13:17:05 INFO mapred.JobClient:     Combine input records=0
12/01/18 13:17:05 INFO mapred.JobClient:     Reduce input records=0
12/01/18 13:17:05 INFO mapred.JobClient:     Reduce input groups=0
12/01/18 13:17:05 INFO mapred.JobClient:     Combine output records=0
12/01/18 13:17:05 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
12/01/18 13:17:05 INFO mapred.JobClient:     Reduce output records=0
12/01/18 13:17:05 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
12/01/18 13:17:05 INFO mapred.JobClient:     Map output records=5
12/01/18 13:17:05 INFO mapred.JobClient: Job Failed: NA
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1204)
        at org.apache.nutch.crawl.Crawl.run(Crawl.java:136)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57
)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl
.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Googling the error pulled up this bug
https://issues.apache.org/jira/browse/NUTCH-1084. 

Has anybody had success running nutch 1.4 on hadoop 1.0, or is this
configuration "incompatible"?

Thanks in advance.

Dan


Re: nutch 1.4/hadoop 1.0 can't find class: org.apache.nutch.protocol.ProtocolStatus

Posted by Markus Jelsma <ma...@openindex.io>.
> I'm attempting to run a crawl using nutch 1.4 deployed to hadoop 1.0
> (single server for now).
>
> [stack trace and job counters snipped]
>
> Googling the error pulled up this bug
> https://issues.apache.org/jira/browse/NUTCH-1084.

That is likely not the same error, even though it may look like it. We still 
haven't figured out what the exact problem with ProtocolStatus is.
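A side note on the odd "can't find class: X because X" wording in the log: Hadoop's MapWritable deserialization wraps the ClassNotFoundException it hits, and a ClassNotFoundException's message is itself the missing class name, which produces the doubled text. The sketch below only reproduces that wrapping pattern (the class name `ClassLoadDemo` and the probe class `org.example.Missing` are illustrative, not from Hadoop); the practical takeaway is that the task JVM's classloader simply cannot see the Nutch classes, i.e. the job jar is not on the task's classpath.

```java
// Sketch: why the log reads "can't find class: X because X".
public class ClassLoadDemo {

    // Try to load a class by name and report failure roughly the way
    // Hadoop 1.0's MapWritable deserialization does: the wrapped
    // ClassNotFoundException's message is the class name itself,
    // hence "because <same name>".
    static String describe(String className) {
        try {
            Class.forName(className);
            return "loaded " + className;
        } catch (ClassNotFoundException e) {
            return "can't find class: " + className + " because " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Any class absent from the classpath reproduces the doubled message.
        // prints: can't find class: org.example.Missing because org.example.Missing
        System.out.println(describe("org.example.Missing"));
    }
}
```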
> 
> Has anybody had success running nutch 1.4 on hadoop 1.0, or is this
> configuration "incompatible"?

No, they work fine together. But you say you are running the deployed job 
file, while the stack trace shows the LocalJobRunner, so I think you didn't 
actually submit the job to Hadoop.
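As a sketch of how to check that, assuming a standard Hadoop 1.0 setup: if `mapred.job.tracker` is unset or `local` in mapred-site.xml, every job runs inside the LocalJobRunner even when launched with `hadoop jar`. Pointing it at a real JobTracker looks something like this (host and port are placeholders for your own):

```xml
<!-- mapred-site.xml: a value of "local" (the default) selects the
     LocalJobRunner; a host:port address submits jobs to that JobTracker. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```

With that in place, submitting the deployed job file from runtime/deploy (for example `hadoop jar apache-nutch-1.4.job org.apache.nutch.crawl.Crawl urls -dir crawl -depth 3`) should show remote task-attempt IDs in the output instead of `job_local_*` ones.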
