Posted to user@nutch.apache.org by "shubham.gupta" <sh...@orkash.com> on 2017/03/08 07:14:00 UTC

All nutch jobs Failing | Nutch 2.3.1 + MongoDB

Hey

When I run the whole Nutch process flow (Inject, Generate, Fetch, Parse, 
Update), the following errors are logged:

*Generator Job:*

java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
         at org.apache.nutch.crawl.GeneratorMapper.map(GeneratorMapper.java:34)
         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
         at java.lang.Thread.run(Thread.java:745)
2017-03-07 15:28:07,696 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=[rss_new]generate: 1488880683-1996901673, jobid=job_local78754654_0001
         at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:121)
         at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:233)
         at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:262)
         at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:328)
         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
         at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:336)

*Fetcher Job:*

java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
         at org.apache.nutch.fetcher.FetcherJob$FetcherMapper.map(FetcherJob.java:96)
         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
         at java.lang.Thread.run(Thread.java:745)

*Parser Job:*

java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
         at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:80)
         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
         at java.lang.Thread.run(Thread.java:745)

The plugin.folder directory specified in conf/nutch-site.xml is correct. 
When I check the code, each stack trace points to the line where the class 
is declared, for example the line "public class GeneratorMapper".

What changes need to be made in the configuration files?

-- 
Thanks and Regards,
Shubham Gupta


Re: All nutch jobs Failing | Nutch 2.3.1 + MongoDB

Posted by "shubham.gupta" <sh...@orkash.com>.
Hey

I was inserting the data into a table named rss_webpage ("webpage" is 
appended automatically by Nutch), but when I changed the table to 
rss_one_webpage the error disappeared. Is the cause of this on the Nutch 
side or the MongoDB side?

Thanks and Regards,
Shubham Gupta
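
One possible explanation, which this thread does not confirm, is that the rss_webpage collection already held documents that were not written by Nutch. Such documents would carry MongoDB's auto-generated ObjectId in _id, while the Nutch mappers declare String keys. That would also fit the observation that the trace points at the class declaration line: for a generic mapper the compiler emits a synthetic bridge method that performs the cast, and its line number is normally reported as the class declaration. The sketch below is only an illustration of that mechanism. FakeObjectId, KeyMapper, StringKeyMapper and runMapper are hypothetical stand-ins, not Nutch, Gora or MongoDB driver classes.

```java
// Minimal sketch, assuming the failure comes from a non-String _id reaching
// a mapper that declares String keys. All names here are hypothetical.
final class KeyCastSketch {

    // Stand-in for org.bson.types.ObjectId so the sketch needs no MongoDB driver.
    static final class FakeObjectId {
        @Override
        public String toString() {
            return "58bfb1c2e4b0a1b2c3d4e5f6";
        }
    }

    // Generic mapper contract, loosely modelled on a mapper with a typed input key.
    interface KeyMapper<K> {
        String map(K key);
    }

    // A mapper that declares String keys. The compiler also generates a bridge
    // method map(Object) that casts its argument to String.
    static final class StringKeyMapper implements KeyMapper<String> {
        @Override
        public String map(String key) {
            return "mapped:" + key;
        }
    }

    // The framework side sees only the erased type, so it passes the raw key along.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String runMapper(KeyMapper mapper, Object rawId) {
        try {
            // Raw call: dispatches to the bridge method, where the cast happens.
            return mapper.map(rawId);
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        StringKeyMapper mapper = new StringKeyMapper();
        // A String key, as a store would return for documents written with String ids.
        System.out.println(runMapper(mapper, "com.example:http/"));
        // A non-String id, as a document inserted by another tool might carry.
        System.out.println(runMapper(mapper, new FakeObjectId()));
    }
}
```

If this assumption holds, checking the types of the _id values in the old collection, or using a collection that only Nutch writes to (as rss_one_webpage apparently was), would be the thing to try before changing any configuration file.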
