Posted to user@nutch.apache.org by Tom Chiverton <tc...@extravision.com> on 2016/10/14 13:30:39 UTC

Nutch 2, Solr 5 - solrdedup causes ClassCastException:

I've tried using both Solr 6 and 5 with the latest Nutch 2, and with 
both I am getting an error from Nutch's bin/crawl.

/mnt/nutch/nutch/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr/nutch
Exception in thread "main" java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_local2123017879_0001
         at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:383)
         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.run(SolrDeleteDuplicates.java:393)
         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.main(SolrDeleteDuplicates.java:403)
Error running:
   /mnt/nutch/nutch/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr/nutch
Failed with exit value 1.

hadoop.log says

java.lang.Exception: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrRecordReader.nextKeyValue(SolrDeleteDuplicates.java:233)
         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
         at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
         at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
         at java.lang.Thread.run(Thread.java:745)

This appears to be related to the digest field somehow.
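The failing cast is easy to reproduce in isolation. A minimal sketch (illustrative only, not Nutch's actual code) of what happens when a field expected to hold a single String comes back from the Solr client as a list:

```java
import java.util.ArrayList;
import java.util.List;

public class DigestCastDemo {
    // Simulates reading the "digest" field from a Solr document.
    // If the field is multi-valued, the client returns an ArrayList,
    // and a plain (String) cast throws ClassCastException.
    static String readDigest(Object fieldValue) {
        return (String) fieldValue;
    }

    public static void main(String[] args) {
        Object singleValued = "0a1b2c3d";                               // single-valued field: OK
        Object multiValued = new ArrayList<>(List.of("0a1b2c3d"));      // multiValued field

        System.out.println(readDigest(singleValued));
        try {
            readDigest(multiValued); // throws, matching the hadoop.log trace
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```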

Is this a known bug? Do I need a particular version of Nutch with a particular Solr, or something?
-- 
*Tom Chiverton*
Lead Developer
e: tc@extravision.com
p: 0161 817 2922
t: @extravision <http://www.twitter.com/extravision>
w: www.extravision.com <http://www.extravision.com/>

Extravision - email worth seeing
Registered in the UK at: 107 Timber Wharf, 33 Worsley Street,
Manchester, M15 4LD.
Company Reg No: 05017214 VAT: GB 824 5386 19

This e-mail is intended solely for the person to whom it is addressed 
and may contain confidential or privileged information.
Any views or opinions presented in this e-mail are solely of the author 
and do not necessarily represent those of Extravision Ltd.


Re: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

Posted by Tom Chiverton <tc...@extravision.com>.
Where would this be configured? I'm creating the Solr core by just doing

"solr/bin/solr create_core -c nutch"

Should I be feeding it a special schema file somehow?

Tom


On 14/10/16 14:39, Markus Jelsma wrote:
> Your digest field is configured as multi valued, which should not be the case.
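`bin/solr create_core` sets the core up with the data-driven ("schemaless") schema, which guesses field definitions from incoming documents and can end up marking a field multi-valued. One way to correct the digest field without hand-editing a schema file is Solr's Schema API; a sketch, assuming the core is named `nutch` and a plain `string` field type (both assumptions, check your own schema):

```json
{
  "replace-field": {
    "name": "digest",
    "type": "string",
    "indexed": true,
    "stored": true,
    "multiValued": false
  }
}
```

POST this payload to http://localhost:8983/solr/nutch/schema, then re-index so existing documents pick up the single-valued definition.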


RE: Nutch 2, Solr 5 - solrdedup causes ClassCastException:

Posted by Markus Jelsma <ma...@openindex.io>.
According to the source:
https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java#L233

Your digest field is configured as multi valued, which should not be the case.
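In a classic schema.xml, a single-valued digest would be declared along these lines (a sketch; the exact field type name depends on your schema):

```xml
<field name="digest" type="string" stored="true" indexed="true" multiValued="false"/>
```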

M.

 
 