
Posted to user@nutch.apache.org by "Corey, Stephen" <CO...@ecu.edu> on 2016/01/05 17:13:08 UTC

Nutch with SolrCloud 5

Has anyone gotten Nutch (preferably 1.11, but any version would be fine) to index data to Solr 5 running in cloud mode? I keep getting the message:

Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:222)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:231)


And in my hadoop.log, I see:

....
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.client.solrj.SolrServerException: No collection param specified on request and no default collection has been set.
        at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:292)
        at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:153)
        ... 11 more


I am definitely specifying the collection name in the URL. I normally use the bin/crawl command, but I can also reproduce the error with the individual command:

bin/nutch index -Dsolr.server.url=http://localhost/solr/gettingstarted -Dsolr.server.type=cloud -Dsolr.zookeeper.url=localhost:9983 ecutest/crawldb -linkdb ecutest/linkdb ecutest/segments/20160104103038
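A minimal standalone sketch (plain Java, no Nutch or SolrJ dependency) of the distinction that seems to be at issue here. The collection name only appears in the path of solr.server.url, while a SolrJ CloudSolrServer is built from the ZooKeeper address (solr.zookeeper.url), which identifies the cluster but not a collection. Per the exception text, such a client fails until a collection is given on the request or set as its default (SolrJ's CloudSolrServer.setDefaultCollection). The class CollectionFromUrl and its method collectionFromUrl are hypothetical illustrations, not part of Nutch or SolrJ. Whether Nutch 1.11 actually ignores the URL path in cloud mode is an assumption worth checking against its indexer-solr source.

```java
import java.net.URI;

/**
 * Hypothetical helper (not part of Nutch or SolrJ): extracts the collection
 * name from the last path segment of a Solr URL such as
 * http://localhost/solr/gettingstarted. Assumption: in cloud mode this is the
 * value that would still need to reach the client as its default collection,
 * because a client created from the ZooKeeper address alone does not see it.
 */
public class CollectionFromUrl {

    static String collectionFromUrl(String solrUrl) {
        String path = URI.create(solrUrl).getPath(); // e.g. "/solr/gettingstarted"
        if (path == null) {
            path = ""; // opaque URIs have no path component
        }
        // Drop trailing slashes so ".../gettingstarted/" behaves the same.
        while (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        String last = path.substring(path.lastIndexOf('/') + 1);
        // Assumption: a bare ".../solr" URL names no collection at all.
        if (last.isEmpty() || last.equals("solr")) {
            throw new IllegalArgumentException("No collection in URL: " + solrUrl);
        }
        return last;
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "http://localhost/solr/gettingstarted";
        System.out.println(collectionFromUrl(url));
    }
}
```

Run against the URL from the command above, this prints gettingstarted. If the cloud client never receives that value, the "no default collection has been set" failure is what one would expect, regardless of what the URL contains.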


Any ideas?