Posted to user@nutch.apache.org by Jorge Luis Betancourt González <jl...@uci.cu> on 2015/06/16 14:15:26 UTC

Re: [MASSMAIL]Integrating nutch 1.10 with Solr 5.2.0

This looks like it may be a misconfiguration of the schema.xml file in Solr. Can you take a look at the Solr logs?
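
The "Job failed!" line that Nutch prints is only a wrapper, so the underlying cause usually has to be dug out of the logs. Below is a minimal shell sketch for pulling recent error lines out of a log file. The function name show_errors is made up for illustration, and the paths in the note after it are assumptions about a default layout, so adjust them to your setup.

```shell
# Hypothetical helper (not part of Nutch or Solr): print the most recent
# lines of a log file that mention an error or exception, with some context.
show_errors() {
  log_file="$1"
  if [ ! -f "$log_file" ]; then
    echo "log file not found: $log_file" >&2
    return 1
  fi
  # -n: show line numbers, -i: ignore case, -A 5: five lines of trailing context
  grep -n -i -A 5 -E 'ERROR|Exception' "$log_file" | tail -n 40
}
```

For example, "show_errors logs/hadoop.log" run from the Nutch directory (assuming a local-mode Nutch 1.x run) and "show_errors server/logs/solr.log" run from the Solr directory (assuming a default Solr 5.x install).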

Regards,

----- Original Message -----
From: "kunal chakma" <kc...@gmail.com>
To: user@nutch.apache.org
Sent: Sunday, June 14, 2015 9:39:33 PM
Subject: [MASSMAIL]Integrating nutch 1.10 with Solr 5.2.0

Hi,
     I am very new to the Nutch and Solr platform. I have been trying hard
to integrate Solr 5.2.0 with Nutch 1.10 but have not been able to do so. I
have followed all the steps in the Nutch 1.x tutorial page, but when I
execute the following command,

bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/ -linkdb
crawl/linkdb/ crawl/segments/20150613164847/ -filter -normalize

I get the following errors:
Indexer: starting at 2015-06-14 19:05:28
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default
solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : username for authentication
solr.auth.password : password for authentication


Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:113)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:177)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:187)

Please help me resolve the issue.


 *With regards,*

*KUNAL CHAKMA*
Computer Science & Engineering Department
National Institute of Technology Agartala
Jirania-799055,
Agartala,Tripura
India


Re: [MASSMAIL]Integrating nutch 1.10 with Solr 5.2.0

Posted by Ankit Goel <an...@gmail.com>.
Hi,
This won't be very helpful, but it might expedite things a bit. Just as
Nutch 1.x is quite different from Nutch 2.x, Solr 4.x is different from
5.x. First off, the file structure is different. All the tutorials talk about
the "example" folder, but you won't find the requisite files there in 5.x.
Also, in 4.x we use the default collection1 when starting a Solr instance,
but in 5.x we have to specify it, as in solr/gettingstarted/shard. So the
location of your schema.xml is also something you might have to look into.

Basically, if you need to implement this and get on the road fast, use the
4.x version of Solr until there is a proper tutorial or someone posts an
answer here.
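
As a rough sketch of what specifying the collection can look like when indexing from Nutch: the Solr URL passed to solrindex would name the core rather than just the Solr root. The helper below only builds that URL. The function name solr_core_url and the core name "nutch" are hypothetical, and this assumes a core with that name already exists in Solr 5.x with a Nutch-compatible schema.xml in its conf directory.

```shell
# Hypothetical helper (not part of Nutch or Solr): join a Solr base URL and
# a core name into the URL handed to the indexer.
solr_core_url() {
  base="${1%/}"   # drop a single trailing slash, if present
  core="$2"
  if [ -z "$base" ] || [ -z "$core" ]; then
    echo "usage: solr_core_url BASE_URL CORE_NAME" >&2
    return 1
  fi
  printf '%s/%s\n' "$base" "$core"
}
```

With that, the command from the original message would become: bin/nutch solrindex "$(solr_core_url http://localhost:8983/solr nutch)" crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20150613164847/ -filter -normalize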




-- 
Regards,
Ankit Goel
http://about.me/ankitgoel