Posted to user@nutch.apache.org by Rum Raisin <ru...@yahoo.com> on 2011/10/28 03:28:53 UTC

Integrating nutch crawl into solr

Hi,
I'm running Nutch 1.3 and Solr 3.4, both newly installed.
I ran a crawl, which seems to have succeeded, as I can see that some data was retrieved:


bin/nutch crawl urls -dir crawl -depth 3 -topN 20


Then I copied the default conf/schema.xml file from Nutch to Solr's example/solr/conf directory and restarted Solr.

Then, to put the crawl data into Solr, I ran the command below:

bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl crawl/linkdb crawl/segments/*


It gave me an error about a missing "current" directory, so I created that directory manually and ran the command again.
The second time there were no errors, but it finished very quickly. When I went to the Solr admin panel, the statistics showed maxDocs=0 and numDocs=0.
I also ran a *:* query but got 0 results.
So it looks like nothing was imported into Solr. I was following the tutorial here: http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch
What am I doing wrong? Why can't I get the Nutch data into Solr? Thanks.

Re: Integrating nutch crawl into solr

Posted by Rum Raisin <ru...@yahoo.com>.
Thanks, I resolved it. It was due to a wrongly specified crawldb directory. The tutorial had it like this (is this a typo in the tutorial?):
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawldb crawldb/linkdb crawldb/segments/*
I changed the first "crawldb" argument to "crawldb/crawldb" so that it points at the crawldb directory that sits on the same level as the linkdb and segments directories, as they are laid out by default.
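
For anyone hitting the same problem, here is a rough sketch of a pre-flight check, not something from the tutorial. It assumes the crawl was run with "-dir crawl" as in the original post, and that the crawldb keeps its data under a "current" subdirectory (which matches the missing "current" error above). The function name check_crawl_layout is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: confirm that the directories solrindex reads exist
# under the crawl directory before indexing. Prints each missing path to
# stderr and returns non-zero if anything is missing.
check_crawl_layout() {
  crawl_dir=$1
  rc=0
  for sub in crawldb/current linkdb segments; do
    if [ ! -d "$crawl_dir/$sub" ]; then
      echo "missing: $crawl_dir/$sub" >&2
      rc=1
    fi
  done
  return $rc
}

# Usage with the paths from this thread (assumes -dir crawl):
# check_crawl_layout crawl && \
#   bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
```

If the check reports crawldb/current as missing, the first path argument is probably pointing at the wrong level, and creating "current" by hand will only hide that.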

________________________________
From: lewis john mcgibbney <le...@gmail.com>
To: user@nutch.apache.org; Rum Raisin <ru...@yahoo.com>
Sent: Friday, October 28, 2011 2:20 AM
Subject: Re: Integrating nutch crawl into solr


Please check your hadoop.log and Solr logs for related clues.

The "current" directory should not be created manually; it should be produced by the Nutch tasks themselves.



-- 
Lewis 

Re: Integrating nutch crawl into solr

Posted by lewis john mcgibbney <le...@gmail.com>.
Please check your hadoop.log and Solr logs for related clues.

The "current" directory should not be created manually; it should be
produced by the Nutch tasks themselves.
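
One possible way to scan a log for clues is sketched below. The helper name scan_log is hypothetical, and the log path in the usage comment is an assumption that depends on how Nutch was installed. The pattern just looks for common severity words and is not an exhaustive filter.

```shell
#!/bin/sh
# Hypothetical helper: print lines from a log file that look like problems.
# Returns 2 if the file does not exist; otherwise returns grep's status
# (0 if something matched, 1 if nothing matched).
scan_log() {
  log_file=$1
  if [ ! -f "$log_file" ]; then
    echo "no such log: $log_file" >&2
    return 2
  fi
  # -n prefixes each match with its line number; -E enables alternation.
  grep -n -E 'ERROR|WARN|Exception' "$log_file"
}

# Usage (the path is an assumption; adjust it to your install):
# scan_log logs/hadoop.log | tail -n 20
```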

-- 
*Lewis*