Posted to user@nutch.apache.org by ".: Abishek :." <ab...@gmail.com> on 2011/02/10 04:18:14 UTC

-solr parameter in Crawl

Hi all,

 When do we use the -solr param for the nutch crawl? And is it mandatory
that Solr be running at the URL passed to -solr?

 Should I be using it as,

 bin/nutch crawl ..... -solr http://127.0.0.1:8983/solr

or

 bin/nutch crawl -solr http://127.0.0.1:8983/solr <crawldb> <linkdb> <segments>

 Thanks for your time.

Cheers,
Abi

Re: -solr parameter in Crawl

Posted by ".: Abishek :." <ab...@gmail.com>.

Thanks guys.

On Thu, Feb 10, 2011 at 9:49 PM, Estrada Groups <
estrada.adam.groups@gmail.com> wrote:

> I use -solr almost exclusively because it cuts out a lot of steps in the
> crawl process. Your command line would look something like:
>
>  bin/nutch crawl urls -depth 10 -threads 10 -topN 10 -solr http://localhost:8983/solr
>
> Nutch and Solr will both tell you if there are errors, which usually have to
> do with field mismatches.
>
> Adam

Re: -solr parameter in Crawl

Posted by Estrada Groups <es...@gmail.com>.

I use -solr almost exclusively because it cuts out a lot of steps in the crawl process. Your command line would look something like:

 bin/nutch crawl urls -depth 10 -threads 10 -topN 10 -solr http://localhost:8983/solr

Nutch and Solr will both tell you if there are errors, which usually have to do with field mismatches.
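
As a rough illustration of the one-step form described above, here is a minimal sketch that only prints the crawl command instead of running it. The seed directory and Solr URL are the ones from the example; the function names `solr_is_up` and `crawl_cmd` are made up for this sketch, and the reachability check assumes curl is installed and that Solr's ping handler is enabled, which may not hold for every setup. Since the last step of such a crawl sends documents to the given URL, Solr presumably needs to be reachable by then.

```shell
#!/bin/sh
# Sketch only: this prints the crawl command rather than running it.
SEED_DIR="urls"                          # seed directory from the example above
SOLR_URL="http://localhost:8983/solr"    # Solr URL from the example above

# Returns 0 if Solr answers at the given base URL.
# Assumes curl is installed and Solr's ping handler (/admin/ping) is enabled.
solr_is_up() {
  curl -s -f --max-time 5 -o /dev/null "$1/admin/ping"
}

# Prints the one-step crawl command with -solr appended.
crawl_cmd() {
  printf 'bin/nutch crawl %s -depth 10 -threads 10 -topN 10 -solr %s\n' "$1" "$2"
}

crawl_cmd "$SEED_DIR" "$SOLR_URL"
```

Before a real run, calling `solr_is_up "$SOLR_URL"` first and stopping on failure would catch a Solr instance that is not running before the crawl has spent time fetching.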

Adam

On Feb 10, 2011, at 5:27 AM, "McGibbney, Lewis John" <Le...@gcu.ac.uk> wrote:

> Hi Abi,
> 
> Nutch uses Lucene as the default mechanism when running the crawl command. I would be surprised if you did not receive some sort of error message when attempting to add a Solr param to a Nutch crawl.
> 
> If you follow one of the online tutorials available you will find that the final stage (solrindex) is a separate command.
> 
> Lewis

RE: -solr parameter in Crawl

Posted by "McGibbney, Lewis John" <Le...@gcu.ac.uk>.

Hi Abi,

Nutch uses Lucene as the default mechanism when running the crawl command. I would be surprised if you did not receive some sort of error message when attempting to add a Solr param to a Nutch crawl.

If you follow one of the online tutorials available you will find that the final stage (solrindex) is a separate command.
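
For comparison, a minimal sketch of that two-step route, again only printing the commands rather than running them. The `crawl` output directory and both function names are assumptions for illustration, and the `solrindex` argument order (Solr URL, then crawldb, linkdb and segments) follows the Nutch 1.x tutorials of the time as far as I recall; running `bin/nutch solrindex` without arguments should print the usage for your version, which is worth checking first.

```shell
#!/bin/sh
# Sketch only: this prints the two commands rather than running them.
CRAWL_DIR="crawl"                        # hypothetical crawl output directory
SOLR_URL="http://127.0.0.1:8983/solr"    # Solr URL from the original question

# Step 1: crawl into CRAWL_DIR without the -solr option.
crawl_only_cmd() {
  printf 'bin/nutch crawl urls -dir %s -depth 10 -topN 10\n' "$1"
}

# Step 2: index the finished crawl into Solr as a separate command.
# Argument order is an assumption; verify it against the tool's usage output.
solrindex_cmd() {
  printf 'bin/nutch solrindex %s %s/crawldb %s/linkdb %s/segments/*\n' \
    "$2" "$1" "$1" "$1"
}

crawl_only_cmd "$CRAWL_DIR"
solrindex_cmd "$CRAWL_DIR" "$SOLR_URL"
```

This is also where the `<crawldb> <linkdb> <segments>` arguments from the original question would belong: they go with the separate indexing step, not with `crawl`.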

Lewis