Posted to user@nutch.apache.org by Spadez <ja...@hotmail.com> on 2012/02/24 15:47:24 UTC

Nutch AND Solr? Nutch performance and features

Hi.

I am effectively trying to build a search engine site that scrapes data,
collects it and then makes it searchable. I have decided on Solr for my
search system, but my plan was to write a scraper in PHP.

However, now that I have found Nutch, it seems worth looking at.
Firstly, is Nutch simply a web scraper, or does it integrate other aspects
of Lucene as well? I'm wondering whether I would need to install Nutch and
Solr together, or whether Nutch includes the search system as well.

Secondly, how does Nutch compare with a home-brew PHP scraper? I'm really out
of my depth here: am I looking at a tool that is extremely powerful and
ready for a production environment, or is it still very much a development
project on the side?

Any input you can give would be much appreciated.

James


--
View this message in context: http://lucene.472066.n3.nabble.com/Nutch-AND-Solr-Nutch-performance-and-features-tp3772750p3772750.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Nutch AND Solr? Nutch performance and features

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi James,

On Fri, Feb 24, 2012 at 2:47 PM, Spadez <ja...@hotmail.com> wrote:

> However, now that I have found Nutch, it seems worth looking at.
> Firstly, is Nutch simply a web scraper, or does it integrate other aspects
> of Lucene as well? I'm wondering whether I would need to install Nutch and
> Solr together, or whether Nutch includes the search system as well.
>

You would install both: Nutch does the crawling and Solr provides the
indexing and search. Set up Nutch to crawl the web or filesystem of your
choice; after that, sending the crawled content to Solr is straightforward.
Please see this tutorial for a comprehensive walkthrough [0].
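
For a sense of how small that last step is, here is a minimal sketch, assuming a Solr instance at the hypothetical URL http://localhost:8983/solr/mycore whose /update handler accepts JSON (the exact handler path varies between Solr versions, so check the documentation for yours). The field names id, url, title and content are also hypothetical and must match your Solr schema. Nutch ships its own Solr indexing step, so you would not normally write this yourself; the sketch only illustrates what handing crawled documents to Solr amounts to.

```python
import json
import urllib.request

# Hypothetical Solr URL and core name ("mycore"); adjust to your installation.
# commit=true asks Solr to commit the update so the new documents become searchable.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"


def build_solr_update(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request that adds `docs` to Solr.

    `docs` is a list of dicts, one per crawled page. The field names are
    assumptions and must exist in your Solr schema.
    """
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # A single made-up "crawled" page, standing in for Nutch's output.
    crawled = [
        {
            "id": "http://example.com/",
            "url": "http://example.com/",
            "title": "Example Domain",
            "content": "Example page text",
        }
    ]
    req = build_solr_update(crawled)
    print(req.get_method(), req.full_url)
    # Sending the request requires a running Solr instance:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

Building the request separately from sending it keeps the sketch usable without a running Solr; uncomment the urlopen lines to actually post the documents.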

>
> Secondly, how does Nutch compare with a home-brew PHP scraper?

I can't say, as I haven't seen or used any home-brew scrapers.


> I'm really out
> of my depth here: am I looking at a tool that is extremely powerful and
> ready for a production environment,

Yes. It is a well-established, actively maintained web crawler with a
healthy user and developer community, and it is a mature project within
the Apache Software Foundation.


> or is it still very much a development
> project on the side?
>

No, not at all. Nutch is well suited to the tasks you describe.

Thanks

Lewis

[0] http://wiki.apache.org/nutch/NutchTutorial