Posted to user@nutch.apache.org by Spadez <ja...@hotmail.com> on 2012/02/24 15:47:24 UTC
Nutch AND Solr? Nutch performance and features
Hi.
I am effectively trying to make a search engine site, which scrapes data,
collects it and then makes it searchable. I have decided on Solr for my
search system, but my plan was to try and make a scraper using PHP.
However, having found Nutch, it seems like this might be something worth
looking at. Firstly, is Nutch simply a web scraper or does it integrate
other aspects of Lucene as well? I'm wondering if I would need to install
Nutch and Solr together, or if Nutch integrates the search system as well.
Secondly, how does Nutch compare with a home-brew PHP scraper? I'm really out
of my depth here. Am I looking at a tool that is extremely powerful and
ready for a production environment, or is it still very much a development
project on the side?
Any input you can give would be much appreciated.
James
Re: Nutch AND Solr? Nutch performance and features
Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi James,
On Fri, Feb 24, 2012 at 2:47 PM, Spadez <ja...@hotmail.com> wrote:
> However, having found Nutch, it seems like this might be something worth
> looking at. Firstly, is Nutch simply a web scraper or does it integrate
> other aspects of Lucene as well? I'm wondering if I would need to install
> Nutch and Solr together, or if Nutch integrates the search system as well.
>
You need to set up Nutch to crawl the web or filesystem of your choice; after
that, passing the crawled data to Solr is trivial. Please see this tutorial
for a comprehensive walkthrough [0].
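
To make the split of responsibilities concrete: Nutch does the crawling and
passes documents to Solr, and the site then searches by sending HTTP requests
to Solr. A minimal Python sketch of building such a search request follows.
The base URL http://localhost:8983/solr and the helper name
build_solr_query_url are assumptions for illustration only; adjust them to
your own installation.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, terms, rows=10):
    """Build a URL for Solr's select handler (hypothetical helper).

    base_url is an assumption, e.g. "http://localhost:8983/solr".
    """
    params = {
        "q": terms,    # the user's search terms
        "rows": rows,  # maximum number of results to return
        "wt": "json",  # ask Solr for a JSON response
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Assumed local Solr instance; no request is sent here, the URL is only built.
url = build_solr_query_url("http://localhost:8983/solr", "apache nutch")
print(url)
```

Fetching that URL (for example from PHP or Python) should return the matching
documents that Nutch crawled, provided they have already been indexed in Solr.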
>
> Secondly, how does Nutch compare with a home-brew PHP scraper?
I haven't got a clue, as I haven't seen or used any home-brew scrapers.
> I'm really out
> of my depth here. Am I looking at a tool that is extremely powerful and
> ready for a production environment,
Yes, it is a very well-established, actively maintained web crawler with a
healthy user and development community. It is also a mature project within
the Apache Software Foundation.
> or is it still very much a development
> project on the side?
>
No, not at all. Nutch excels at the tasks you describe.
Thanks
Lewis
[0] http://wiki.apache.org/nutch/NutchTutorial