Posted to user@nutch.apache.org by Heart <be...@gmail.com> on 2005/10/18 14:17:48 UTC

how to build a SE based on nutch

I'm new to Nutch. Several days ago I finished building a simple intranet search engine based on Nutch 0.6,
and I have spent two weeks reading the Nutch 0.6 source code.

Now I want to build a bigger one. I want to crawl pages from several websites that I specify.
My server is a modest machine with 1 CPU, 1 GB of memory and a 320 GB hard disk; the bandwidth is 10 Mbps.
I want to provide a search service for a specific domain, so I have chosen some
big websites to crawl.
So my questions are:
Must I update all the sites (crawl them) in one crawl procedure, or may I crawl one site per day
and then run a program to index them together? If the crawl procedure lasts too long, how can I provide my service while it runs? Is there a good existing system for me to study?
Any advice would be greatly appreciated.



-- 
Best regards,
 Heart                            mailto:betogether@gmail.com

Re: how to build a SE based on nutch

Posted by Miguel A Paraz <mp...@gmail.com>.

On 10/18/05, Heart <be...@gmail.com> wrote:
> Must I update all the sites (crawl them) in one crawl procedure, or may I crawl one site per day
> and then run a program to index them together? If the crawl procedure lasts too long, how can I provide my service while it runs? Is there a good existing system for me to study?
> Any advice would be greatly appreciated.

I'll add a question of my own:
Sorry to ask this, but I could not find it in the docs. How can I
make Nutch refetch sites that are already in the db? I tried
injecting them again, but they were not refreshed.