Posted to user@nutch.apache.org by Heart <be...@gmail.com> on 2005/10/18 14:17:48 UTC
how to build a SE based on nutch
I'm new to Nutch. Several days ago, I finished building a simple intranet search engine (SE) based on Nutch 0.6,
and I've spent two weeks reading the Nutch 0.6 source code.
Now I want to build a bigger one. I want to crawl the pages of several websites that I specify.
My server is a modest machine with 1 CPU, 1 GB of memory, and a 320 GB hard disk; the bandwidth is 10 Mbps.
I want to provide a search service for a specific domain, so I have chosen some
big websites to crawl.
So my questions are:
Must I update all the sites (crawl them) in one crawl procedure, or may I crawl one site per day
and then run a program to index them together? If the crawl procedure lasts too long, how can I provide my service? Is there a good existing system for me to study?
Any advice would be greatly appreciated.
--
Best regards,
Heart mailto:betogether@gmail.com
Re: how to build a SE based on nutch
Posted by Miguel A Paraz <mp...@gmail.com>.
On 10/18/05, Heart <be...@gmail.com> wrote:
> Must I update all the site(crawl the sites) in one crawl procedure,may I crawl one site per day
> and run a program to index them together, I wonder if the crawl procedure last too long ,how can I provide my service? Is there any good system for me to study?
> any advices would be greatly appreciated.
I'll add:
Sorry to ask this, but I could not find it in the docs. How can I
get Nutch to refetch sites that are already in the db? I tried
injecting them again, but they were not refreshed.