Posted to user@nutch.apache.org by Tomislav Poljak <tp...@gmail.com> on 2007/08/27 09:59:50 UTC

help with hardware requirements

I need help determining hardware specs for crawling 100 sites with 1000
pages each. A regular re-crawl is needed, probably every day (maybe even
more often). Will one server meet these crawling requirements (crawling
only; searching will be handled by another machine)? If so, what
hardware specification would you recommend (how much RAM, how many CPUs,
how much hard disk space)?

Thanks,
       Tomislav


Re: help with hardware requirements

Posted by purpureleaf <pu...@gmail.com>.
I am not an expert on this, but I am doing something similar.
So you have 100k pages, which is very few by Nutch's standards.
I think crawling will be the slow part, not because of hardware, but because
if you crawl faster than 1 page/second per site, you may be blocked by
some sites.
If you really want to update it every day, this may be a problem.
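
To put rough numbers on the politeness limit above, here is a small back-of-the-envelope sketch. It is not Nutch code: the function name and parameters are made up for illustration, and it assumes each site gets at most one request at a time with a fixed delay between requests, ignoring download and parsing time.

```python
import math


def crawl_time_seconds(sites, pages_per_site, delay_s, parallel_sites):
    """Estimate wall-clock seconds for one full crawl pass.

    Hypothetical helper, not part of Nutch. Assumptions:
    - one request at a time per site, delay_s seconds apart
    - up to parallel_sites sites are fetched concurrently
    - fetch and parse time are ignored (politeness delay dominates)
    """
    per_site = pages_per_site * delay_s            # time to finish one site
    batches = math.ceil(sites / parallel_sites)    # rounds of concurrent sites
    return batches * per_site


# 100 sites x 1000 pages at 1 page/second per site
sequential = crawl_time_seconds(100, 1000, 1.0, parallel_sites=1)
all_parallel = crawl_time_seconds(100, 1000, 1.0, parallel_sites=100)

print(f"one site at a time: {sequential / 3600:.1f} hours")
print(f"all sites at once:  {all_parallel / 60:.1f} minutes")
```

Under these assumptions, fetching one site at a time takes longer than a day, while fetching all 100 sites concurrently is bounded by the slowest single site (1000 pages at 1 page/second). So whether a daily re-crawl fits depends mostly on how many sites are fetched in parallel, not on the hardware.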

The searching part is really fast. I was worried about it too, but once I
saw my AMD 1800+ PC (1 GB of memory) do a search in less than 0.1 seconds, I
stopped looking into this problem. I saw someone on this list
doing crawling/searching on a PIII with reasonable speed.

Regards
Pan

Tomislav Poljak wrote:
> 
> I need help determining hardware specs for crawling 100 sites with 1000
> pages each. A regular re-crawl is needed, probably every day (maybe even
> more often). Will one server meet these crawling requirements (crawling
> only; searching will be handled by another machine)? If so, what
> hardware specification would you recommend (how much RAM, how many CPUs,
> how much hard disk space)?
> 
> Thanks,
>        Tomislav
> 
> 
> 
