Posted to user@nutch.apache.org by Chun Wei Ho <cw...@gmail.com> on 2006/02/16 05:30:01 UTC

Hardware Requirements for a large index?

Hi,

I am in the process of deciding on specs for a crawling machine and a
searching machine (two machines), which will support merging/indexing
and searching operations on a single index that may scale to several
million pages (roughly 2-10 GB at that size, assuming the index grows
linearly with the number of pages).
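The linear-growth assumption above can be turned into a quick estimate.
What follows is only a minimal sketch: it assumes you first run a small
sample crawl, measure the resulting index size, and scale it by page
count. The class and method names are hypothetical, and the sample
figures in main() are made up for illustration, not measured Nutch
numbers.

```java
// Rough linear extrapolation of index size from a sample crawl.
// Assumption: index size grows linearly with page count (as the post assumes).
// IndexSizeEstimate and its methods are hypothetical, for illustration only.
public class IndexSizeEstimate {

    /** Average index bytes per page observed in the sample. */
    static double bytesPerPage(long sampleIndexBytes, long samplePages) {
        if (samplePages <= 0) {
            throw new IllegalArgumentException("samplePages must be positive");
        }
        return (double) sampleIndexBytes / samplePages;
    }

    /** Linearly scale the sample's bytes-per-page up to a target page count. */
    static long extrapolate(long sampleIndexBytes, long samplePages, long targetPages) {
        return Math.round(bytesPerPage(sampleIndexBytes, samplePages) * targetPages);
    }

    public static void main(String[] args) {
        // Hypothetical sample: a 100,000-page test crawl yielding a 100 MB index.
        long sampleBytes = 100L * 1024 * 1024;
        long samplePages = 100_000L;
        for (long target : new long[] {1_000_000L, 5_000_000L, 10_000_000L}) {
            long est = extrapolate(sampleBytes, samplePages, target);
            System.out.printf("%d pages -> ~%.1f GB%n",
                    target, est / (1024.0 * 1024 * 1024));
        }
    }
}
```

Real indexes rarely grow perfectly linearly, so it is worth re-measuring
at a second, larger sample size before buying hardware.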

What is the range of hardware that I should be looking at? Could
anyone share their deployment/hardware specs for a large index size?
I'm looking for RAM and CPU considerations.

Also, what is the preferred platform? Java has a maximum memory
allocation of 4GB on Solaris and 2GB on Linux, right? Does it make
sense to get more RAM than this?

Thanks!

CW

Re: Hardware Requirements for a large index?

Posted by Stefan Groschupf <sg...@media-style.com>.
> I am in the process of deciding specs for a crawling machine and a
> searching machine (two machines), which will support merging/indexing
> and searching operations on a single index that may scale to about
> several million pages (at which it would be about 2-10 GB, assuming
> linear growth with pages).
There are some rules of thumb in the wiki that are still up to date
for the index itself, since they relate more to Lucene.
In general, my suggestion is to start small and grow as needed;
Hadoop is perfect for that.

> What is the range of hardware that I should be looking at? Could
> anyone share their deployment/hardware specs for a large index size?
> I'm looking for RAM and CPU considerations.
>
> Also what is the preferred platform - Java has a max memory allocation
> of 4GB on Solaris and 2GB on linux? -> Does it make sense to get more
> RAM than this?

As far as I know, you should be able to use more memory with a 64-bit JVM.
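To check which kind of JVM a machine is running and how much heap it
will actually grant, a small program like the following sketch can help.
It relies on Runtime.maxMemory() and the standard "os.arch" property;
the "sun.arch.data.model" property is HotSpot-specific and is assumed
here, so the code falls back to "unknown" when it is missing. HeapCheck
is a hypothetical name.

```java
// Prints the JVM's architecture, pointer width, and maximum heap size.
// "sun.arch.data.model" is HotSpot-specific and may be null on other JVMs.
public class HeapCheck {

    /** Returns e.g. "64-bit", or "unknown" if the JVM does not report it. */
    static String dataModel() {
        String model = System.getProperty("sun.arch.data.model");
        return model != null ? model + "-bit" : "unknown";
    }

    /** Max heap in MB, or -1 if the JVM reports no inherent limit. */
    static long maxHeapMegabytes() {
        // maxMemory() returns Long.MAX_VALUE when there is no inherent limit.
        long max = Runtime.getRuntime().maxMemory();
        return max == Long.MAX_VALUE ? -1 : max / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("os.arch:    " + System.getProperty("os.arch"));
        System.out.println("data model: " + dataModel());
        long mb = maxHeapMegabytes();
        System.out.println("max heap:   " + (mb < 0 ? "no limit reported" : mb + " MB"));
    }
}
```

Running it with a large heap request, e.g. `java -Xmx6g HeapCheck`,
is a quick test: a 32-bit JVM will typically refuse to start with a
heap that size, while a 64-bit JVM on a machine with enough RAM should
start and report roughly the requested maximum.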