You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@nutch.apache.org by Chun Wei Ho <cw...@gmail.com> on 2007/07/08 05:20:21 UTC

Lucene index sizes and performance

We are currently running a search service with a single Lucene index
of about 10 GB. We would like to find out:

(a) What is the usual index size of everyone else? How large have
Lucene index gone in production environments, and is there a sort of a
optimal size that Lucene indexes should be?

(b) With a index size of 10GB, how much memory would you recommend a
dual 3GHz machine serving searches on it to have. We currently have
4GB RAM and are thinking of adding more for faster searches?

Is there a ballpark figure or guide that we can adhere to - so we
might add more RAM depending on the rate of index growth.


(c) We're considered the possibility of splitting our large index into
several smaller ones based on discussions in previous threads.

Did anyone do so here and how did you manage it - splitting by logical
category, or splitting by time (so perhaps a index that holds 2 months
worth of documents might be split into 8 indexes of 1 week each). How
would the searching application handle/merge results from different
indexes?


Regards,
CW

Just a postscript here to thank mailing list folks who have been
providing us with guidance on Lucene all this time :)