Posted to dev@lucene.apache.org by muraalee <mu...@gmail.com> on 2007/07/07 02:25:06 UTC
Scaling Lucene to 500 million documents - preferred architecture
Hi Everybody,
We are building a search infrastructure using Lucene that must scale up to 500
million documents with search latency under 500 ms.
Here is my rough math on the size of the content and index:
Total documents = 500 million
Size per document = 10 KB
Index size per million documents = 2 GB
Total index size = 500 x 2 GB ~ 1 TB
We are planning to partition this 1 TB index into 25 partitions, each partition
holding around 20 million documents (~40 GB).
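Worked out in code, the sizing above comes out consistent (a quick stdlib-only sketch; all figures are the ones quoted above):

```java
public class IndexSizing {
    public static void main(String[] args) {
        long totalDocs = 500_000_000L;       // 500 million documents
        long gbPerMillionDocs = 2;           // 2 GB of index per million documents
        int partitions = 25;

        long totalIndexGb = (totalDocs / 1_000_000L) * gbPerMillionDocs; // 1000 GB ~ 1 TB
        long docsPerPartition = totalDocs / partitions;                  // 20 million
        long gbPerPartition = totalIndexGb / partitions;                 // 40 GB

        System.out.println("Total index: " + totalIndexGb + " GB; per partition: "
                + docsPerPartition + " docs, " + gbPerPartition + " GB");
        // prints: Total index: 1000 GB; per partition: 20000000 docs, 40 GB
    }
}
```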
Since 1 TB doesn't seem like that much, we are debating whether to hold the
whole 1 TB in RAM. We checked prices for 64 GB / 8-CPU boxes and they are very
competitive.
Now the question is: can we use RAMDirectory for all of this 1 TB, or is
FSDirectory better, with a separate spindle for each CPU?
We are considering 25 boxes (8 CPU, 64 GB each), one per partition, with
separate brokers to merge the results from all partitions.
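The broker step above amounts to a top-k merge over per-partition result lists. A minimal stdlib-only sketch of that merge (the Hit type and merge helper here are hypothetical illustrations, not Lucene API; in Lucene itself something like MultiSearcher plays this role):

```java
import java.util.*;

public class BrokerMerge {
    // One scored hit returned by a partition searcher (hypothetical type).
    record Hit(int partition, int doc, float score) {}

    // Merge per-partition top-k lists into a single global top-k by score,
    // using a size-bounded min-heap so memory stays O(k).
    static List<Hit> merge(List<List<Hit>> perPartition, int k) {
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score()));
        for (List<Hit> hits : perPartition) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > k) heap.poll(); // evict the current lowest score
            }
        }
        List<Hit> top = new ArrayList<>(heap);
        top.sort((a, b) -> Float.compare(b.score(), a.score())); // best first
        return top;
    }

    public static void main(String[] args) {
        // Two partitions, each returning its own top hits.
        List<List<Hit>> results = List.of(
                List.of(new Hit(0, 11, 0.9f), new Hit(0, 12, 0.5f)),
                List.of(new Hit(1, 7, 0.8f), new Hit(1, 8, 0.4f)));
        for (Hit h : merge(results, 3))
            System.out.println(h);
        // prints: Hit[partition=0, doc=11, score=0.9]
        //         Hit[partition=1, doc=7, score=0.8]
        //         Hit[partition=0, doc=12, score=0.5]
    }
}
```

Note that scores from different partitions are only directly comparable if the ranking function is consistent across them; with separate indexes, per-partition statistics (e.g. idf) can skew the merged order.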
Has anybody done something like this before? We would appreciate it if you
could share your experiences.
thanks
Murali V
--
View this message in context: http://www.nabble.com/Scaling-Lucene-to-500-million-documents---preferred-architecture-tf4038794.html#a11474442
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org