Posted to dev@lucene.apache.org by muraalee <mu...@gmail.com> on 2007/07/07 02:25:06 UTC

Scaling Lucene to 500 million documents - preferred architecture

Hi Everybody,
We are building a search infrastructure on Lucene that needs to scale up to 500
million documents with search latency under 500 ms.

Here is my rough math on the size of the content and the index:
Total documents                  = 500 million
Size per document                = 10 KB
Index size per million documents = 2 GB
Total index size                 = 500 x 2 GB ~ 1 TB

We are planning to partition this 1 TB index into 25 partitions, each holding
around 20 million documents at roughly 40 GB.
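
For routing a document to a partition we are leaning towards something as
simple as the hash sketch below (the class name, the id field, and the wiring
are just illustrative, not anything from Lucene):

public class PartitionRouter {

  private final int numPartitions;

  public PartitionRouter(int numPartitions) {
    this.numPartitions = numPartitions;
  }

  /** Maps a stable document id onto one of the index partitions. */
  public int partitionFor(String docId) {
    // Mask off the sign bit so the modulo result is never negative.
    return (docId.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  public static void main(String[] args) {
    PartitionRouter router = new PartitionRouter(25);
    System.out.println(router.partitionFor("doc-12345678")); // value in [0, 24]
  }
}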

Since 1 TB doesn't seem like that much, we are debating whether we should keep
the whole 1 TB in RAM. We checked prices for RAM on 64 GB / 8 CPU boxes and
they are very competitive.

Now the question is: can we use RAMDirectory for all of this 1 TB, or is
FSDirectory better, with a separate spindle for each CPU?
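
To make the comparison concrete, this is roughly what the two options look
like against Lucene's Directory API as I understand it in 2.x (the path is
made up, and I haven't yet verified that a 40 GB RAMDirectory is even
practical on a 64 GB box):

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class DirectoryOptions {
  public static void main(String[] args) throws IOException {
    // Option A: search straight off disk, ideally one spindle per partition.
    Directory onDisk = FSDirectory.getDirectory("/indexes/partition-07");
    IndexSearcher diskSearcher = new IndexSearcher(onDisk);

    // Option B: copy the whole 40 GB partition into the JVM heap up front.
    // This needs a heap comfortably larger than the partition itself.
    Directory inRam = new RAMDirectory(onDisk);
    IndexSearcher ramSearcher = new IndexSearcher(inRam);
  }
}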

We are considering 25 boxes (8 CPU / 64 GB each), one per partition, and
separate brokers to merge the results.
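
On the broker side we were imagining something along the lines of Lucene's
RMI-based RemoteSearchable combined with ParallelMultiSearcher, roughly like
the sketch below (the host names and the bound name "partition" are invented;
each search box would register its local IndexSearcher under that name):

import java.rmi.Naming;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.ParallelMultiSearcher;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.TermQuery;

public class SearchBroker {
  public static void main(String[] args) throws Exception {
    // One RMI stub per partition box.
    Searchable[] partitions = new Searchable[25];
    for (int i = 0; i < partitions.length; i++) {
      partitions[i] = (Searchable) Naming.lookup("//search" + i + ":1099/partition");
    }

    // Fan the query out to all 25 partitions in parallel and merge the scored results.
    Searcher broker = new ParallelMultiSearcher(partitions);
    Hits hits = broker.search(new TermQuery(new Term("body", "lucene")));
    System.out.println("total hits: " + hits.length());
  }
}

We haven't measured the RMI and merge overhead on the broker yet, so any
real-world numbers there would be especially helpful.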

Has anybody done something like this in the past? We would appreciate it if
you could share your experiences.

thanks
Murali V


