Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2007/07/17 02:05:47 UTC

Re: Scaling Lucene to 500 million documents - preferred architecture

Hi Murali (redirecting to the more appropriate java-user list)

Sounds doable. I'd go with FSDirectory (or even its memory-mapped cousin, MMapDirectory) instead of RAMDirectory and let the OS cache the Lucene indices. I'm looking at a search cluster with 3 times that many machines (though not as high-end as your 8 CPU/64GB ones) and well over 1B docs.
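
To illustrate what "let the OS cache" means for a memory-mapped read, here is a minimal JDK-only sketch. It is not Lucene's API or implementation: the class name MmapReadSketch, the temp file, and its contents are made up for illustration. It maps a file read-only so that reads are served from the OS page cache rather than from a copy held on the Java heap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; this is not how Lucene itself is written.
class MmapReadSketch {

    // Map the whole file read-only and decode its bytes as UTF-8.
    // Pages are faulted in by the OS on access and stay in its page cache.
    static String readMapped(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Copying to a heap array is only for this demo; a real reader
            // would normally read from the mapped buffer in place.
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write a small stand-in file, then read it back through the mapping.
    static String demo() {
        try {
            Path file = Files.createTempFile("segment-sketch", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, "stand-in for index data".getBytes(StandardCharsets.UTF_8));
            return readMapped(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is that the JVM heap does not need to hold the data, so RAM left free on the box can be used by the OS to cache index files.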

Otis
--
Lucene Consulting -- http://lucene-consulting.com/


----- Original Message ----
From: muraalee <mu...@gmail.com>
To: java-dev@lucene.apache.org
Sent: Saturday, July 7, 2007 2:25:06 AM
Subject: Scaling Lucene to 500 million documents - preferred architecture


Hi Everybody,
We are building a search infrastructure using Lucene to scale up to 500
million documents with search times under 500 ms.

Here is my rough math on the size of content & index:
Total Documents = 500 million documents
Size / Document = 10 KB / document
Index Size / Million = 2 GB / million documents
Total Index Size = 500 million documents x 2 GB / million ~ 1 TB

We are planning to split this 1 TB index into 25 partitions, each holding
around 20 million documents at about 40 GB.
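
The sizing arithmetic above can be checked with a short sketch. The numbers are the ones quoted in this message; the class and method names (IndexSizing and so on) are made up for illustration, and the 2 GB per million documents figure is the rough estimate stated above, not a measured value.

```java
// Hypothetical helper for re-running the back-of-the-envelope sizing.
class IndexSizing {
    // Figures quoted in the message.
    static final long TOTAL_DOCS_MILLIONS = 500;
    static final long INDEX_GB_PER_MILLION_DOCS = 2;
    static final long PARTITIONS = 25;

    // Total index size in GB for the whole corpus.
    static long totalIndexGb(long docsMillions, long gbPerMillion) {
        return docsMillions * gbPerMillion;
    }

    // Documents (in millions) held by each partition, assuming an even split.
    static long docsPerPartitionMillions(long docsMillions, long partitions) {
        return docsMillions / partitions;
    }

    // Index size in GB of each partition, assuming an even split.
    static long partitionIndexGb(long docsMillions, long gbPerMillion, long partitions) {
        return totalIndexGb(docsMillions, gbPerMillion) / partitions;
    }

    public static void main(String[] args) {
        System.out.println("Total index (GB): "
                + totalIndexGb(TOTAL_DOCS_MILLIONS, INDEX_GB_PER_MILLION_DOCS));
        System.out.println("Docs per partition (millions): "
                + docsPerPartitionMillions(TOTAL_DOCS_MILLIONS, PARTITIONS));
        System.out.println("Index per partition (GB): "
                + partitionIndexGb(TOTAL_DOCS_MILLIONS, INDEX_GB_PER_MILLION_DOCS, PARTITIONS));
    }
}
```

With the figures above this gives 1000 GB in total, 20 million documents per partition, and 40 GB per partition, which matches the plan.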

Since 1 TB doesn't seem like that much, we are debating whether we should
keep the whole 1 TB in RAM. We checked the prices for RAM (64 GB / 8 CPU
boxes) and they are very competitive.

Now the question is: can we use RAMDirectory for all of this 1 TB, or is
FSDirectory better, with a separate spindle for each CPU?

We are considering 25 boxes (8 CPUs, 64 GB each), one for each partition,
and separate brokers to merge their results.
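
As a rough sketch of what a broker's merge step could look like: each partition returns its own hits sorted by descending score, and the broker does a k-way merge to get the global top k. Everything here is hypothetical (the Hit class, mergeTopK, the sample scores); it also assumes scores from different partitions are directly comparable, which may not hold if term statistics differ between partitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical broker-side merge; not taken from Lucene.
class BrokerMergeSketch {

    // One scored hit as a partition might report it (assumed shape).
    static final class Hit {
        final int partition;
        final int docId;
        final float score;

        Hit(int partition, int docId, float score) {
            this.partition = partition;
            this.docId = docId;
            this.score = score;
        }
    }

    // Merge per-partition hit lists (each sorted by descending score)
    // into the global top k, also in descending score order.
    static List<Hit> mergeTopK(List<List<Hit>> perPartition, int k) {
        // Each heap entry is {list index, position in that list}; the entry
        // pointing at the highest-scoring hit sits at the head of the heap.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> Float.compare(
                        perPartition.get(b[0]).get(b[1]).score,
                        perPartition.get(a[0]).get(a[1]).score));
        for (int p = 0; p < perPartition.size(); p++) {
            if (!perPartition.get(p).isEmpty()) {
                heap.add(new int[] {p, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < k && !heap.isEmpty()) {
            int[] top = heap.poll();
            List<Hit> hits = perPartition.get(top[0]);
            merged.add(hits.get(top[1]));
            // Advance within the same partition's list, if it has more hits.
            if (top[1] + 1 < hits.size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }
}
```

With 25 partitions the heap never holds more than 25 entries, so the merge cost is small compared with the searches themselves; each partition only needs to return its own top k hits for the broker to produce a correct global top k.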

Has anybody done something like this in the past? We would appreciate it if
you could share your experiences.

thanks
Murali V

-- 
View this message in context: http://www.nabble.com/Scaling-Lucene-to-500-million-documents---preferred-architecture-tf4038794.html#a11474442
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org




