Posted to java-user@lucene.apache.org by Chun Wei Ho <cw...@gmail.com> on 2007/07/08 05:19:29 UTC

Lucene index sizes and performance

We are currently running a search service with a single Lucene index
of about 10 GB. We would like to find out:

(a) What is the usual index size for everyone else? How large have
Lucene indexes grown in production environments, and is there an
optimal size that Lucene indexes should be?

(b) With an index size of 10 GB, how much memory would you recommend
for a dual 3 GHz machine serving searches on it? We currently have
4 GB of RAM and are thinking of adding more for faster searches.

Is there a ballpark figure or guideline we can follow, so that we can
add more RAM in step with the rate of index growth?


(c) We're considering the possibility of splitting our large index into
several smaller ones, based on discussions in previous threads.

Has anyone here done so, and how did you manage it: splitting by logical
category, or splitting by time (so an index that holds 2 months' worth
of documents might be split into 8 indexes of 1 week each)? How would
the searching application handle/merge results from the different
indexes?


Regards,
CW

Just a postscript here to thank mailing list folks who have been
providing us with guidance on Lucene all this time :)



Re: Lucene index sizes and performance

Posted by sunnyfr <jo...@gmail.com>.
Hi Chris,

Just 10-15% of the index size for memory: how does that work?
Does it just search each merged segment? Is that why it gets slower
when I commit?

Thanks 



chrislusf wrote:
> 
> Not really a suggestion, but some points to consider:
> (a) It depends greatly on your hardware, especially hard drive speed.
> (b) Do you use sorting (SortBy)? Each sort field needs an array in
> memory. With no sorting, reserving memory of about 10-15% of the index
> size should be enough.
> (c) Maybe try splitting the index by content category first; it's much
> easier.





Re: Lucene index sizes and performance

Posted by Chris Lu <ch...@gmail.com>.
Not really a suggestion, but some points to consider:
(a) It depends greatly on your hardware, especially hard drive speed.
(b) Do you use sorting (SortBy)? Each sort field needs an array in memory.
With no sorting, reserving memory of about 10-15% of the index size should
be enough (see the sketch below).
(c) Maybe try splitting the index by content category first; it's much easier.
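
To make the sort-field memory cost concrete, here is a minimal sketch of
an unsorted versus a sorted search. It assumes a Lucene 2.x-era API, and
the index path and field names are made up for illustration. Sorting on a
field fills a FieldCache array with one entry per document, and that array
stays in memory for the life of the underlying IndexReader; for a 10 GB
index, the 10-15% rule of thumb works out to roughly 1-1.5 GB of RAM.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class SortMemorySketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical index path and field names; adjust for your setup.
        IndexSearcher searcher = new IndexSearcher("/data/lucene/index");
        Query query = new QueryParser("contents", new StandardAnalyzer())
                .parse("lucene performance");

        // Plain search: no per-field sort array is loaded.
        Hits unsorted = searcher.search(query);

        // Sorted search: the first sort on "category" fills a FieldCache
        // array with one entry per document; it stays cached while the
        // reader underneath the searcher remains open.
        Sort sort = new Sort(new SortField("category", SortField.STRING));
        Hits sorted = searcher.search(query, sort);

        System.out.println(unsorted.length() + " / " + sorted.length() + " hits");
        searcher.close();
    }
}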

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes




Re: Lucene index sizes and performance

Posted by Michael Stoppelman <st...@gmail.com>.
On Sat, Jul 7, 2007 at 8:19 PM, Chun Wei Ho <cw...@gmail.com> wrote:

> We are currently running a search service with a single Lucene index
> of about 10 GB. We would like to find out:
>
> (a) What is the usual index size for everyone else? How large have
> Lucene indexes grown in production environments, and is there an
> optimal size that Lucene indexes should be?
>

Same here; I'm interested in this answer too. If you're serving a lot of
traffic and need to highlight docs, you need to keep everything in memory.
That's my lesson to share with the world.


>
> (b) With an index size of 10 GB, how much memory would you recommend
> for a dual 3 GHz machine serving searches on it? We currently have
> 4 GB of RAM and are thinking of adding more for faster searches.
>
> Is there a ballpark figure or guideline we can follow, so that we can
> add more RAM in step with the rate of index growth?
>
>
> (c) We're considering the possibility of splitting our large index into
> several smaller ones, based on discussions in previous threads.
>
> Has anyone here done so, and how did you manage it: splitting by logical
> category, or splitting by time (so an index that holds 2 months' worth
> of documents might be split into 8 indexes of 1 week each)? How would
> the searching application handle/merge results from the different
> indexes?
>

Yes, this works. The most important thing, if you're serving lots of
traffic, is to have a master indexing box (which doesn't serve reads) and
to distribute copies of the index to read-only slaves. Also, when you copy
your indices out to the slaves and their memory isn't twice the size of
your index, you can use rsync with the fadvise patch
(http://insights.oetiker.ch/linux/fadvise.html) so that the current index
isn't evicted from the disk cache (on Linux).
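
On the slave side, here is a rough sketch (assuming a Lucene 2.x-era API;
the class name and index path are made up for illustration) of checking,
after each rsync run, whether a newer index copy has landed on disk and
swapping in a fresh searcher:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class SlaveSearcherHolder {
    private final String indexPath;
    private volatile IndexSearcher searcher;

    public SlaveSearcherHolder(String indexPath) throws IOException {
        this.indexPath = indexPath;
        this.searcher = new IndexSearcher(indexPath);
    }

    // Call after each rsync run: swap in a fresh searcher if the index
    // files on disk are newer than the ones this reader was opened on.
    public synchronized void maybeReopen() throws IOException {
        IndexReader reader = searcher.getIndexReader();
        if (!reader.isCurrent()) {
            IndexSearcher fresh = new IndexSearcher(indexPath);
            IndexSearcher old = searcher;
            searcher = fresh;   // new queries see the fresh copy
            old.close();        // in production, close only after in-flight searches finish
        }
    }

    public IndexSearcher get() {
        return searcher;
    }
}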

I'm not sure about combining the results; maybe this will help:
http://sujitpal.blogspot.com/2007/08/remote-lucene-indexes.html
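
As one starting point for merging, Lucene's MultiSearcher can run a query
against several smaller indexes and merge the hits into a single ranked
list, so the calling code looks much like a search against one big index.
A minimal sketch, assuming a Lucene 2.x-era API and hypothetical per-week
index directories (ParallelMultiSearcher does the same thing but queries
the sub-indexes in parallel threads):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;

public class SplitIndexSearch {
    public static void main(String[] args) throws Exception {
        // Hypothetical per-week index directories.
        Searchable[] shards = {
            new IndexSearcher("/data/lucene/week1"),
            new IndexSearcher("/data/lucene/week2"),
            new IndexSearcher("/data/lucene/week3"),
        };

        // MultiSearcher sends the query to every sub-index and merges the
        // hits into one relevance-ranked result list.
        MultiSearcher searcher = new MultiSearcher(shards);
        Query query = new QueryParser("contents", new StandardAnalyzer())
                .parse("lucene performance");
        Hits hits = searcher.search(query);

        System.out.println(hits.length() + " hits across all sub-indexes");
        searcher.close();
    }
}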

