Posted to solr-user@lucene.apache.org by Scott Smith <ss...@mainstreamdata.com> on 2011/12/06 23:04:48 UTC

To optimize or not - Solr vs Lucene

Wasn't sure which mailing list to send this to.  I'm writing an application that can be configured to run directly with lucene or with solr, and I'm trying to figure out whether optimization of the index should be eliminated entirely, eliminated only in the lucene case, or handled some other way.

If I read the 3.5 lucene javadocs, optimize() has been deprecated because it is "rarely justified" with the current lucene index implementation (I started with lucene in the 1.42 days when I think it was pretty much a necessity).  However, if I read the lucid imagination 3.4 manual (page 176), it talks about how optimizing will merge a lot of small blocks together, making the index more efficient, which is exactly what I thought optimize did.  Since solr is based on lucene, I'm wondering if the 3.4 manual is simply out-of-date on this point or whether there is something else going on.

Our application is indexing content in "real time" and so the index changes frequently during the day.  Some of our indexes only contain a few hundred thousand documents.  However, in one of our applications there are over 50 million documents (using Solr with multiple shards).  I thought optimization was a way to keep the index segments merged and thus make the searching more efficient.  I thought it was especially needed if the index was being updated frequently.

When should I optimize?

Thanks in advance for any feedback.

Scott

Re: To optimize or not - Solr vs Lucene

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Dec 6, 2011 at 5:04 PM, Scott Smith <ss...@mainstreamdata.com> wrote:
> If I read the 3.5 lucene javadocs, optimize() has been deprecated because it is "rarely justified" with the current lucene index implementation

Its functionality is not being deprecated... it's just that the
method is being renamed in Lucene (it's staying as "optimize" in
solr).

Using optimize is all about tradeoffs... it's expensive since it
rewrites the complete index (i.e. forces a merge of all segments into
one).  People interested in near-realtime indexing on large indexes
should definitely avoid optimizing.  If index changes don't need to be
visible as often, or for smaller indexes like yours, it really depends
on what you are optimizing for... memory use, query throughput,
indexing bandwidth, turnaround-time (i.e. near-realtime), etc.

-Yonik
http://www.lucidimagination.com
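
As a rough illustration of the tradeoff described in the reply above, here is a toy sketch in Python. It does not call Lucene or Solr: segments are modeled only by their document counts, and the function names (optimize, merge_small) and the threshold value are hypothetical, chosen for this example. The point is only that forcing everything into one segment rewrites the whole index, while merging just the small segments rewrites far less but leaves more than one segment to search.

```python
# Toy model only: each segment is represented by its document count.
# Nothing here is Lucene or Solr API; all names are hypothetical.

def optimize(segments):
    """Force-merge every segment into one.

    Returns (new_segments, docs_rewritten). Every document is rewritten.
    """
    total = sum(segments)
    return [total], total


def merge_small(segments, threshold):
    """Merge only segments smaller than `threshold`, leaving large ones alone.

    Returns (new_segments, docs_rewritten). If fewer than two small
    segments exist there is nothing to merge, so nothing is rewritten.
    """
    small = [s for s in segments if s < threshold]
    large = [s for s in segments if s >= threshold]
    if len(small) < 2:
        return list(segments), 0
    rewritten = sum(small)
    return large + [rewritten], rewritten


# One large segment plus a few small ones from recent updates (made-up sizes).
segments = [50_000_000, 20_000, 5_000, 1_000]

full_segments, full_cost = optimize(segments)
partial_segments, partial_cost = merge_small(segments, threshold=1_000_000)

print("optimize:    segments =", full_segments, "docs rewritten =", full_cost)
print("merge small: segments =", partial_segments, "docs rewritten =", partial_cost)
```

In this model the full optimize rewrites all of the documents to end up with one segment, while the partial merge rewrites only the small segments and ends up with two. Which outcome is preferable depends on what is being optimized for, as the reply notes.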