Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2008/02/29 06:56:59 UTC

[Solr Wiki] Update of "CollectionDistribution" by JamesBrady

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by JamesBrady:
http://wiki.apache.org/solr/CollectionDistribution

The comment on the change is:
Change "ten minute" claim about optimize times

------------------------------------------------------------------------------
  
  On a very large index, adding even a few documents and then running an optimize means rewriting the complete index. This consumes a lot of disk I/O and impacts query performance. Optimizing a very large index may even involve copying the index twice: the current code for merging one index into another calls optimize at the beginning ''and'' the end. If some docs have been deleted, the first optimize call will rewrite the index even before the second index is merged.
  
- Optimizations can take nearly ten minutes to run.  We do not know what happens to query performance on a collection that has not been optimized for a long time. We ''do'' know that it will get worse as the collection becomes more fragmented, but   how much worse is very dependent on the manner of updates and commits to the collection.
+ Optimization is an I/O-intensive process, as the entire index is read and rewritten in optimized form. Anecdotal data shows that optimizations on modest server hardware can take around 5 minutes per GB, although this varies considerably with index fragmentation and hardware bottlenecks. We do not know how far query performance degrades on a collection that has not been optimized for a long time. We ''do'' know that it will get worse as the collection becomes more fragmented, but how much worse depends heavily on the manner of updates and commits to the collection.
  
  We are presuming optimizations should be run once following large ''batch-like'' updates to the collection and/or once a day.
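
As a rough planning aid, the anecdotal figure of around 5 minutes per GB can be turned into a back-of-the-envelope estimate of how long a daily or post-batch optimize might take. The sketch below is only an illustration: the function name `estimate_optimize_minutes` and the constant `MINUTES_PER_GB` are hypothetical, and the linear scaling is an assumption, since actual times vary with index fragmentation and hardware.

```python
# Rule of thumb from the text above: roughly 5 minutes per GB on modest
# server hardware. This is anecdotal, so treat results as rough estimates.
MINUTES_PER_GB = 5.0


def estimate_optimize_minutes(index_size_gb, minutes_per_gb=MINUTES_PER_GB):
    """Return a rough optimize duration in minutes, assuming linear scaling."""
    if index_size_gb < 0:
        raise ValueError("index size must be non-negative")
    return index_size_gb * minutes_per_gb


if __name__ == "__main__":
    # Example index sizes in GB (illustrative values, not measurements).
    for size_gb in (2, 10, 40):
        minutes = estimate_optimize_minutes(size_gb)
        print(f"{size_gb} GB -> about {minutes:.0f} minutes")
```

An estimate like this may help decide whether an optimize fits inside a quiet period; if the projected time is longer than the window between batch updates, optimizing once a day is probably the more realistic choice.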