Posted to solr-dev@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2009/08/12 03:49:57 UTC

Re: eternal optimize interrupted

(replying to solr-user thread over onto solr-dev)

: > last evening we started an optimize over our solr index of 45GB. This morning
: > the optimize was still running, discs spinning like crazy, and the index
: > directory had grown to 83GB.
: 
: Hmmm, it was probably close to done, given that 45*2=90.
: But with that size of an index, and given that solr/tomcat wasn't
: responsive, and that there was a lot of disk IO, perhaps the system
: was swapping?

random thought here, but for really big indexes, would iterative partial 
optimizes result in less disk usage (and, in theory, less swap) than doing 
a full optimize?

With a full optimize, the original segment files have to remain until the 
entire optimize is finished, hence the 2x disk usage ... but if you 
continuously send partial optimize commands (with maxSegments = one less 
than the current number of segments) then on each iteration the old 
segment files could be cleaned up.
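
Something like this, from the client side (totally untested sketch; it 
assumes SolrJ's optimize(waitFlush, waitSearcher, maxSegments) method, and 
it assumes you already know the current segment count, since Solr doesn't 
report that to clients -- in practice you'd have to count files in the 
index dir or just start high):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class PartialOptimizeLoop {
    public static void main(String[] args) throws Exception {
      SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

      // HYPOTHETICAL: assume the index currently has 20 segments.
      int segments = 20;

      // each pass merges down to one fewer segment; once the commit for
      // a pass completes, the files of the segments it merged away become
      // deletable, so we never need two full copies of the index on disk
      for (int max = segments - 1; max >= 1; max--) {
        solr.optimize(true /* waitFlush */, true /* waitSearcher */, max);
      }
    }
  }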

If i remember correctly: a full optimize just iteratively merges the 
smallest two segments anyway, which means (unless i'm smoking crack) 
iterative partial merges should take the same amount of time -- and use 
less disk.

what do the segment merging experts think?  does this sound right?


which begs the question: should <optimize/> do this automatically for 
people?  In a generic lucene app, a "full optimize" needs to work the way 
it does so any other threads/apps trying to open the index get either the 
original index or the new fully optimized index; but we don't really have 
that limitation in Solr ... we could do the iteration ourselves, and just 
hold off on firing any postOptimize or newSearcher events until the loop 
finishes.
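
(rough sketch of what i'm imagining the update handler could do under the 
covers -- untested, written against Lucene 2.9-era APIs, and assuming 
getSequentialSubReaders() on a freshly opened directory reader gives you 
one sub-reader per segment:)

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.Directory;

  public class IterativeOptimize {
    static void iterativeOptimize(IndexWriter writer, Directory dir)
        throws Exception {
      // count the current segments with a throwaway read-only reader
      IndexReader r = IndexReader.open(dir, true);
      int segments = r.getSequentialSubReaders().length;
      r.close();

      for (int max = segments - 1; max >= 1; max--) {
        writer.optimize(max);  // merge down to one fewer segment
        writer.commit();       // advance the commit point so the deletion
                               // policy can reclaim the old segment files
      }
      // ...then fire postOptimize / newSearcher, exactly once...
    }
  }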



-Hoss