Posted to solr-user@lucene.apache.org by Brian Whitman <br...@variogr.am> on 2007/05/18 19:48:34 UTC

takes an hour

I have a largish solr store (2.4m documents with lots of stored text,  
27GB data dir) and I ran optimize on it last night. The QTime was  
3605096! (The commit took about a minute.) During the optimize the
solr java process had 50% CPU and was using all of its max heap size
(1GB). On a server that's doing other stuff it was a bit much.
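
For scale: Solr reports QTime in milliseconds, so the figure above is just over an hour. A minimal sketch of the conversion (the helper name qtime_to_timedelta is made up for illustration):

```python
from datetime import timedelta


def qtime_to_timedelta(qtime_ms: int) -> timedelta:
    """Convert a Solr QTime value (milliseconds) into a timedelta."""
    return timedelta(milliseconds=qtime_ms)


# QTime reported for the optimize in this message.
print(qtime_to_timedelta(3605096))  # 1:00:05.096000
```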

What is optimize doing (from a user's standpoint), how often should I  
run it, is it necessary?

Can I do the optimize on an rsync'd (non-live) copy of the index and  
then sync it back?



Re: takes an hour

Posted by Yonik Seeley <yo...@apache.org>.
On 5/18/07, Tom Hill <so...@zvents.com> wrote:
> Hi -
>
> What happens if updates occur during the optimize?

The update blocks.

There's been some work on the Lucene side to buffer up to maxBufferedDocs
while merges are going on in the background.  If optimization takes an
hour on a really large index, however, you can only buffer so much.  A
better option is perhaps to coordinate things so that you don't send
updates during an optimization (at least for an index this large).
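
One way to do that coordination on the client side might look like the sketch below. It is not part of Solr or Lucene: the UpdateGate class and the send_update / run_optimize callables are hypothetical stand-ins for whatever code actually posts documents and triggers the optimize. Updates that arrive while an optimize is running are held in memory and sent, in order, once it finishes. The sketch does not wait for updates already in flight when the optimize starts, and the held list is unbounded, so it only suits modest update rates.

```python
import threading


class UpdateGate:
    """Hypothetical client-side helper: holds updates back during an optimize."""

    def __init__(self, send_update, run_optimize):
        self._send_update = send_update      # assumed: callable that posts one doc
        self._run_optimize = run_optimize    # assumed: callable that runs the optimize
        self._lock = threading.Lock()
        self._optimizing = False
        self._pending = []

    def add(self, doc):
        """Send doc now, or queue it if an optimize is running. Returns True if sent."""
        with self._lock:
            if self._optimizing:
                self._pending.append(doc)
                return False
        self._send_update(doc)
        return True

    def optimize(self):
        """Run the optimize, then flush queued updates in arrival order."""
        with self._lock:
            self._optimizing = True
        try:
            self._run_optimize()
        finally:
            # Keep queueing new arrivals until the backlog is fully drained,
            # so held updates are never overtaken by newer ones.
            while True:
                with self._lock:
                    held, self._pending = self._pending, []
                    if not held:
                        self._optimizing = False
                        break
                for doc in held:
                    self._send_update(doc)
```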

-Yonik

Re: takes an hour

Posted by Tom Hill <so...@zvents.com>.
Hi -

What happens if updates occur during the optimize?

Thanks,

Tom

Re: takes an hour

Posted by Brian Whitman <br...@variogr.am>.
On May 18, 2007, at 2:10 PM, Yonik Seeley wrote:

> What's your max heap set to?  Might just want to verify that not too
> much time is spent in GC, which can happen when you are right at the
> brink.

Ah... I thought it was set to 1GB, but in my upgrade to Java 1.6 I
guess I'm now just giving it the default. I'll put it back to 1 or
1.5GB (4GB machine); the only big things running on here are Solr
instances under Resin. We're also moving to a multiple-machine Solr
scenario, so I'll look into the syncing scripts. Thanks for the tips.

-brian




Re: takes an hour

Posted by Yonik Seeley <yo...@apache.org>.
On 5/18/07, Yonik Seeley <yo...@apache.org> wrote:
> Once in a blue moon, the addition of a single document could possibly
> cause cascading merges, essentially the same as an optimize.  One way
> to avoid this is to set a large mergeFactor... the downside being that
> you get more segments and have to optimize occasionally to keep it
> under control.

Actually, without setting the mergeFactor higher, a better way to control
a cascading merge would probably be maxMergeDocs... then perhaps
optimizing once a night at off-peak hours.

-Yonik

Re: takes an hour

Posted by Yonik Seeley <yo...@apache.org>.
On 5/18/07, Brian Whitman <br...@variogr.am> wrote:
> I have a largish solr store (2.4m documents with lots of stored text,
> 27GB data dir) and I ran optimize on it last night. The QTime was
> 3605096! (The commit took about a minute.) During the optimize the
> solr java process had 50% CPU and was using all of its max heap size
> (1GB). On a server that's doing other stuff it was a bit much.

What's your max heap set to?  Might just want to verify that not too
much time is spent in GC, which can happen when you are right at the
brink.

> What is optimize doing (from a user's standpoint), how often should I
> run it, is it necessary?

An optimize merges all segments into a single segment (which removes
any deleted docs in the process).  This means that the entire 27GB
ends up getting rewritten.
Once in a blue moon, the addition of a single document could possibly
cause cascading merges, essentially the same as an optimize.  One way
to avoid this is to set a large mergeFactor... the downside being that
you get more segments and have to optimize occasionally to keep it
under control.
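
The cascading effect can be illustrated with a toy model. The sketch below is not Lucene's actual merge code. It assumes every added document is flushed as its own one-document segment, that merge_factor equal-sized segments at the end of the index are merged into one, and that a merge is skipped when its result would exceed max_merge_docs (a simplified stand-in for the maxMergeDocs setting mentioned elsewhere in this thread). The function and parameter names are made up for illustration.

```python
def add_documents(n_docs, merge_factor=10, max_merge_docs=None):
    """Toy model of a logarithmic merge policy.

    Returns (segments, rewritten_per_add): the final segment sizes in
    documents, and how many documents each add caused to be rewritten.
    """
    segments = []           # segment sizes in documents, oldest first
    rewritten_per_add = []
    for _ in range(n_docs):
        segments.append(1)  # simplification: one new segment per added doc
        rewritten = 0
        while True:
            size = segments[-1]
            # Count how many trailing segments share the newest segment's size.
            run = 0
            for s in reversed(segments):
                if s != size:
                    break
                run += 1
            if run < merge_factor:
                break
            merged = size * merge_factor
            if max_merge_docs is not None and merged > max_merge_docs:
                break       # the cap stops the cascade at this level
            del segments[-merge_factor:]
            segments.append(merged)
            rewritten += merged
        rewritten_per_add.append(rewritten)
    return segments, rewritten_per_add


segments, rewritten = add_documents(1000, merge_factor=10)
print(segments, rewritten[-1])

capped, rewritten_capped = add_documents(1000, merge_factor=10, max_merge_docs=100)
print(capped, max(rewritten_capped))
```

In this model the 1000th add triggers three merges in a row and rewrites every document in the index, which is the "essentially the same as an optimize" case. With the cap at 100, no single add rewrites more than 110 documents, but ten segments are left behind to be cleaned up by an occasional optimize.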

> Can I do the optimize on an rsync'd (non-live) copy of the index and
> then sync it back?

Yep!  If you had a master/slave setup, this could be done pretty much
automatically... at the cost of larger latencies between update and
visibility (since index changes need to be rsync'd to the slaves).

-Yonik