Posted to solr-user@lucene.apache.org by Jamie Johnson <je...@gmail.com> on 2012/03/30 01:04:45 UTC

Optimizing in SolrCloud

What is the best way to periodically optimize a Solr index?  I've seen
a few places where this is done from a CRON job, but I wanted to know
if there are any other techniques that are used in practice for doing
this.  My use case is that we generally load a large corpus of data up
front and then information trickles in after that, but we want this
information to be available for search within a reasonable amount of
time (say 10 minutes).  I believe that the CRON job would probably
suffice but if there are any other thoughts/suggestions I'd be
interested to hear them.

Re: Optimizing in SolrCloud

Posted by Walter Underwood <wu...@wunderwood.org>.
The documents are removed from the search when the delete is committed.

The space for those documents is reclaimed at the next merge for the segment where they were. 

wunder

On Mar 29, 2012, at 4:15 PM, Jamie Johnson wrote:

> Thanks, does it matter that we are also making updates to documents at
> various times?  Do the deleted documents get removed when doing a
> merge or does that only get done on an optimize?
> 
> On Thu, Mar 29, 2012 at 7:08 PM, Walter Underwood <wu...@wunderwood.org> wrote:
>> Don't. "Optimize" is a poorly-chosen name for a full merge. It doesn't make that much difference and there is almost never a need to do it on a periodic basis.
>> 
>> The full merge will mean a longer time between the commit and the time that the data is first searchable. Do the commit, then search.
>> 
>> wunder
>> 
>> On Mar 29, 2012, at 4:04 PM, Jamie Johnson wrote:
>> 
>>> What is the best way to periodically optimize a Solr index?  I've seen
>>> a few places where this is done from a CRON job, but I wanted to know
>>> if there are any other techniques that are used in practice for doing
>>> this.  My use case is that we generally load a large corpus of data up
>>> front and then information trickles in after that, but we want this
>>> information to be available for search within a reasonable amount of
>>> time (say 10 minutes).  I believe that the CRON job would probably
>>> suffice but if there are any other thoughts/suggestions I'd be
>>> interested to hear them.

Re: Optimizing in SolrCloud

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Thu, Mar 29, 2012 at 7:15 PM, Jamie Johnson <je...@gmail.com> wrote:
> Thanks, does it matter that we are also making updates to documents at
> various times?  Do the deleted documents get removed when doing a
> merge or does that only get done on an optimize?

Yes, any merge removes documents that have been marked as deleted
(from the segments involved in the merge).
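To make the "segments involved in the merge" point concrete, here is a toy model in Python. It is only a sketch of the idea, not how Lucene is implemented: the Segment class and merge function are hypothetical names made up for illustration. A partial merge drops deleted documents only from the segments it touches, while a full merge (what "optimize" does) drops them everywhere.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for an index segment (hypothetical, not a Lucene class)."""
    docs: dict = field(default_factory=dict)    # doc id -> stored content
    deleted: set = field(default_factory=set)   # ids marked as deleted

    def live_docs(self):
        # Documents that are stored and not marked as deleted.
        return {k: v for k, v in self.docs.items() if k not in self.deleted}


def merge(segments):
    """Write a new segment containing only the live docs of the inputs."""
    merged = Segment()
    for seg in segments:
        merged.docs.update(seg.live_docs())
    return merged


seg_a = Segment(docs={"1": "a", "2": "b"}, deleted={"2"})
seg_b = Segment(docs={"3": "c"})
seg_c = Segment(docs={"4": "d", "5": "e"}, deleted={"5"})

# Partial merge: only seg_a and seg_b take part, so doc "2" is dropped
# but doc "5" still occupies space in the untouched seg_c.
index = [merge([seg_a, seg_b]), seg_c]
stored = [sorted(seg.docs) for seg in index]
print(stored)

# Full merge of everything: the remaining deleted doc is dropped too.
optimized = merge(index)
print(sorted(optimized.docs))
```

In this model the deleted document in seg_c is already invisible to searches (it is not in live_docs) before any merge happens; the merge only reclaims its space.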

Optimize can still make sense, but more often in scenarios where
documents are updated infrequently.

-Yonik

Re: Optimizing in SolrCloud

Posted by Jamie Johnson <je...@gmail.com>.
Thanks, does it matter that we are also making updates to documents at
various times?  Do the deleted documents get removed when doing a
merge or does that only get done on an optimize?

On Thu, Mar 29, 2012 at 7:08 PM, Walter Underwood <wu...@wunderwood.org> wrote:
> Don't. "Optimize" is a poorly-chosen name for a full merge. It doesn't make that much difference and there is almost never a need to do it on a periodic basis.
>
> The full merge will mean a longer time between the commit and the time that the data is first searchable. Do the commit, then search.
>
> wunder
>
> On Mar 29, 2012, at 4:04 PM, Jamie Johnson wrote:
>
>> What is the best way to periodically optimize a Solr index?  I've seen
>> a few places where this is done from a CRON job, but I wanted to know
>> if there are any other techniques that are used in practice for doing
>> this.  My use case is that we generally load a large corpus of data up
>> front and then information trickles in after that, but we want this
>> information to be available for search within a reasonable amount of
>> time (say 10 minutes).  I believe that the CRON job would probably
>> suffice but if there are any other thoughts/suggestions I'd be
>> interested to hear them.

Re: Optimizing in SolrCloud

Posted by Walter Underwood <wu...@wunderwood.org>.
Don't. "Optimize" is a poorly-chosen name for a full merge. It doesn't make that much difference and there is almost never a need to do it on a periodic basis.

The full merge will mean a longer time between the commit and the time that the data is first searchable. Do the commit, then search.
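For the "searchable within about 10 minutes" requirement, one possible approach is to ask Solr to commit on a deadline rather than scheduling anything. The sketch below builds an XML add message carrying the commitWithin attribute (in milliseconds). It assumes the XML update format is in use; the helper name build_add_message and the example field names are made up for illustration, and actually posting the message is left out because the update URL depends on the deployment.

```python
import xml.etree.ElementTree as ET

# Ten minutes expressed in milliseconds, the unit commitWithin expects.
TEN_MINUTES_MS = 10 * 60 * 1000


def build_add_message(docs, commit_within_ms=TEN_MINUTES_MS):
    """Build a Solr XML <add> message that requests a commit within a deadline.

    docs is a list of dicts mapping field name -> value (hypothetical fields).
    """
    add = ET.Element("add", {"commitWithin": str(commit_within_ms)})
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field_el = ET.SubElement(doc_el, "field", {"name": name})
            field_el.text = str(value)
    return ET.tostring(add, encoding="unicode")


message = build_add_message([{"id": "doc-1", "title": "hello"}])
print(message)
```

The resulting string would be sent as the body of a POST to the update handler. No optimize is involved: the documents become searchable once the commit happens.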

wunder

On Mar 29, 2012, at 4:04 PM, Jamie Johnson wrote:

> What is the best way to periodically optimize a Solr index?  I've seen
> a few places where this is done from a CRON job, but I wanted to know
> if there are any other techniques that are used in practice for doing
> this.  My use case is that we generally load a large corpus of data up
> front and then information trickles in after that, but we want this
> information to be available for search within a reasonable amount of
> time (say 10 minutes).  I believe that the CRON job would probably
> suffice but if there are any other thoughts/suggestions I'd be
> interested to hear them.