Posted to solr-user@lucene.apache.org by Maxim Veksler <ma...@vekslers.org> on 2012/01/23 09:03:54 UTC

Solr Cluster - Is it wise to run optimize() on the master after each update

I'm planning on having 1 master and multiple slaves (cloud-based; slaves
go up and down randomly).

The slaves should be constantly available, meaning search performance
should ideally not be affected by updates at all.
It's unclear to me how cluster-based replication works: does it copy
the files from the master and update in place? If so, am I correct
to assume that, except for the caches being emptied, search performance
is not affected?

Does optimizing on the master somehow affect the performance of the slaves?
Is it recommended to run optimize after each update, assuming I'm not
concerned about locking the master for updates and it's OK if the optimize
finishes in under 20 minutes?

Thank you,
Maxim.

Re: Solr Cluster - Is it wise to run optimize() on the master after each update

Posted by Erick Erickson <er...@gmail.com>.
My first reaction is that, unless you have a specific use case,
this is unnecessary. With a slave, Solr replication goes on in
the background. Autowarming is also carried out in the
background. Only when autowarming is done are queries sent
to the new (internal-to-Solr) searcher. All of this happens without
any interruption; the apps that consume Solr don't notice
anything at all. Just try it, don't re-invent the wheel!
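
To make the "warm in the background, then switch" idea concrete, here is
a toy sketch in Python. It is not Solr code: the Searcher and
SearcherHolder classes, the per-searcher cache, and the warm_terms list
are all made up for illustration. The only point it shows is that the
old searcher keeps answering queries until the new one has been warmed,
and that the switch itself is a single reference assignment.

```python
class Searcher:
    """Toy stand-in for a searcher bound to one index snapshot (hypothetical)."""

    def __init__(self, generation, docs):
        self.generation = generation
        self.docs = docs
        self.cache = {}  # starts empty for every new searcher

    def search(self, term):
        # Fill the cache on first use, the way a cold searcher would.
        if term not in self.cache:
            self.cache[term] = sorted(d for d in self.docs if term in d)
        return self.cache[term]


class SearcherHolder:
    """Holds the searcher that currently serves queries."""

    def __init__(self, searcher):
        self.current = searcher

    def search(self, term):
        return self.current.search(term)

    def install(self, new_searcher, warm_terms):
        # Warm the new searcher first; self.current is untouched meanwhile,
        # so queries would still be served from the old snapshot.
        for term in warm_terms:
            new_searcher.search(term)
        # Only now switch over: one reference assignment.
        self.current = new_searcher


holder = SearcherHolder(Searcher(1, ["solr cluster"]))
replicated = Searcher(2, ["solr cluster", "solr replication"])
holder.install(replicated, warm_terms=["solr"])
print(holder.current.generation, holder.search("solr"))
```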

Best
Erick

Re: Solr Cluster - Is it wise to run optimize() on the master after each update

Posted by Maxim Veksler <ma...@vekslers.org>.
Wonderful input. Thank you very much, Erick.

One question: I've been told that Solr supports a multi-core mode of
operation where you build the index on the master (optimized or not), then
pass it to the "standby" core on the slaves. Once the synchronization is
complete, you switch between the active and passive cores on the slave (an
operation that is claimed to be atomic and can happen at run time). Have
you or other members of this list had experience with this mode of operation?

Thank you.
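
For reference, the switch described above corresponds, as far as I know,
to the CoreAdmin SWAP action. Below is a minimal Python sketch that only
builds the request URL. The host name and the core names "live" and
"standby" are assumptions for illustration, and the base path may differ
between Solr versions and deployments, so check it against your own setup.

```python
from urllib.parse import urlencode


def core_swap_url(base_url, live_core, standby_core):
    """Build a CoreAdmin SWAP request URL (the two cores exchange names)."""
    params = urlencode({"action": "SWAP", "core": live_core, "other": standby_core})
    return f"{base_url}/admin/cores?{params}"


# Hypothetical slave host and core names.
print(core_swap_url("http://slave1:8983/solr", "live", "standby"))
```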

Re: Solr Cluster - Is it wise to run optimize() on the master after each update

Posted by Erick Erickson <er...@gmail.com>.
In general, do not optimize unless you
1> have a very static index
2> actually test the search performance afterwards.

First, as Andrew says, optimizing will force a complete
copy of the entire index at replication. If you do NOT
optimize, only the most recent segments to be written
are copied.
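
A toy model of that point, as a rough Python sketch rather than anything
taken from Solr itself: an index is a dict of segment name to size, and
replication copies whatever the master has that the slave lacks. The
segment names and sizes are invented, and the model ignores the space
reclaimed from deleted documents.

```python
def size_to_copy(master, slave):
    """Total size of the segments on the master that the slave does not have."""
    return sum(size for name, size in master.items() if name not in slave)


def add_segment(index, name, size):
    """A new batch of documents lands in a new segment; old ones are untouched."""
    updated = dict(index)
    updated[name] = size
    return updated


def optimize(index, name):
    """Optimizing rewrites every segment into one brand-new segment."""
    return {name: sum(index.values())}


slave = {"seg_a": 400, "seg_b": 90, "seg_c": 10}  # sizes in MB, made up
master = add_segment(slave, "seg_d", 5)
print(size_to_copy(master, slave))                      # only the new segment
print(size_to_copy(optimize(master, "seg_e"), slave))   # the whole index
```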

Second, unless you have a quite large number of
segments, optimizing, despite its cool-sounding name,
doesn't buy you much. In fact there's a JIRA to
rename it to something less good-sounding precisely
because people think "of course I want the index
optimized".

Third, under no circumstances should you optimize
after every update. This will absolutely kill your
indexing. Optimizing copies all segments into
a single segment. In other words you'll spend a lot
of time copying junk around for no good reason. Here
I'm assuming by "update" you mean after every batch
of documents is added. If you're talking about after an entire
indexing run, it's not so bad.

Fourth, one tangible result of optimizing is that the
index is purged of all deleted documents (and remember
that a document update is really a delete followed by
an add). But the same thing happens on segment
merges, which happen without optimizing.
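
The same idea as a small Python sketch. This is a simplified model, not
Lucene's actual data structures: the Segment class and the helper names
are invented. An update marks the old copy as deleted and writes the new
copy into a fresh segment, and any merge, optimize or not, keeps only
the live documents.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    docs: set
    deleted: set = field(default_factory=set)

    def live(self):
        return self.docs - self.deleted


def update_doc(segments, doc_id):
    """An update is a delete of the old copy followed by an add."""
    for seg in segments:
        if doc_id in seg.live():
            seg.deleted.add(doc_id)  # the old copy stays on disk, only marked
    segments.append(Segment(docs={doc_id}))


def merge(segments):
    """Merging copies only live documents, so deleted ones are dropped."""
    merged = set()
    for seg in segments:
        merged |= seg.live()
    return Segment(docs=merged)


segments = [Segment(docs={1, 2, 3})]
update_doc(segments, 2)
print(sum(len(s.docs) for s in segments))  # stored copies before the merge
print(len(merge(segments).docs))           # stored copies after the merge
```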

Bottom line: Don't bother to optimize unless and until
you demonstrate that optimizing provides enough of a
performance boost to be worth it. Even then re-check
your assumptions. Look at the various merge policies
to have more control over when merges occur and
the number of segments you have, but try to forget
that optimization even exists <G>....

There's some good info here...
http://wiki.apache.org/solr/SolrPerformanceFactors

Best
Erick

Re: Solr Cluster - Is it wise to run optimize() on the master after each update

Posted by Andrew Harvey <an...@mootpointer.com>.
We found that optimising too often killed our slave performance. An optimise will cause you to merge and ship the whole index rather than just the relevant portions when you replicate. 

The change on our slaves in terms of IO and CPU as well as RAM was marked. 

Andrew

Sent on the run. 
