Posted to dev@lucene.apache.org by Otis Gospodnetic <ot...@gmail.com> on 2013/07/09 04:41:33 UTC

Lookback and/or time-aware Merge Policy?

Hi,

I was (re-re-re-re)-reading Mike's post about Lucene segment merges -
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Mike mentioned lookahead as something that could possibly yield more
optimal merges.

But what about lookback? :)

What if some sort of stats were kept about which segments were
picked for merges?  With some sort of stats in hand, could one look
back and, knowing what happened after those merges, evaluate if more
optimal merge choices could have been made and then use that "next
time"?

Also, what about time of day and query rates?  Very often search
traffic follows a wave pattern, which could mean that more
aggressive merging could be done during periods with lower query
rates... or maybe during that time more segments could be allowed to
live in the index, assuming that after allowing that for some time,
the subsequent merge could be bigger/more thorough, so to speak.

Thoughts?

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Lookback and/or time-aware Merge Policy?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Thanks for showing I wasn't completely crazy to think this made sense, Mike.

I added:
https://issues.apache.org/jira/browse/LUCENE-5134
https://issues.apache.org/jira/browse/LUCENE-5135

Otis






Re: Lookback and/or time-aware Merge Policy?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Lookback is a good idea: you could at least gather statistics and
assess, later, whether good merges had been selected, and maybe play
"what if" games to explore if different merge selections would have
resulted in less copying.
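
To make the "what if" idea concrete, here is a minimal sketch. It is not
Lucene code: MergeLookback, recordMerge and simulateBytesCopied are
hypothetical names, and the replay policy (merge whenever mergeFactor
segments sit at the same level) is a deliberately simplified stand-in for a
real MergePolicy. It logs the input sizes of each merge and can replay the
same flushes under a different merge factor to compare bytes copied.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lookback helper; none of these names exist in Lucene. */
final class MergeLookback {
    /** Sizes (in bytes) of the segments that went into each recorded merge. */
    private final List<long[]> mergeLog = new ArrayList<>();

    /** Call when a merge is selected, passing the sizes of its input segments. */
    void recordMerge(long... inputSegmentBytes) {
        mergeLog.add(inputSegmentBytes.clone());
    }

    /** Total bytes rewritten by all recorded merges. */
    long bytesCopied() {
        long total = 0;
        for (long[] merge : mergeLog) {
            for (long size : merge) {
                total += size;
            }
        }
        return total;
    }

    /**
     * "What if" replay: feed the same flushed segment sizes through a simple
     * level-based policy that merges as soon as mergeFactor segments share a
     * level, and return the bytes that policy would have copied.
     */
    static long simulateBytesCopied(long[] flushedBytes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<List<Long>> levels = new ArrayList<>();
        long copied = 0;
        for (long flushed : flushedBytes) {
            long incoming = flushed;
            int level = 0;
            while (true) {
                if (levels.size() == level) {
                    levels.add(new ArrayList<>());
                }
                List<Long> current = levels.get(level);
                current.add(incoming);
                if (current.size() < mergeFactor) {
                    break; // not enough segments at this level to merge yet
                }
                long merged = 0;
                for (long size : current) {
                    merged += size;
                }
                copied += merged;  // a merge rewrites every input byte
                current.clear();
                incoming = merged; // the merged segment moves up one level
                level++;
            }
        }
        return copied;
    }
}
```

Comparing bytesCopied() for the merges that actually ran against
simulateBytesCopied() for the same flushes under other settings gives a
rough "could we have copied less?" number. It ignores deletes and search
cost, so it is only a starting point.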

A time-based MergeScheduler would make sense: e.g., it would allow
small merges to run any time, but big ones must wait until "after
hours".
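
A rough sketch of the decision such a scheduler would make, assuming a
fixed size threshold for "big" and a fixed daily window for "after hours".
OffPeakMergeGate and its parameters are hypothetical; this is only the
gating logic, not a MergeScheduler implementation.

```java
import java.time.LocalTime;

/** Hypothetical gate a time-aware merge scheduler could consult; not a Lucene class. */
final class OffPeakMergeGate {
    private final long bigMergeBytes;    // merges at or above this size count as "big"
    private final LocalTime windowStart; // start of the off-peak window (inclusive)
    private final LocalTime windowEnd;   // end of the off-peak window (exclusive)

    OffPeakMergeGate(long bigMergeBytes, LocalTime windowStart, LocalTime windowEnd) {
        this.bigMergeBytes = bigMergeBytes;
        this.windowStart = windowStart;
        this.windowEnd = windowEnd;
    }

    /** True if now falls inside the off-peak window; the window may wrap past midnight. */
    boolean inOffPeakWindow(LocalTime now) {
        if (windowStart.equals(windowEnd)) {
            return false; // treat start == end as an empty window
        }
        if (windowStart.isBefore(windowEnd)) {
            return !now.isBefore(windowStart) && now.isBefore(windowEnd);
        }
        // Window wraps midnight, e.g. 22:00 to 06:00.
        return !now.isBefore(windowStart) || now.isBefore(windowEnd);
    }

    /** Small merges may always run; big merges only inside the off-peak window. */
    boolean mayRunNow(long mergeBytes, LocalTime now) {
        return mergeBytes < bigMergeBytes || inOffPeakWindow(now);
    }
}
```

A custom MergeScheduler could call mayRunNow() with the estimated size of
each pending merge and leave big ones queued until the window opens. A
query-rate signal could replace the clock in the same place.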

Also, RateLimitedDirectoryWrapper can be used to limit the IO impact of ongoing
merges.  It's like a naive ionice, for merging.
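
For illustration only, the idea behind pause-based write throttling can be
sketched as below. MergeWriteThrottle is a hypothetical class, not Lucene's
implementation: it just computes how long a writer should sleep before each
chunk so that the average rate stays at or under a configured bytes/sec.
The caller supplies the clock value, which keeps the sketch deterministic.

```java
/** Hypothetical pause-based write throttle; a sketch, not Lucene's rate limiter. */
final class MergeWriteThrottle {
    private final long bytesPerSec;
    private long nextAllowedNanos; // earliest time at which the next write may start

    MergeWriteThrottle(long bytesPerSec, long startNanos) {
        if (bytesPerSec <= 0) {
            throw new IllegalArgumentException("bytesPerSec must be > 0");
        }
        this.bytesPerSec = bytesPerSec;
        this.nextAllowedNanos = startNanos;
    }

    /**
     * Returns how many nanoseconds the caller should sleep before writing
     * the given number of bytes, given the current time in nanoseconds.
     */
    long pauseNanosFor(long bytes, long nowNanos) {
        long start = Math.max(nowNanos, nextAllowedNanos);
        // Time this write "uses up" at the configured rate.
        long costNanos = Math.round(bytes / (double) bytesPerSec * 1_000_000_000.0);
        nextAllowedNanos = start + costNanos;
        return start - nowNanos;
    }
}
```

A merge thread would call pauseNanosFor(chunkSize, System.nanoTime()) before
each chunk and sleep for the returned duration. Idle time is not banked, so
a quiet period does not allow a later burst above the limit.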

Mike McCandless

http://blog.mikemccandless.com


