Posted to issues@lucene.apache.org by "Michael McCandless (Jira)" <ji...@apache.org> on 2021/08/27 12:08:00 UTC

[jira] [Commented] (LUCENE-10073) Allow very small merges to merge more than segmentsPerTier segments?

    [ https://issues.apache.org/jira/browse/LUCENE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405780#comment-17405780 ] 

Michael McCandless commented on LUCENE-10073:
---------------------------------------------

+1, I think that makes sense.

I wish we had benchmarks telling us whether there is any merge performance penalty to merging so many tiny segments at once, i.e., would it be worth using two or three merge threads to merge those tiny segments, versus using one thread to merge all of them?  But until we have such benchmarks, +1 to being more aggressive about merging tiny segments.

We might also enable merge-on-refresh by default... I think we have another issue open to ponder that.

> Allow very small merges to merge more than segmentsPerTier segments?
> --------------------------------------------------------------------
>
>                 Key: LUCENE-10073
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10073
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> If you are doing lots of concurrent indexing, NRT search regularly publishes many tiny segments, which in turn puts a lot of pressure on merging: it must constantly merge these tiny segments so that the total number of segments in the index stays under budget.
> In parallel, TieredMergePolicy aggressively merges segments that are below the floor size. The budget for the number of segments allowed in the index is computed as if all segments were larger than the floor size, and merges that only contain segments below the floor size get a perfect skew, which guarantees them a better score than any merge that contains one or more segments above the floor size.
> I'm considering reducing the merging overhead in the NRT case by raising maxMergeAtOnce and allowing merges to merge more than mergeFactor segments as long as the number of merged segments is below maxMergeAtOnce and the merged segment size is below the floor segment size.
> In other words, "normal" merges would be allowed to merge up to mergeFactor segments like today, while small merges (size of the merged segment < floor segment bytes) could go up to maxMergeAtOnce.
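A rough illustration of the "perfect skew" point in the issue description, assuming skew is computed as the largest segment's size divided by the total size of the merge, with every size first rounded up to the floor (lower skew is better). FlooredSkew, floorSize and skew are hypothetical names, and this is a simplified sketch rather than TieredMergePolicy's actual scoring code. When every input is below the floor, all inputs count as exactly floor-sized, so a merge of n tiny segments always gets the minimum possible skew of 1/n.

```java
/**
 * Hypothetical sketch (not Lucene code): floored skew of a candidate merge.
 * Sizes below the floor are rounded up to the floor before comparing, so a
 * merge made only of tiny segments looks perfectly balanced.
 * Assumes a non-empty array of segment sizes.
 */
public final class FlooredSkew {

    // Round a segment size up to the floor segment size.
    static long floorSize(long bytes, long floorBytes) {
        return Math.max(bytes, floorBytes);
    }

    // skew = largest floored size / sum of floored sizes; ranges from 1/n (perfect) to 1.
    static double skew(long[] segmentBytes, long floorBytes) {
        long total = 0;
        long largest = 0;
        for (long bytes : segmentBytes) {
            long floored = floorSize(bytes, floorBytes);
            total += floored;
            largest = Math.max(largest, floored);
        }
        return (double) largest / total;
    }

    public static void main(String[] args) {
        long floor = 2L * 1024 * 1024; // example 2 MB floor
        long[] tinyOnly = {10_000, 50_000, 300_000, 1_000_000};
        long[] mixed = {10_000, 50_000, 300_000, 8L * 1024 * 1024};
        // All four inputs are below the floor, so each counts as 2 MB: skew is 1/4.
        System.out.println("tiny-only skew: " + skew(tinyOnly, floor));
        // One input above the floor dominates the total, giving a worse (higher) skew.
        System.out.println("mixed skew:     " + skew(mixed, floor));
    }
}
```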
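The proposal in the issue description can be sketched as a per-merge segment cap that depends on the expected merged size. This is a minimal sketch under stated assumptions: the merged size is approximated as the sum of the input sizes (deletions are ignored), "up to maxMergeAtOnce" is read as an inclusive limit, and SmallMergeRule, maxSegmentsFor and isAllowed are hypothetical names, not Lucene APIs.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Lucene code) of the proposed rule: ordinary merges
 * stay capped at mergeFactor segments, but a merge whose result would still be
 * smaller than the floor segment size may take up to maxMergeAtOnce segments.
 */
public final class SmallMergeRule {

    final int mergeFactor;
    final int maxMergeAtOnce;
    final long floorSegmentBytes;

    SmallMergeRule(int mergeFactor, int maxMergeAtOnce, long floorSegmentBytes) {
        if (maxMergeAtOnce < mergeFactor) {
            throw new IllegalArgumentException("maxMergeAtOnce must be >= mergeFactor");
        }
        this.mergeFactor = mergeFactor;
        this.maxMergeAtOnce = maxMergeAtOnce;
        this.floorSegmentBytes = floorSegmentBytes;
    }

    // Segment cap for a merge whose merged size would be mergedBytes.
    int maxSegmentsFor(long mergedBytes) {
        return mergedBytes < floorSegmentBytes ? maxMergeAtOnce : mergeFactor;
    }

    // True if a candidate merge of these segment sizes respects the cap.
    boolean isAllowed(long[] segmentBytes) {
        long merged = 0;
        for (long bytes : segmentBytes) {
            merged += bytes; // assumption: merged size ~ sum of inputs
        }
        return segmentBytes.length <= maxSegmentsFor(merged);
    }

    public static void main(String[] args) {
        // Example values only: mergeFactor=10, maxMergeAtOnce=30, 2 MB floor.
        SmallMergeRule rule = new SmallMergeRule(10, 30, 2L * 1024 * 1024);

        long[] twentyTiny = new long[20];
        Arrays.fill(twentyTiny, 50_000); // 20 x 50 KB, about 1 MB merged
        long[] twentyLarge = new long[20];
        Arrays.fill(twentyLarge, 5L * 1024 * 1024); // 20 x 5 MB

        System.out.println(rule.isAllowed(twentyTiny));  // true: below floor, cap is 30
        System.out.println(rule.isAllowed(twentyLarge)); // false: normal cap of 10 applies
    }
}
```

Keeping the decision in one place shows that raising maxMergeAtOnce under this rule only affects merges that stay below the floor, so larger merges keep today's mergeFactor limit.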


