Posted to issues@lucene.apache.org by "Adrien Grand (Jira)" <ji...@apache.org> on 2022/05/30 16:04:00 UTC

[jira] [Created] (LUCENE-10599) Improve LogMergePolicy's handling of maxMergeSize

Adrien Grand created LUCENE-10599:
-------------------------------------

             Summary: Improve LogMergePolicy's handling of maxMergeSize
                 Key: LUCENE-10599
                 URL: https://issues.apache.org/jira/browse/LUCENE-10599
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Adrien Grand


LogMergePolicy excludes segments whose size is greater than or equal to maxMergeSize from merging. Since a segment whose size is maxMergeSize-1 is still considered for merging, segments will effectively reach a size somewhere between maxMergeSize and mergeFactor*maxMergeSize before they stop being considered for merging.

At least this is what I thought. When LogMergePolicy ignores a segment that is too large for merging, it also ignores the other segments in the same window of mergeFactor segments if they are on the same tier. So segments might actually stop being considered for merging at any size between maxMergeSize / mergeFactor^0.75 and maxMergeSize * mergeFactor.

Assuming a merge factor of 10 and a max merge size of 1,000, this means that segments will reach their maximum size somewhere between 178 and 10,000. This range seems too large and makes maxMergeSize hard to reason about.
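
As a quick check of the arithmetic above, here is a minimal sketch that evaluates both bounds. The class and method names are hypothetical and are not part of Lucene; only the two formulas come from the description.

```java
// Hypothetical helper, not part of Lucene: evaluates the two bounds
// quoted in the issue for a given maxMergeSize and mergeFactor.
final class MergeSizeBounds {

    // Lower bound from the issue: maxMergeSize / mergeFactor^0.75
    static double lowerBound(double maxMergeSize, int mergeFactor) {
        return maxMergeSize / Math.pow(mergeFactor, 0.75);
    }

    // Upper bound from the issue: maxMergeSize * mergeFactor
    static double upperBound(double maxMergeSize, int mergeFactor) {
        return maxMergeSize * mergeFactor;
    }

    public static void main(String[] args) {
        double maxMergeSize = 1000; // max merge size used in the example
        int mergeFactor = 10;       // merge factor used in the example
        System.out.printf("lower ~ %.0f, upper = %.0f%n",
                lowerBound(maxMergeSize, mergeFactor),
                upperBound(maxMergeSize, mergeFactor));
    }
}
```

With a merge factor of 10, mergeFactor^0.75 is about 5.62, which is where the rounded lower bound of 178 comes from.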

Specifically, if you have 10 segments of 999 docs each, then LogDocMergePolicy will happily merge them into a single 9,990-doc segment. However, if you have one 1,000-doc segment and 9 180-doc segments, then the 180-doc segments will not get merged with any other segment, even if you keep adding segments to the index.

I propose to change this behavior so that when a large segment is encountered, we don't skip the entire window of mergeFactor segments, but only the segments that are too large.
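
To illustrate the difference, here is a toy model of the two selection strategies. It is only a sketch under strong assumptions: it is not LogMergePolicy's actual code, it treats all segments as being on the same tier, it ignores in-flight merges, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT Lucene code: contrasts skipping a whole window with
// skipping only the oversized segments. All names are hypothetical.
final class WindowSelectionSketch {

    // Behavior described in the issue: a window of mergeFactor segments
    // is skipped entirely if any segment in it is >= maxMergeSize.
    static List<List<Long>> skipWholeWindow(List<Long> sizes, int mergeFactor, long maxMergeSize) {
        List<List<Long>> merges = new ArrayList<>();
        for (int start = 0; start + mergeFactor <= sizes.size(); start += mergeFactor) {
            List<Long> window = sizes.subList(start, start + mergeFactor);
            boolean anyTooLarge = window.stream().anyMatch(s -> s >= maxMergeSize);
            if (!anyTooLarge) {
                merges.add(new ArrayList<>(window));
            }
        }
        return merges;
    }

    // One possible reading of the proposal: oversized segments are left
    // out, and the remaining segments are grouped mergeFactor at a time.
    static List<List<Long>> skipOnlyLargeSegments(List<Long> sizes, int mergeFactor, long maxMergeSize) {
        List<List<Long>> merges = new ArrayList<>();
        List<Long> pending = new ArrayList<>();
        for (long size : sizes) {
            if (size >= maxMergeSize) {
                continue; // too large: exclude this segment only
            }
            pending.add(size);
            if (pending.size() == mergeFactor) {
                merges.add(pending);
                pending = new ArrayList<>();
            }
        }
        return merges;
    }
}
```

In this model, a 1,000-doc segment followed by ten 180-doc segments yields no merge under the first strategy, because the only full window contains the large segment, while the second strategy merges the ten 180-doc segments together.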


