Posted to issues@lucene.apache.org by "Adrien Grand (Jira)" <ji...@apache.org> on 2021/07/28 08:11:00 UTC

[jira] [Commented] (LUCENE-7020) TieredMergePolicy - cascade maxMergeAtOnce setting to maxMergeAtOnceExplicit

    [ https://issues.apache.org/jira/browse/LUCENE-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388572#comment-17388572 ] 

Adrien Grand commented on LUCENE-7020:
--------------------------------------

I've just seen a similar issue to the one that Shawn is describing. A small index (3.3GB) has more than 30 segments and ends up needing two rounds to be force-merged down to 1 segment. With the default settings, it takes 264s to force-merge this index. If I set the maximum number of segments to merge at once to 50, force-merging down to 1 segment takes 190s, about 28% faster.
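
For reference, something along these lines is enough to reproduce the comparison (a rough sketch only, with a placeholder index path and analyzer, not the actual benchmark code):

{code:java}
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.FSDirectory;

public class ForceMergeSketch {
  public static void main(String[] args) throws Exception {
    TieredMergePolicy mergePolicy = new TieredMergePolicy();
    // The default is 30, which makes a 30+ segment index cascade through
    // two rounds of merging; 50 lets it finish in a single round.
    mergePolicy.setMaxMergeAtOnceExplicit(50);

    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer())
        .setMergePolicy(mergePolicy);

    try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"));
         IndexWriter writer = new IndexWriter(dir, config)) {
      // Explicit (forced) merge down to a single segment.
      writer.forceMerge(1);
    }
  }
}
{code}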

An alternative I'd like to propose would be to raise the default value of maxMergeAtOnceExplicit from 30 to 50. While a 30-segment index can be as small as 2.2GB with the default configuration (10 2MB segments, 10 20MB segments and 10 200MB segments), a 50-segment index must be at least 72GB (10 2MB segments, 10 20MB segments, 10 200MB segments, 10 2GB segments and 10 5GB segments).

Or maybe we shouldn't limit the number of segments to merge at once for explicit merges at all? I understand the argument about read-ahead, but we also have data structures that are very CPU-intensive to merge, like stored fields with index sorting, vectors, or multi-dimensional points (when N>1), because they may need to rebuild the data structure entirely. Avoiding cascading merges in such cases is very helpful. For the record, the example I gave above falls into none of these cases and yet already sees a significant speedup when merges don't cascade.



> TieredMergePolicy - cascade maxMergeAtOnce setting to maxMergeAtOnceExplicit
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-7020
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7020
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: 5.4.1
>            Reporter: Shawn Heisey
>            Assignee: Shawn Heisey
>            Priority: Major
>         Attachments: LUCENE-7020.patch
>
>
> SOLR-8621 covers improvements in configuring a merge policy in Solr.
> Discussions on that issue brought up the fact that if large values are configured for maxMergeAtOnce and segmentsPerTier, but maxMergeAtOnceExplicit is not changed, then a forceMerge is likely not to work as expected.
> When I first configured maxMergeAtOnce and segmentsPerTier to 35 in Solr, I saw an optimize (forceMerge) fully rewrite most of the index *twice* in order to achieve a single segment, because there were approximately 80 segments in the index before the optimize, and maxMergeAtOnceExplicit defaults to 30.  On advice given via the solr-user mailing list, I configured maxMergeAtOnceExplicit to 105 and have not had that problem since.
> I propose that setting maxMergeAtOnce should also set maxMergeAtOnceExplicit to three times the new value -- unless the setMaxMergeAtOnceExplicit method has been invoked, indicating that the user wishes to set that value themselves.
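
For illustration, the cascading behavior proposed above could look roughly like the following inside TieredMergePolicy (a sketch only; the field and flag names here are made up and do not come from the attached patch):

{code:java}
public class TieredMergePolicyCascadeSketch {
  private int maxMergeAtOnce = 10;          // current default
  private int maxMergeAtOnceExplicit = 30;  // current default
  private boolean explicitSetByUser = false;

  public void setMaxMergeAtOnce(int v) {
    maxMergeAtOnce = v;
    // Cascade: keep explicit (forced) merges three times wider than normal
    // merges, unless the user has pinned maxMergeAtOnceExplicit themselves.
    if (!explicitSetByUser) {
      maxMergeAtOnceExplicit = 3 * v;
    }
  }

  public void setMaxMergeAtOnceExplicit(int v) {
    maxMergeAtOnceExplicit = v;
    explicitSetByUser = true;
  }
}
{code}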



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org