Posted to issues@lucene.apache.org by "Robert Muir (Jira)" <ji...@apache.org> on 2022/05/18 22:34:00 UTC

[jira] [Commented] (LUCENE-10569) Think again about the floor segment size?

    [ https://issues.apache.org/jira/browse/LUCENE-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539116#comment-17539116 ] 

Robert Muir commented on LUCENE-10569:
--------------------------------------

I agree. Same with the stored fields stuff too. I'd love to get "merge policy slowness" out of the way to revisit that stuff, but yeah, it's probably more important to solve the general issues around it. Or at least contain the damn thing more somehow (e.g. a docs limit) and make it more fruitful (e.g. wait on merges to finish in reopen by default).

> Think again about the floor segment size?
> -----------------------------------------
>
>                 Key: LUCENE-10569
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10569
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> TieredMergePolicy has a floor segment size that it uses to prevent indexes from having a long tail of small segments, which would be very inefficient at search time. It is 2MB by default.
> While this floor segment size is good for searches, it also has the side effect of computing sub-optimal merges when segments are below this size. We came up with this 2MB floor segment size many years ago when Lucene was less space-efficient. I think we should consider lowering it at a minimum, and maybe move to a threshold on the document count rather than the byte size of the segment to better work with datasets of small or highly-compressible documents? Or maybe there are better ways?
> Separately, we should enable merge-on-refresh by default (LUCENE-10078) and only return suboptimal merges for merge-on-refresh, not regular background merges.
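
To make the "sub-optimal merges" point in the description concrete, here is a rough sketch of how flooring can hide size imbalance among small segments. It is a simplified model, not TieredMergePolicy's actual code: the class FloorSketch, the constant FLOOR_BYTES, and the methods flooredSize and skew are hypothetical names chosen for illustration, and the skew formula (largest segment over total) is an assumption about the general shape of the scoring.

```java
import java.util.List;

public class FloorSketch {
    // Hypothetical constant mirroring the 2MB default mentioned in the issue.
    static final long FLOOR_BYTES = 2L * 1024 * 1024;

    // Treat any segment smaller than the floor as if it were exactly the floor size.
    static long flooredSize(long bytes) {
        return Math.max(FLOOR_BYTES, bytes);
    }

    // Skew of a candidate merge: size of the largest segment divided by the total.
    // 1/n means perfectly balanced; values near 1.0 mean one segment dominates.
    static double skew(List<Long> segmentBytes, boolean applyFloor) {
        long largest = 0;
        long total = 0;
        for (long b : segmentBytes) {
            long s = applyFloor ? flooredSize(b) : b;
            largest = Math.max(largest, s);
            total += s;
        }
        return (double) largest / total;
    }

    public static void main(String[] args) {
        // One 1.5MB segment plus three 10KB segments, all below the floor.
        List<Long> candidate = List.of(1_500_000L, 10_000L, 10_000L, 10_000L);
        System.out.println("skew with floor:    " + skew(candidate, true));
        System.out.println("skew without floor: " + skew(candidate, false));
    }
}
```

Under this model the floored skew is 0.25, the same as four equal segments, while the unfloored skew is about 0.98. A policy that scores on floored sizes would see this lopsided merge as balanced, which is one way a byte-size floor could lead to merges that rewrite a comparatively large segment just to absorb a few tiny ones.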



