Posted to issues@lucene.apache.org by "Adrien Grand (Jira)" <ji...@apache.org> on 2020/09/17 17:59:00 UTC
[jira] [Resolved] (LUCENE-9529) Larger stored fields block sizes mean we're more likely to disable optimized bulk merging
[ https://issues.apache.org/jira/browse/LUCENE-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand resolved LUCENE-9529.
----------------------------------
Fix Version/s: 8.7
Resolution: Fixed
> Larger stored fields block sizes mean we're more likely to disable optimized bulk merging
> -----------------------------------------------------------------------------------------
>
> Key: LUCENE-9529
> URL: https://issues.apache.org/jira/browse/LUCENE-9529
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: 8.7
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When merging stored fields, Lucene tries whenever possible to copy the compressed data as-is, instead of decompressing it from the source segment and then recompressing it into the destination segment. A problem with this approach is that if some blocks are incomplete (typically the last block of a segment), they remain incomplete in the destination segment too, and if we keep doing this for too long, we end up with a bad compression ratio. So Lucene keeps track of these incomplete blocks and makes sure that the ratio of incomplete blocks stays below 1%.
> But since we increased the block size, a high ratio of incomplete blocks has become more likely. E.g. if you have a segment with 1MB of stored fields, then with 16kB blocks like before, you have 63 complete blocks and 1 incomplete block, i.e. 1.6%. But now with ~512kB blocks, you have 1 complete block and 1 incomplete block, i.e. 50%.
> I'm not sure how to fix it, or even whether it should be fixed, but I wanted to open an issue to track this.
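The ratio check and the block-count arithmetic described in the issue can be sketched as a toy model. This is not Lucene's merge code: the class and method names (DirtyBlockRatio, completeBlocks, incompleteBlocks, bulkCopyAllowed) are hypothetical, and the model assumes fixed-size blocks measured in uncompressed bytes, exactly one incomplete block at the end of a segment whenever its size is not a multiple of the block size, and a strict "below 1%" threshold applied per segment. The segment size of 1,040,000 bytes is an assumed stand-in for "1MB", chosen so that the last block is partial and the counts match the issue (63 + 1 blocks at 16kB, 1 + 1 at 512kB).

```java
import java.util.Locale;

// Toy model (NOT Lucene's actual merge code) of the incomplete-block ratio check.
// Assumptions: blocks have a fixed size in uncompressed bytes, only the last block
// of a segment can be incomplete, and bulk copying is allowed only while incomplete
// blocks make up strictly less than 1% of all blocks.
public class DirtyBlockRatio {

    // Number of blocks that are completely filled.
    public static long completeBlocks(long segmentBytes, long blockSize) {
        return segmentBytes / blockSize;
    }

    // 1 if the segment ends with a partially filled block, 0 otherwise.
    public static long incompleteBlocks(long segmentBytes, long blockSize) {
        return segmentBytes % blockSize == 0 ? 0 : 1;
    }

    // Percentage of all blocks that are incomplete (0 for an empty segment).
    public static double incompletePercent(long segmentBytes, long blockSize) {
        long incomplete = incompleteBlocks(segmentBytes, blockSize);
        long total = completeBlocks(segmentBytes, blockSize) + incomplete;
        return total == 0 ? 0.0 : 100.0 * incomplete / total;
    }

    // Hypothetical threshold: bulk copy only while incomplete blocks are below 1%.
    public static boolean bulkCopyAllowed(long segmentBytes, long blockSize) {
        long incomplete = incompleteBlocks(segmentBytes, blockSize);
        long total = completeBlocks(segmentBytes, blockSize) + incomplete;
        return incomplete == 0 || incomplete * 100 < total;
    }

    public static void main(String[] args) {
        long kb = 1024;
        long segmentBytes = 1_040_000; // roughly 1MB; chosen so the last block is partial
        for (long blockSize : new long[] {16 * kb, 512 * kb}) {
            System.out.println(String.format(Locale.ROOT,
                "%dkB blocks: %d complete, %d incomplete (%.1f%%), bulk copy allowed: %b",
                blockSize / kb,
                completeBlocks(segmentBytes, blockSize),
                incompleteBlocks(segmentBytes, blockSize),
                incompletePercent(segmentBytes, blockSize),
                bulkCopyAllowed(segmentBytes, blockSize)));
        }
    }
}
```

In this model, a segment with a single incomplete block needs at least 100 complete blocks before the ratio drops below 1%: roughly 1.6MB of stored fields with 16kB blocks, versus roughly 50MB with 512kB blocks. That illustrates why larger blocks make falling back to decompress-and-recompress merging more likely.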