Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/12/12 10:16:04 UTC

[GitHub] [lucene] jpountz opened a new pull request, #12011: Tune the amount of memory that is allocated to sorting postings upon flushing.

jpountz opened a new pull request, #12011:
URL: https://github.com/apache/lucene/pull/12011

   When flushing segments that have an index sort configured, postings lists get loaded into arrays and get reordered according to the index sort.
   
   This reordering is implemented with `TimSorter`, a variant of merge sort. Like merge sort, a key step of `TimSorter` is merging two contiguous sorted slices of the array into a single sorted slice. This merge can be done either with external memory, which is the classical approach, or in place, which still runs in linear time but with a much higher constant factor. Until now we allocated a fixed budget of `maxDoc/64` for doing these merges with external memory. When that was not enough, sorted slices were merged in place.
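
   To illustrate the two merge strategies described above, here is a minimal sketch. It is not Lucene's `TimSorter`: the class `SliceMergeSketch`, its method names, and the `maxTempSlots` parameter are hypothetical, only the left run is ever buffered, and the in-place fallback is a naive shifting merge chosen for brevity rather than the in-place algorithm Lucene uses.

```java
import java.util.Arrays;

/** Hypothetical sketch: merges the sorted slices a[lo, mid) and a[mid, hi). */
final class SliceMergeSketch {

  /**
   * Uses external memory when the left run fits in the budget, otherwise
   * falls back to merging in place. Returns true if external memory was used.
   */
  static boolean merge(int[] a, int lo, int mid, int hi, int maxTempSlots) {
    if (mid - lo <= maxTempSlots) {
      mergeWithTemp(a, lo, mid, hi);
      return true;
    }
    mergeInPlace(a, lo, mid, hi);
    return false;
  }

  /** Classical merge: copy the left run aside, then merge back into a. */
  static void mergeWithTemp(int[] a, int lo, int mid, int hi) {
    int[] tmp = Arrays.copyOfRange(a, lo, mid);
    int i = 0, j = mid, dest = lo;
    while (i < tmp.length && j < hi) {
      // Take from the right run only when strictly smaller, to stay stable.
      a[dest++] = (a[j] < tmp[i]) ? a[j++] : tmp[i++];
    }
    while (i < tmp.length) {
      a[dest++] = tmp[i++];
    }
    // Any remaining right-run entries are already in their final position.
  }

  /** Naive stand-in for an in-place merge: no extra memory, more data movement. */
  static void mergeInPlace(int[] a, int lo, int mid, int hi) {
    for (int j = mid; j < hi; j++) {
      int v = a[j];
      int k = j;
      while (k > lo && a[k - 1] > v) {
        a[k] = a[k - 1];
        k--;
      }
      a[k] = v;
    }
  }
}
```

   Both paths produce the same sorted slice; the budget only decides which one runs.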
   
   I've been looking at some profiles recently for an index where a non-negligible chunk of the time was spent on in-place merges. So I would like to propose the following change:
    - Increase the maximum RAM budget to `maxDoc / 8`. This should help avoid in-place merges for all postings lists up to `docFreq = maxDoc / 4`.
    - Make this RAM budget lazily allocated, rather than eagerly like today. This avoids allocating O(maxDoc) memory for fields like primary keys that only have a couple of postings per term.
   
   Overall memory usage would never be more than 50% higher than it is today: `TimSorter` never needs more than X temporary slots unless the postings list has at least 2*X entries, and these 2*X entries already get loaded into memory today. For fields that have short postings lists, memory usage should actually be lower.
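
   The budget arithmetic above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the class `TempBudgetSketch` and all of its members are hypothetical names, and the buffer is modeled as a plain `int[]`.

```java
/** Hypothetical sketch of a lazily allocated, capped temporary buffer. */
final class TempBudgetSketch {
  private final int maxTempSlots; // cap from the proposal: maxDoc / 8
  private int[] tmp = new int[0]; // nothing is allocated up front

  TempBudgetSketch(int maxDoc) {
    this.maxTempSlots = maxDoc / 8;
  }

  /** A list of docFreq entries needs at most docFreq / 2 slots, capped by the budget. */
  int slotsFor(int docFreq) {
    return Math.min(docFreq / 2, maxTempSlots);
  }

  /** Grows the buffer only when a postings list actually needs more slots. */
  int[] tempFor(int docFreq) {
    int needed = slotsFor(docFreq);
    if (tmp.length < needed) {
      tmp = new int[needed];
    }
    return tmp;
  }

  int allocatedSlots() {
    return tmp.length;
  }
}
```

   Under these assumptions, with `maxDoc = 1,000,000` a field whose terms all have `docFreq = 1` never allocates any temporary slots, whereas an eager `maxDoc / 64` budget would reserve 15,625 slots. A postings list with `docFreq = maxDoc / 4 = 250,000` gets the full 125,000 slots, and in every case the loaded entries plus temporary slots stay within 1.5 times `docFreq`.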
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #12011: Tune the amount of memory that is allocated to sorting postings upon flushing.

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #12011:
URL: https://github.com/apache/lucene/pull/12011#issuecomment-1362512365

   I plan on merging it soon if there are no objections.



[GitHub] [lucene] jpountz merged pull request #12011: Tune the amount of memory that is allocated to sorting postings upon flushing.

Posted by GitBox <gi...@apache.org>.
jpountz merged PR #12011:
URL: https://github.com/apache/lucene/pull/12011

