You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/11/14 08:59:59 UTC

[GitHub] [lucene] bruno-roustant commented on a change in pull request #430: LUCENE-10225: Improve IntroSelector.

bruno-roustant commented on a change in pull request #430:
URL: https://github.com/apache/lucene/pull/430#discussion_r748823600



##########
File path: lucene/core/src/java/org/apache/lucene/util/IntroSelector.java
##########
@@ -185,6 +204,115 @@ private void shuffle(int from, int to) {
     }
   }
 
+  /** Selects the k-th entry with a bottom-k algorithm, given that k is close to {@code from}. */

Review comment:
       Good remark, it means I miss more doc.
   This algorithm is my own, and the idea is actually quite simple, maybe the code is not clear enough.
   When k is close to from, we take an int array of size (from - k + 1) called bottom, and each bottom array element i points to the corresponding (from + i) entry. And we determine the max of this bottom array. Then we loop on all the remaining entries, for each entry e we compare it to the max of bottom, if e < max then e evicts max and takes it slot in bottom, and we determine the new max of bottom. (the speed comes from the fact that most of the time we only compare e to bottom-max)
   At the end, all slots in bottom point to the k least entries, we just have to swap them and finally put the bottom max at index k.
   
   I'll add more doc!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org