You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/05/07 21:44:50 UTC

[GitHub] [lucene] jpountz commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building

jpountz commented on a change in pull request #91:
URL: https://github.com/apache/lucene/pull/91#discussion_r628551730



##########
File path: lucene/core/src/java/org/apache/lucene/util/bkd/MutablePointsReaderUtils.java
##########
@@ -35,63 +37,60 @@
 
   MutablePointsReaderUtils() {}
 
-  /** Sort the given {@link MutablePointValues} based on its packed value then doc ID. */
+  /**
+   * Sort the given {@link MutablePointValues} based on its packed value, note that doc ID is not
+   * taken into sorting algorithm, since if they are already in ascending order, stable sort is able
+   * to maintain the ordering of doc ID.
+   */

Review comment:
       Oh sorry I had missed your comments. The code path I'm concerned about is this one https://github.com/apache/lucene/blob/d03662c48bfc5bf2be4840a7f743f9cb64b17fee/lucene/core/src/java/org/apache/lucene/index/PointValuesWriter.java#L178-L201 when sortMap is not null, which ends up calling MutablePointsReaderUtil.sort. In practice sortMap is not null when index sorting is configured.
   
   I guess we have 3 options:
    1. Make MutablePointsReaderUtils check whether doc IDs are ordered first, so that it knows whether to use doc IDs when sorting.
    2. Change PointValuesWriter to sort by doc ID before calling MutablePointsReaderUtil.sort when index sorting is enabled.
    3. Change PointValuesWriter to do a second pass over results after calling `MutablePointsReaderUtil.sort` to sort identical values by doc ID when index sorting is enabled.
   
   I suggested (1) initially and your suggestion is (2), which might be a good idea actually if we can reuse the sortMap to reorder the `MutablePointsReader` by doc ID without actually having to sort (ie. O(n) rather than O(n log n)?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org