You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/07/22 15:23:18 UTC

[GitHub] [lucene] dnhatn commented on a change in pull request #221: LUCENE-10031: Speed up SortedDocIdMerger on low-cardinality sort fields.

dnhatn commented on a change in pull request #221:
URL: https://github.com/apache/lucene/pull/221#discussion_r674914345



##########
File path: lucene/core/src/java/org/apache/lucene/index/DocIDMerger.java
##########
@@ -144,59 +159,45 @@ protected boolean lessThan(Sub a, Sub b) {
     public void reset() throws IOException {
       // caller may not have fully consumed the queue:
       queue.clear();
+      current = null;
       boolean first = true;
       for (T sub : subs) {
         if (first) {
           // by setting mappedDocID = -1, this entry is guaranteed to be the top of the queue
           // so the first call to next() will advance it
           sub.mappedDocID = -1;
+          current = sub;
           first = false;
-        } else {
-          int mappedDocID;
-          while (true) {
-            int docID = sub.nextDoc();
-            if (docID == NO_MORE_DOCS) {
-              mappedDocID = NO_MORE_DOCS;
-              break;
-            }
-            mappedDocID = sub.docMap.get(docID);
-            if (mappedDocID != -1) {
-              break;
-            }
-          }
-          if (mappedDocID == NO_MORE_DOCS) {
-            // all docs in this sub were deleted; do not add it to the queue!
-            continue;
-          }
-          sub.mappedDocID = mappedDocID;
-        }
-        queue.add(sub);
+        } else if (sub.nextMappedDoc() != NO_MORE_DOCS) {
+          queue.add(sub);
+        } // else all docs in this sub were deleted; do not add it to the queue!
       }
+      queueNextMappedDoc = queue.size() == 0 ? NO_MORE_DOCS : queue.top().mappedDocID;
     }
 
     @Override
     public T next() throws IOException {
-      T top = queue.top();
-
-      while (true) {
-        int docID = top.nextDoc();
-        if (docID == NO_MORE_DOCS) {
-          queue.pop();
-          top = queue.top();
-          break;
-        }
-        int mappedDocID = top.docMap.get(docID);
-        if (mappedDocID == -1) {
-          // doc was deleted
-          continue;
+      int nextDoc = current.nextMappedDoc();
+      // the below condition is unlikely when the index sort is on a low-cardinality field
+      if (nextDoc >= queueNextMappedDoc) {
+        if (nextDoc == NO_MORE_DOCS) {
+          if (queue.size() == 0) {
+            current = null;
+            return null;
+          } else {
+            current = queue.pop();
+            queueNextMappedDoc = queue.size() == 0 ? NO_MORE_DOCS : queue.top().mappedDocID;

Review comment:
       I think we can remove `queueNextMappedDoc` and replace it with `queue.top().mappedDocID`? Below is my version without `queueNextMappedDoc`. WDYT?
   
   ```java
   int nextDoc = current.nextMappedDoc();
   // the below condition is unlikely when the index sort is on a low-cardinality field
   if (nextDoc == NO_MORE_DOCS) {
     if (queue.size() == 0) {
       return null;
     }
     current = queue.pop();
   } else if (queue.size() > 0 && nextDoc > queue.top().mappedDocID) {
     T newCurrent = queue.top();
     queue.updateTop(current);
     current = newCurrent;
   }
   return current;
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org