Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/08 18:35:04 UTC

[GitHub] [lucene] jmazanec15 commented on a diff in pull request #1068: LUCENE-10674: Update subiterators when BitSetConjDISI exhausts

jmazanec15 commented on code in PR #1068:
URL: https://github.com/apache/lucene/pull/1068#discussion_r966298807


##########
lucene/core/src/java/org/apache/lucene/search/ConjunctionDISI.java:
##########
@@ -281,6 +281,12 @@ private int doNext(int doc) throws IOException {
       advanceLead:
       for (; ; doc = lead.nextDoc()) {
         if (doc >= minLength) {
+          if (doc != NO_MORE_DOCS) {
+            lead.advance(NO_MORE_DOCS);
+          }
+          for (BitSetIterator iterator : bitSetIterators) {
+            iterator.setDocId(NO_MORE_DOCS);
+          }

Review Comment:
   > The if statement makes sense to me, but I'm curious how you managed to hit this case. This suggests that we create BitSets whose size is not maxDoc, do you know where this happens?
   
   I may be misunderstanding the question. Each BitSetIterator can be backed by a bitset of a different length, potentially as an optimization (I think [minLength](https://github.com/apache/lucene/blob/branch_9_4/lucene/core/src/java/org/apache/lucene/search/ConjunctionDISI.java#L256) suggests this is expected). If a BitSetIterator's highest match is doc 10 and there are 1M docs in the index, I don't think there is any reason to store 1M bits; the BitSetIterator can simply exhaust after doc 10.
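
To make the 10-vs-1M example concrete, here is a minimal sketch of a conjunction over bitsets of different lengths. It is not Lucene's `ConjunctionDISI`: it uses `java.util.BitSet` as a stand-in, and the class name `MinLengthConjunctionSketch`, the `intersect` method, and the explicit `lengths` array are all hypothetical names introduced for illustration. The only idea it borrows from the diff is that the lead stops as soon as `doc >= minLength`.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical stand-in for the idea discussed here; not Lucene's ConjunctionDISI. */
final class MinLengthConjunctionSketch {

  /**
   * Returns the doc IDs set in every bitset, never reading past the shortest one.
   * Assumptions: at least one bitset is passed, and lengths[i] is the logical
   * length of bitSets[i], i.e. every set bit in bitSets[i] is below lengths[i].
   */
  static List<Integer> intersect(int[] lengths, BitSet[] bitSets) {
    int minLength = Integer.MAX_VALUE;
    for (int length : lengths) {
      minLength = Math.min(minLength, length);
    }

    List<Integer> matches = new ArrayList<>();
    BitSet lead = bitSets[0];
    advanceLead:
    for (int doc = lead.nextSetBit(0); doc >= 0; doc = lead.nextSetBit(doc + 1)) {
      if (doc >= minLength) {
        // The shortest bitset cannot contain this doc, so the conjunction is exhausted.
        break;
      }
      for (int i = 1; i < bitSets.length; i++) {
        if (!bitSets[i].get(doc)) {
          continue advanceLead; // one sub-iterator misses this doc, try the lead's next doc
        }
      }
      matches.add(doc);
    }
    return matches;
  }

  public static void main(String[] args) {
    // "small" only needs 11 bits because its highest match is doc 10.
    BitSet small = new BitSet();
    small.set(3);
    small.set(10);

    // "large" spans the whole (hypothetical) 1M-doc index.
    BitSet large = new BitSet();
    large.set(3);
    large.set(10);
    large.set(500_000);

    System.out.println(intersect(new int[] {1_000_000, 11}, new BitSet[] {large, small}));
  }
}
```

In this sketch the lead still has doc 500,000 left when the loop breaks, which is the situation the `if (doc != NO_MORE_DOCS)` branch in the diff handles by advancing the lead to `NO_MORE_DOCS`.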
   
   
   > The for loop should be unnecessary, there is no guarantee that all sub iterators advance to NO_MORE_DOCS. If this causes problems, then it means we have another bug somewhere else?
   
   Agreed, this is probably unnecessary. I added it to ensure that [this statement](https://github.com/apache/lucene/blob/branch_9_4/lucene/core/src/java/org/apache/lucene/search/ConjunctionDISI.java#L31-L32) holds: "Requires that all of its sub-iterators must be on the same document all the time."



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

