You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/06/15 16:57:16 UTC

[GitHub] [lucene] jpountz opened a new pull request, #961: Handle more cases in `BooleanWeight#count`.

jpountz opened a new pull request, #961:
URL: https://github.com/apache/lucene/pull/961

   As suggested by @zhaih on #950, we could support more cases in
   `BooleanWeight#count`. This PR adds support for these cases specifically:
    - Pure disjunctions where only one clause has a non-zero count.
    - Pure disjunctions where one clause matches all docs.
    - Negations where positive clauses match all docs (pure negation).
    - Negations where positive clauses match no docs.
    - Negations where negative clauses match no docs.
    - Negations where negative clauses match all docs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #961: Handle more cases in `BooleanWeight#count`.

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #961:
URL: https://github.com/apache/lucene/pull/961#issuecomment-1157403188

   Thanks for the review, I pushed a comment to clarify that more cases could be handled.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz merged pull request #961: Handle more cases in `BooleanWeight#count`.

Posted by GitBox <gi...@apache.org>.
jpountz merged PR #961:
URL: https://github.com/apache/lucene/pull/961


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on a diff in pull request #961: Handle more cases in `BooleanWeight#count`.

Posted by GitBox <gi...@apache.org>.
zhaih commented on code in PR #961:
URL: https://github.com/apache/lucene/pull/961#discussion_r898701671


##########
lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java:
##########
@@ -346,47 +346,100 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
 
   @Override
   public int count(LeafReaderContext context) throws IOException {
-    // Implement counting for pure conjunctions in the case when one clause doesn't match any docs,
-    // or all clauses but one match all docs.
-    if (weightedClauses.isEmpty()) {
-      return 0;
-    }
-    for (WeightedBooleanClause weightedClause : weightedClauses) {
-      switch (weightedClause.clause.getOccur()) {
-        case FILTER:
-        case MUST:
-          break;
-        case MUST_NOT:
-        case SHOULD:
-        default:
-          return super.count(context);
+    final int numDocs = context.reader().numDocs();
+    int positiveCount;
+    if (query.isPureDisjunction()) {
+      positiveCount = optCount(context, Occur.SHOULD);
+    } else if (query.getClauses(Occur.FILTER).isEmpty() == false
+        || query.getClauses(Occur.MUST).isEmpty() == false) {
+      if (query.getMinimumNumberShouldMatch() > 0) {
+        positiveCount = -1;
+      } else {
+        positiveCount = reqCount(context);
       }
+    } else {
+      positiveCount = -1;
+    }
+
+    if (positiveCount == 0) {
+      return 0;
     }
-    if (query.getMinimumNumberShouldMatch() > 0) {
-      return super.count(context);
+
+    int prohibitedCount = optCount(context, Occur.MUST_NOT);
+    if (prohibitedCount == -1) {
+      return -1;
+    } else if (prohibitedCount == 0) {
+      return positiveCount;
+    } else if (prohibitedCount == numDocs) {
+      return 0;
+    } else if (positiveCount == numDocs) {
+      return numDocs - prohibitedCount;
+    } else {
+      return -1;
     }
-    // From now on we know the query is a pure conjunction
+  }
+
+  /**
+   * Return the number of matches of required clauses, or -1 if unknown, or numDocs if there are no
+   * required clauses.
+   */
+  private int reqCount(LeafReaderContext context) throws IOException {
     final int numDocs = context.reader().numDocs();
-    int conjunctionCount = numDocs;
+    int reqCount = numDocs;
     for (WeightedBooleanClause weightedClause : weightedClauses) {
+      if (weightedClause.clause.isRequired() == false) {
+        continue;
+      }
       int count = weightedClause.weight.count(context);
       if (count == -1 || count == 0) {
         // If the count of one clause is unknown, then the count of the conjunction is unknown too.
         // If one clause doesn't match any docs then the conjunction doesn't match any docs either.
         return count;
       } else if (count == numDocs) {
         // the query matches all docs, it can be safely ignored
-      } else if (conjunctionCount == numDocs) {
+      } else if (reqCount == numDocs) {
         // all clauses seen so far match all docs, so the count of the new clause is also the count
         // of the conjunction
-        conjunctionCount = count;
+        reqCount = count;
       } else {
         // We have two clauses whose count is in [1, numDocs), we can't figure out the number of
         // docs that match the conjunction without running the query.
-        return super.count(context);
+        return -1;
+      }
+    }
+    return reqCount;
+  }
+
+  /**
+   * Return the number of matches of required clauses, or -1 if unknown, or 0 if there are no

Review Comment:
   number of matches of optional (instead of "required") clauses?



##########
lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java:
##########
@@ -346,47 +346,100 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
 
   @Override
   public int count(LeafReaderContext context) throws IOException {
-    // Implement counting for pure conjunctions in the case when one clause doesn't match any docs,
-    // or all clauses but one match all docs.
-    if (weightedClauses.isEmpty()) {
-      return 0;
-    }
-    for (WeightedBooleanClause weightedClause : weightedClauses) {
-      switch (weightedClause.clause.getOccur()) {
-        case FILTER:
-        case MUST:
-          break;
-        case MUST_NOT:
-        case SHOULD:
-        default:
-          return super.count(context);
+    final int numDocs = context.reader().numDocs();
+    int positiveCount;
+    if (query.isPureDisjunction()) {
+      positiveCount = optCount(context, Occur.SHOULD);
+    } else if (query.getClauses(Occur.FILTER).isEmpty() == false
+        || query.getClauses(Occur.MUST).isEmpty() == false) {
+      if (query.getMinimumNumberShouldMatch() > 0) {
+        positiveCount = -1;

Review Comment:
   There'll be still some rare cases here? When `minShouldMatch > 0` and no should clause match; when `minShouldMatch == 1` and should clauses matches all?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org