You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/12/14 11:33:40 UTC

[GitHub] [lucene] LuXugang opened a new pull request, #12017: Aggressive `count` in BooleanWeight

LuXugang opened a new pull request, #12017:
URL: https://github.com/apache/lucene/pull/12017

   When BooleanQuery is pure disjunction, if at least one clause could match all docs, then we could get right `count`  even though there was other clause whose `count` is  unknown. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] LuXugang commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight

Posted by GitBox <gi...@apache.org>.
LuXugang commented on code in PR #12017:
URL: https://github.com/apache/lucene/pull/12017#discussion_r1049163040


##########
lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java:
##########
@@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() throws Exception {
     }
   }
 
+  public void testAggressiveMatchCount() throws IOException {
+    Directory dir = newDirectory();
+    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig());
+    Document doc = new Document();
+    LongPoint longPoint = new LongPoint("long", 3L, 4L, 5L);
+    doc.add(longPoint);
+    StringField stringField = new StringField("string", "abc", Store.NO);
+    doc.add(stringField);
+    writer.addDocument(doc);
+    longPoint.setLongValues(10L, 11L, 12L);
+    stringField.setStringValue("xyz");
+    writer.addDocument(doc);
+
+    IndexReader reader = DirectoryReader.open(writer);
+    writer.close();
+    IndexSearcher searcher = new IndexSearcher(reader);
+
+    long[] lower = new long[] {4L, 5L, 6L};
+    long[] upper = new long[] {9L, 10L, 11L};
+    Query unknownCountQuery = LongPoint.newRangeQuery("long", lower, upper);
+    assert reader.leaves().size() == 1;
+    assert searcher
+            .createWeight(unknownCountQuery, ScoreMode.COMPLETE, 1f)
+            .count(reader.leaves().get(0))
+        == -1;
+
+    Query query =
+        new BooleanQuery.Builder()
+            .add(new TermQuery(new Term("string", "xyz")), Occur.MUST)
+            .add(unknownCountQuery, Occur.MUST_NOT)
+            .add(new MatchAllDocsQuery(), Occur.MUST_NOT)
+            .build();
+    Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE, 1f);
+    // count of the first MUST_NOT clause is unknown, but the second MUST_NOT clause matches all
+    // docs
+    assertEquals(0, weight.count(reader.leaves().get(0)));
+
+    query =
+        new BooleanQuery.Builder()
+            .add(new TermQuery(new Term("string", "xyz")), Occur.MUST)
+            .add(unknownCountQuery, Occur.MUST_NOT)
+            .add(new TermQuery(new Term("string", "abc")), Occur.MUST_NOT)
+            .build();
+    weight = searcher.createWeight(query, ScoreMode.COMPLETE, 1f);
+    // count of the first MUST_NOT clause is unknown, though the second MUST_NOT clause matche one
+    // doc, we can't figure out the number of
+    // docs
+    assertEquals(-1, weight.count(reader.leaves().get(0)));
+
+    // test pure disConjunction

Review Comment:
   addressed in https://github.com/apache/lucene/pull/12017/commits/272dec54a59dedc666cf8311c09830c94b3c5369



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] LuXugang commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight

Posted by GitBox <gi...@apache.org>.
LuXugang commented on code in PR #12017:
URL: https://github.com/apache/lucene/pull/12017#discussion_r1049162986


##########
lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java:
##########
@@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() throws Exception {
     }
   }
 
+  public void testAggressiveMatchCount() throws IOException {

Review Comment:
   addressed in https://github.com/apache/lucene/pull/12017/commits/272dec54a59dedc666cf8311c09830c94b3c5369 and https://github.com/apache/lucene/pull/12017/commits/ae3f8d67ed75724ac2662c24025c85ae7008612a



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight

Posted by GitBox <gi...@apache.org>.
jpountz commented on code in PR #12017:
URL: https://github.com/apache/lucene/pull/12017#discussion_r1048383541


##########
lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java:
##########
@@ -470,14 +470,19 @@ private int reqCount(LeafReaderContext context) throws IOException {
   private int optCount(LeafReaderContext context, Occur occur) throws IOException {
     final int numDocs = context.reader().numDocs();
     int optCount = 0;
+    boolean unknownCount = false;
     for (WeightedBooleanClause weightedClause : weightedClauses) {
       if (weightedClause.clause.getOccur() != occur) {
         continue;
       }
       int count = weightedClause.weight.count(context);
-      if (count == -1 || count == numDocs) {
-        // If any of the clauses has a number of matches that is unknown, the number of matches of
-        // the disjunction is unknown.
+      if (count == -1) {
+        // If one clause has a number of matches that is unknown, let's be more aggressive to check
+        // whether remain clauses could match all docs.
+        unknownCount = true;
+        continue;

Review Comment:
   use an `else if` instead of `continue` for consistency with how other cases are handled?



##########
lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java:
##########
@@ -492,7 +497,10 @@ private int optCount(LeafReaderContext context, Occur occur) throws IOException
         return -1;

Review Comment:
   Maybe this line should also set `unkwownCount = true` instead of returning `-1` right away?



##########
lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java:
##########
@@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() throws Exception {
     }
   }
 
+  public void testAggressiveMatchCount() throws IOException {

Review Comment:
   Fold these tests into the existing `testDisjunctionMatchesCount` test?



##########
lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java:
##########
@@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() throws Exception {
     }
   }
 
+  public void testAggressiveMatchCount() throws IOException {
+    Directory dir = newDirectory();
+    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig());
+    Document doc = new Document();
+    LongPoint longPoint = new LongPoint("long", 3L, 4L, 5L);
+    doc.add(longPoint);
+    StringField stringField = new StringField("string", "abc", Store.NO);
+    doc.add(stringField);
+    writer.addDocument(doc);
+    longPoint.setLongValues(10L, 11L, 12L);
+    stringField.setStringValue("xyz");
+    writer.addDocument(doc);
+
+    IndexReader reader = DirectoryReader.open(writer);
+    writer.close();
+    IndexSearcher searcher = new IndexSearcher(reader);
+
+    long[] lower = new long[] {4L, 5L, 6L};
+    long[] upper = new long[] {9L, 10L, 11L};
+    Query unknownCountQuery = LongPoint.newRangeQuery("long", lower, upper);
+    assert reader.leaves().size() == 1;
+    assert searcher
+            .createWeight(unknownCountQuery, ScoreMode.COMPLETE, 1f)
+            .count(reader.leaves().get(0))
+        == -1;
+
+    Query query =
+        new BooleanQuery.Builder()
+            .add(new TermQuery(new Term("string", "xyz")), Occur.MUST)
+            .add(unknownCountQuery, Occur.MUST_NOT)
+            .add(new MatchAllDocsQuery(), Occur.MUST_NOT)
+            .build();
+    Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE, 1f);
+    // count of the first MUST_NOT clause is unknown, but the second MUST_NOT clause matches all
+    // docs
+    assertEquals(0, weight.count(reader.leaves().get(0)));
+
+    query =
+        new BooleanQuery.Builder()
+            .add(new TermQuery(new Term("string", "xyz")), Occur.MUST)
+            .add(unknownCountQuery, Occur.MUST_NOT)
+            .add(new TermQuery(new Term("string", "abc")), Occur.MUST_NOT)
+            .build();
+    weight = searcher.createWeight(query, ScoreMode.COMPLETE, 1f);
+    // count of the first MUST_NOT clause is unknown, though the second MUST_NOT clause matche one
+    // doc, we can't figure out the number of
+    // docs
+    assertEquals(-1, weight.count(reader.leaves().get(0)));
+
+    // test pure disConjunction

Review Comment:
   ```suggestion
       // test pure disjunction
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] LuXugang commented on a diff in pull request #12017: Aggressive `count` in BooleanWeight

Posted by GitBox <gi...@apache.org>.
LuXugang commented on code in PR #12017:
URL: https://github.com/apache/lucene/pull/12017#discussion_r1049162892


##########
lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java:
##########
@@ -470,14 +470,19 @@ private int reqCount(LeafReaderContext context) throws IOException {
   private int optCount(LeafReaderContext context, Occur occur) throws IOException {
     final int numDocs = context.reader().numDocs();
     int optCount = 0;
+    boolean unknownCount = false;
     for (WeightedBooleanClause weightedClause : weightedClauses) {
       if (weightedClause.clause.getOccur() != occur) {
         continue;
       }
       int count = weightedClause.weight.count(context);
-      if (count == -1 || count == numDocs) {
-        // If any of the clauses has a number of matches that is unknown, the number of matches of
-        // the disjunction is unknown.
+      if (count == -1) {
+        // If one clause has a number of matches that is unknown, let's be more aggressive to check
+        // whether remain clauses could match all docs.
+        unknownCount = true;
+        continue;

Review Comment:
   addressed in  https://github.com/apache/lucene/pull/12017/commits/dc13ae7bddc3ca272af7c45dd998258984469904



##########
lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java:
##########
@@ -492,7 +497,10 @@ private int optCount(LeafReaderContext context, Occur occur) throws IOException
         return -1;

Review Comment:
   Thanks @jpountz , you are right. addressed in  https://github.com/apache/lucene/pull/12017/commits/477b2a4e1f95f408562fc1745979bf9815ffaa75



##########
lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java:
##########
@@ -1015,6 +1015,80 @@ public void testDisjunctionRandomClausesMatchesCount() throws Exception {
     }
   }
 
+  public void testAggressiveMatchCount() throws IOException {

Review Comment:
   addressed in https://github.com/apache/lucene/pull/12017/commits/272dec54a59dedc666cf8311c09830c94b3c5369



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] LuXugang merged pull request #12017: Aggressive `count` in BooleanWeight

Posted by GitBox <gi...@apache.org>.
LuXugang merged PR #12017:
URL: https://github.com/apache/lucene/pull/12017


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org