You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/02/10 14:05:43 UTC

[GitHub] [lucene] msokolov commented on a change in pull request #672: LUCENE-10418: Optimize `Query#rewrite` in the non-scoring case.

msokolov commented on a change in pull request #672:
URL: https://github.com/apache/lucene/pull/672#discussion_r803701669



##########
File path: lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java
##########
@@ -191,51 +191,55 @@ boolean isPureDisjunction() {
     return clauses.iterator();
   }
 
-  private BooleanQuery rewriteNoScoring() {
-    boolean keepShould =
+  BooleanQuery rewriteNoScoring() {
+    boolean actuallyRewritten = false;
+    BooleanQuery.Builder newQuery =
+        new BooleanQuery.Builder().setMinimumNumberShouldMatch(getMinimumNumberShouldMatch());
+
+    final boolean keepShould =
         getMinimumNumberShouldMatch() > 0
             || (clauseSets.get(Occur.MUST).size() + clauseSets.get(Occur.FILTER).size() == 0);
 
-    if (clauseSets.get(Occur.MUST).size() == 0 && keepShould) {
-      return this;
-    }
-    BooleanQuery.Builder newQuery = new BooleanQuery.Builder();
-
-    newQuery.setMinimumNumberShouldMatch(getMinimumNumberShouldMatch());
     for (BooleanClause clause : clauses) {
-      switch (clause.getOccur()) {
-        case MUST:
-          {
-            newQuery.add(clause.getQuery(), Occur.FILTER);
-            break;
-          }
-        case SHOULD:
-          {
-            if (keepShould) {
-              newQuery.add(clause);
-            }
-            break;
-          }
-        case FILTER:
-        case MUST_NOT:
-        default:
-          {
-            newQuery.add(clause);
-          }
+      Query query = clause.getQuery();
+      Query rewritten = ConstantScoreQuery.rewriteNoScoring(query);
+      BooleanClause.Occur occur = clause.getOccur();
+      if (occur == Occur.SHOULD && keepShould == false) {
+        // ignore clause
+        actuallyRewritten = true;
+      } else if (occur == Occur.MUST) {
+        // replace MUST clauses with FILTER clauses
+        newQuery.add(rewritten, Occur.FILTER);
+        actuallyRewritten = true;
+      } else if (query != rewritten) {
+        newQuery.add(rewritten, occur);
+        actuallyRewritten = true;
+      } else {
+        newQuery.add(clause);
       }
     }
 
+    if (actuallyRewritten == false) {
+      return this;
+    }
+
     return newQuery.build();
   }
 
   @Override
   public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)
       throws IOException {
-    BooleanQuery query = this;
     if (scoreMode.needsScores() == false) {
-      query = rewriteNoScoring();
+      Query rewritten = rewriteNoScoring();
+      if (this != rewritten) {
+        // Pass it back to IndexSearcher#rewrite, which might find new opportunities for rewriting

Review comment:
       this goes beyond the non-scoring case right? In theory it could result in additional rewrites for scoring queries as well?

##########
File path: lucene/core/src/java/org/apache/lucene/search/ConstantScoreQuery.java
##########
@@ -63,6 +65,22 @@ public Query rewrite(IndexReader reader) throws IOException {
     return super.rewrite(reader);
   }
 
+  /**
+   * Perform some simplifications that are only legal when a query is not expected to produce
+   * scores.
+   */
+  static Query rewriteNoScoring(Query query) {

Review comment:
       It might be nice to enable other queries to also be aware of the scoring/nonscoring mode? I think we have other queries that can have child queries like `DisjunctionMaxQuery` maybe positional queries? I mean this is already a step forward - progress! Just wondering if there are other opportunities




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org