You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/02/10 14:42:16 UTC

[GitHub] [lucene] rmuir commented on a change in pull request #672: LUCENE-10418: Optimize `Query#rewrite` in the non-scoring case.

rmuir commented on a change in pull request #672:
URL: https://github.com/apache/lucene/pull/672#discussion_r803745240



##########
File path: lucene/core/src/java/org/apache/lucene/search/ConstantScoreQuery.java
##########
@@ -63,6 +65,22 @@ public Query rewrite(IndexReader reader) throws IOException {
     return super.rewrite(reader);
   }
 
+  /**
+   * Perform some simplifications that are only legal when a query is not expected to produce
+   * scores.
+   */
+  static Query rewriteNoScoring(Query query) {

Review comment:
       I question how much complexity we should add to optimize really degenerate inputs such as `ConstantScoreQuery(DisjunctionMaxQuery())`. I also think it might be better to put such logic here, not in e.g. booleanquery. The optimization is specific to CSQ, no? For example for your DisjunctionMaxQuery case:
   ```java
   } else if (query instanceof DisjunctionMaxQuery) {
     // since we don't care about scoring, turn it into a simple booleanquery... does this even make it faster?
     var builder = new BooleanQuery.Builder();
     for (Query subQuery : (DisjunctionMaxQuery)query) {
       builder.add(subQuery, Occur.SHOULD);
     }
     return builder.build();
   }
   ```
   It might also make the logic easier to follow for the BooleanQuery case too, especially the recursive piece. I personally think it is a lot better than adding `if needsScores == false` conditional logic everywhere to that already hairy code. If you move it to CSQ, then there's no conditional anymore, and it just seems like a better home.
   
   In general, its messy either way because I think we make it messy. I hate that we have `Query.rewrite` but here now we have it happening in `Query.createWeight` too. 
   
   It is also unclear to me if this optimization happens for all the correct places, where scores are not needed. This doesn't necessarily mean we need to add more abstractions or API complexity to make it work cleanly. For example in `IndexSearcher.count`, when it has to fall back to `search()` to do the counting, it doesn't need scores. it can wrap the query in a ConstantScoreQuery to get the optimizations. Probably BooleanQuery could do the same with its`FILTER` clauses?
   
   It is just one potential option, to really make this "non-scoring rewrite"  case easier to optimize everywhere: wrap it in a ConstantScoreQuery and you get all the optimizations. There are probably other alternatives we can consider too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org