You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2023/01/11 15:52:56 UTC

[GitHub] [lucene] javanna commented on pull request #12072: Fix exponential runtime for Boolean#rewrite

javanna commented on PR #12072:
URL: https://github.com/apache/lucene/pull/12072#issuecomment-1379007346

   I did some additional testing to understand the impact of the regression and in light of that I view this as a bug rather than a performance regression, because we end up performing many needless rewrite steps that waste resources and  end up making rewrite take tens of seconds depending on the depth of a boolean query.
   
   I added some logging to the rewrite methods involved to more clearly show the issue. The following is the output from a boolean query with depth 3.
   
   ```
       IndexSearcher searcher = newSearcher(new MultiReader());
       Directory dir = newDirectory();
       (new RandomIndexWriter(random(), dir)).close();
       IndexReader r = DirectoryReader.open(dir);
   
       int depth = 3;
       BooleanQuery.Builder bq = new BooleanQuery.Builder().add(new TermQuery(new Term("field", "value")), Occur.MUST);
       for (int i = 0; i < depth; i++) {
         bq = new BooleanQuery.Builder().add(bq.build(), Occur.MUST).add(new TermQuery(new Term("depth" + (depth - i), "value")), Occur.MUST);
       }
       BooleanQuery booleanQuery = new BooleanQuery.Builder().add(bq.build(), Occur.FILTER).build();
       Query rewritten = searcher.rewrite(booleanQuery);
       System.out.println("final rewritten query: " + rewritten);
       r.close();
       dir.close();
   ```
   
   Output before the fix (87 lines):
   ```
   boolean query rewrite: #(+(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value) -> (ConstantScore(+(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value))^0.0
   ---------------------------------------------- top level rewrite round completed
   + constant score query rewriting inner query: +(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value
   boolean query rewrite: +field:value -> field:value
   boolean query rewrite: +(+field:value) +depth3:value -> +field:value +depth3:value
   boolean query rewrite: +(+(+field:value) +depth3:value) +depth2:value -> +(+field:value +depth3:value) +depth2:value
   boolean query rewrite: +(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value -> +(+(+field:value +depth3:value) +depth2:value) +depth1:value
   + constant score query rewriting inner boolean query no scoring: +(+(+field:value +depth3:value) +depth2:value) +depth1:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: +(+field:value +depth3:value) +depth2:value
   boolean query rewrite: +field:value +depth3:value -> +field:value +depth3:value
   boolean query rewrite: +(+field:value +depth3:value) +depth2:value -> +(+field:value +depth3:value) +depth2:value
   + constant score query rewriting inner boolean query no scoring: +(+field:value +depth3:value) +depth2:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: +field:value +depth3:value
   boolean query rewrite: +field:value +depth3:value -> +field:value +depth3:value
   + constant score query rewriting inner boolean query no scoring: +field:value +depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: field:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth3:value
   boolean query rewrite no scoring: +field:value +depth3:value -> #field:value #depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth2:value
   boolean query rewrite no scoring: +(+field:value +depth3:value) +depth2:value -> #(#field:value #depth3:value) #depth2:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth1:value
   boolean query rewrite no scoring: +(+(+field:value +depth3:value) +depth2:value) +depth1:value -> #(#(#field:value #depth3:value) #depth2:value) #depth1:value
   ---------------------------------------------- top level rewrite round completed
   + constant score query rewriting inner query: #(#(#field:value #depth3:value) #depth2:value) #depth1:value
   + constant score query rewriting inner query: #(#field:value #depth3:value) #depth2:value
   + constant score query rewriting inner query: #field:value #depth3:value
   + constant score query rewriting inner query: field:value
   + constant score query rewriting inner query: depth3:value
   boolean query rewrite: #field:value #depth3:value -> #field:value #depth3:value
   + constant score query rewriting inner boolean query no scoring: #field:value #depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: field:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth3:value
   + constant score query rewriting inner query: depth2:value
   boolean query rewrite: #(#field:value #depth3:value) #depth2:value -> #(#field:value #depth3:value) #depth2:value
   + constant score query rewriting inner boolean query no scoring: #(#field:value #depth3:value) #depth2:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: #field:value #depth3:value
   + constant score query rewriting inner query: field:value
   + constant score query rewriting inner query: depth3:value
   boolean query rewrite: #field:value #depth3:value -> #field:value #depth3:value
   + constant score query rewriting inner boolean query no scoring: #field:value #depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: field:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth2:value
   + constant score query rewriting inner query: depth1:value
   boolean query rewrite: #(#(#field:value #depth3:value) #depth2:value) #depth1:value -> #(#(#field:value #depth3:value) #depth2:value) #depth1:value
   + constant score query rewriting inner boolean query no scoring: #(#(#field:value #depth3:value) #depth2:value) #depth1:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: #(#field:value #depth3:value) #depth2:value
   + constant score query rewriting inner query: #field:value #depth3:value
   + constant score query rewriting inner query: field:value
   + constant score query rewriting inner query: depth3:value
   boolean query rewrite: #field:value #depth3:value -> #field:value #depth3:value
   + constant score query rewriting inner boolean query no scoring: #field:value #depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: field:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth3:value
   + constant score query rewriting inner query: depth2:value
   boolean query rewrite: #(#field:value #depth3:value) #depth2:value -> #(#field:value #depth3:value) #depth2:value
   + constant score query rewriting inner boolean query no scoring: #(#field:value #depth3:value) #depth2:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: #field:value #depth3:value
   + constant score query rewriting inner query: field:value
   + constant score query rewriting inner query: depth3:value
   boolean query rewrite: #field:value #depth3:value -> #field:value #depth3:value
   + constant score query rewriting inner boolean query no scoring: #field:value #depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: field:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth2:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth1:value
   final rewritten query: (ConstantScore(#(#(#field:value #depth3:value) #depth2:value) #depth1:value))^0.0
   ```
   
   Output after the fix (52 lines):
   
   ```
   boolean query rewrite: #(+(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value) -> (ConstantScore(+(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value))^0.0
   ---------------------------------------------- top level rewrite round completed
   + constant score query rewriting inner query: +(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value
   boolean query rewrite: +field:value -> field:value
   boolean query rewrite: +(+field:value) +depth3:value -> +field:value +depth3:value
   boolean query rewrite: +(+(+field:value) +depth3:value) +depth2:value -> +(+field:value +depth3:value) +depth2:value
   boolean query rewrite: +(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value -> +(+(+field:value +depth3:value) +depth2:value) +depth1:value
   + constant score query rewriting inner boolean query no scoring: +(+(+field:value +depth3:value) +depth2:value) +depth1:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: field:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth3:value
   boolean query rewrite no scoring: +field:value +depth3:value -> #field:value #depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth2:value
   boolean query rewrite no scoring: +(+field:value +depth3:value) +depth2:value -> #(#field:value #depth3:value) #depth2:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth1:value
   boolean query rewrite no scoring: +(+(+field:value +depth3:value) +depth2:value) +depth1:value -> #(#(#field:value #depth3:value) #depth2:value) #depth1:value
   ---------------------------------------------- top level rewrite round completed
   + constant score query rewriting inner query: #(#(#field:value #depth3:value) #depth2:value) #depth1:value
   + constant score query rewriting inner query: #(#field:value #depth3:value) #depth2:value
   + constant score query rewriting inner query: #field:value #depth3:value
   + constant score query rewriting inner query: field:value
   + constant score query rewriting inner query: depth3:value
   boolean query rewrite: #field:value #depth3:value -> #field:value #depth3:value
   + constant score query rewriting inner boolean query no scoring: #field:value #depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: field:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth3:value
   + constant score query rewriting inner query: depth2:value
   boolean query rewrite: #(#field:value #depth3:value) #depth2:value -> #(#field:value #depth3:value) #depth2:value
   + constant score query rewriting inner boolean query no scoring: #(#field:value #depth3:value) #depth2:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: field:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth2:value
   + constant score query rewriting inner query: depth1:value
   boolean query rewrite: #(#(#field:value #depth3:value) #depth2:value) #depth1:value -> #(#(#field:value #depth3:value) #depth2:value) #depth1:value
   + constant score query rewriting inner boolean query no scoring: #(#(#field:value #depth3:value) #depth2:value) #depth1:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: field:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth2:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth1:value
   final rewritten query: (ConstantScore(#(#(#field:value #depth3:value) #depth2:value) #depth1:value))^0.0
   ```
   
   Output after shortcutting the main rewrite method to rewriteNoScoring instead of wrapping a boolean query with constant score query (39 lines):
   
   ```
   boolean query rewrite: #(+(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value) -> (ConstantScore(+(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value))^0.0
   ---------------------------------------------- top level rewrite round completed
   + constant score query rewriting inner query: +(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value
   boolean query rewrite: +field:value -> field:value
   boolean query rewrite: +(+field:value) +depth3:value -> +field:value +depth3:value
   boolean query rewrite: +(+(+field:value) +depth3:value) +depth2:value -> +(+field:value +depth3:value) +depth2:value
   boolean query rewrite: +(+(+(+field:value) +depth3:value) +depth2:value) +depth1:value -> +(+(+field:value +depth3:value) +depth2:value) +depth1:value
   + constant score query rewriting inner boolean query no scoring: +(+(+field:value +depth3:value) +depth2:value) +depth1:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: field:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth3:value
   boolean query rewrite no scoring: +field:value +depth3:value -> #field:value #depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth2:value
   boolean query rewrite no scoring: +(+field:value +depth3:value) +depth2:value -> #(#field:value #depth3:value) #depth2:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth1:value
   boolean query rewrite no scoring: +(+(+field:value +depth3:value) +depth2:value) +depth1:value -> #(#(#field:value #depth3:value) #depth2:value) #depth1:value
   ---------------------------------------------- top level rewrite round completed
   + constant score query rewriting inner query: #(#(#field:value #depth3:value) #depth2:value) #depth1:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: field:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth2:value
   + constant score query rewriting inner query: depth1:value
   boolean query rewrite: #(#(#field:value #depth3:value) #depth2:value) #depth1:value -> #(#(#field:value #depth3:value) #depth2:value) #depth1:value
   + constant score query rewriting inner boolean query no scoring: #(#(#field:value #depth3:value) #depth2:value) #depth1:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: field:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth3:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth2:value
   + Artificially wrapping into constant score query to rewrite
   + constant score query rewriting inner query: depth1:value
   final rewritten query: (ConstantScore(#(#(#field:value #depth3:value) #depth2:value) #depth1:value))^0.0
   ```
   
   Obviously the effect of this grows further as the depth of the query increases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org