You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/08 02:20:21 UTC

[GitHub] [lucene] rmuir commented on a diff in pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

rmuir commented on code in PR #11738:
URL: https://github.com/apache/lucene/pull/11738#discussion_r965442751


##########
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##########
@@ -165,9 +143,46 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
 
         PostingsEnum docs = null;
 
-        final List<TermAndState> collectedTerms = new ArrayList<>();
-        if (collectTerms(context, termsEnum, collectedTerms)) {
-          // build a boolean query
+        // We will first try to collect up to 'threshold' terms into 'matchingTerms'
+        // if there are too many terms, we will fall back to building the 'builder'

Review Comment:
   But this comment isn't really what happens. Instead, if we hit the threshold (16), we just null out our ArrayList of `collectedTerms`, and continue to iterate through the same loop and allocate another ArrayList of `collectedTerms` for the next 16 terms. and over and over again.
   
   So if there are millions of terms, we add a lot more allocations and GC pressure to do this optimization. I would prefer if we would "really bail" after we hit the limit, so it behaves as fast as before for the "huge numbers of terms" case. Seems like handling the case for `if (builder != null)` in the if-then-else decision tree here might do the trick?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org