Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/08 14:51:27 UTC

[GitHub] [lucene] gsmiller commented on a diff in pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

gsmiller commented on code in PR #11738:
URL: https://github.com/apache/lucene/pull/11738#discussion_r966070400


##########
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##########
@@ -165,9 +143,46 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
 
         PostingsEnum docs = null;
 
-        final List<TermAndState> collectedTerms = new ArrayList<>();
-        if (collectTerms(context, termsEnum, collectedTerms)) {
-          // build a boolean query
+        // We will first try to collect up to 'threshold' terms into 'matchingTerms'
+        // if there are too many terms, we will fall back to building the 'builder'

Review Comment:
   Thanks for the suggestion @rmuir. Let me see if I can reuse more of the existing code structure in this change. The reason I didn't want to just call `collectTerms` as-is is that we could unnecessarily seek and load term states when we've already found a term covering all docs. For example, if the first term we visit covers all docs, we can just stop there.
   
   I'm also not sure I follow your point about reallocating `collectedTerms` as part of this change. That's certainly not my intention with this code, but maybe I'm staring at a bug and not realizing it? As soon as we hit the size threshold, we should be nulling out `collectedTerms`, initializing a builder, and just using that for the remaining term iteration. Apologies if I'm overlooking something though. Entirely possible.
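
   To make the intended control flow concrete, here is a rough, self-contained sketch. It does not use the actual Lucene classes: `EarlyExitSketch`, `Kind`, `Result`, and the plain lists standing in for the collected terms and the doc-id-set builder are all hypothetical names for illustration. It also assumes a term "covers all docs" when its per-segment doc frequency equals the segment's `maxDoc`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types exist in Lucene.
final class EarlyExitSketch {

  enum Kind { MATCH_ALL, BOOLEAN_QUERY, DOC_ID_SET }

  static final class Result {
    final Kind kind;
    final int termsVisited;

    Result(Kind kind, int termsVisited) {
      this.kind = kind;
      this.termsVisited = termsVisited;
    }
  }

  /**
   * @param docFreqs  per-term doc frequencies, in term iteration order
   * @param maxDoc    number of docs in the segment (assumed: docFreq == maxDoc means "all docs")
   * @param threshold maximum number of terms to hold in the collected list
   */
  static Result rewrite(int[] docFreqs, int maxDoc, int threshold) {
    List<Integer> collected = new ArrayList<>(); // stand-in for the collected term list
    List<Integer> builder = null;                // stand-in for a doc-id-set builder
    int visited = 0;

    for (int docFreq : docFreqs) {
      visited++;

      // A term covering every doc ends iteration immediately:
      // no further seeking or term-state loading is needed.
      if (docFreq == maxDoc) {
        return new Result(Kind.MATCH_ALL, visited);
      }

      // On hitting the size threshold, hand the collected terms to the
      // builder once, null out the list, and use the builder from here on.
      if (collected != null && collected.size() == threshold) {
        builder = new ArrayList<>(collected);
        collected = null;
      }

      if (collected != null) {
        collected.add(docFreq);
      } else {
        builder.add(docFreq);
      }
    }

    return new Result(collected != null ? Kind.BOOLEAN_QUERY : Kind.DOC_ID_SET, visited);
  }

  public static void main(String[] args) {
    Result r = rewrite(new int[] {3, 10, 4}, 10, 16);
    System.out.println(r.kind + " after visiting " + r.termsVisited + " term(s)");
  }
}
```

   The point of the sketch is that the list is nulled out exactly once, at the moment the threshold is reached, so nothing is reallocated during the remaining iteration.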



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
