Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/08 15:13:24 UTC

[GitHub] [lucene] gsmiller commented on pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

gsmiller commented on PR #11738:
URL: https://github.com/apache/lucene/pull/11738#issuecomment-1240856191

   @jpountz:
   
   > It might not be a big win in practice, but it should be enough to compare the docFreq with the docCount (rather than maxDoc) and use this postings whose docFreq is equal to docCount as an iterator of matches.
   
   I like that idea. I wonder if it makes sense to check for both conditions? If a term matches all docs in the segment, it should be more efficient to use `DocIdSet#all` rather than iterating the actual postings, right? But if a term doesn't match all docs in the segment yet _does_ match all docs that have the field (i.e., the field isn't completely dense), we could add an additional optimization here that uses that single term's postings. Is that what you had in mind?
   
   Here's what I'm thinking:
   ```
             int docFreq = termsEnum.docFreq();
             if (reader.maxDoc() == docFreq) {
               // The term matches every doc in the segment, so skip the postings entirely.
               return new WeightOrDocIdSet(DocIdSet.all(docFreq));
             } else if (terms.getDocCount() == docFreq) {
               // The term matches every doc that has this field, so its postings alone cover all matches.
               TermStates termStates = new TermStates(searcher.getTopReaderContext());
               termStates.register(termsEnum.termState(), context.ord, docFreq, termsEnum.totalTermFreq());
               Query q = new ConstantScoreQuery(new TermQuery(new Term(query.field, term), termStates));
               Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
               return new WeightOrDocIdSet(weight);
             }
   ```
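
   To make the ordering of the two checks easier to see, here is a minimal, Lucene-free sketch of the same decision. The class `MatchStrategyChooser`, the `Strategy` enum, and the `choose` method are hypothetical names used only for illustration; the three ints stand in for `reader.maxDoc()`, `terms.getDocCount()`, and `termsEnum.docFreq()`.

   ```java
   // Hypothetical model of the three-way check; not part of Lucene.
   class MatchStrategyChooser {
     enum Strategy { MATCH_ALL, SINGLE_TERM_POSTINGS, GENERAL_PATH }

     // maxDoc:   number of docs in the segment
     // docCount: number of docs that have at least one term for the field
     // docFreq:  number of docs that contain the current term
     static Strategy choose(int maxDoc, int docCount, int docFreq) {
       if (docFreq == maxDoc) {
         // Term is in every doc of the segment: no postings needed.
         return Strategy.MATCH_ALL;
       } else if (docFreq == docCount) {
         // Term is in every doc that has the field: one postings list is enough.
         return Strategy.SINGLE_TERM_POSTINGS;
       }
       // Otherwise fall through to the regular multi-term handling.
       return Strategy.GENERAL_PATH;
     }

     public static void main(String[] args) {
       System.out.println(choose(100, 100, 100)); // dense field, term in every doc
       System.out.println(choose(100, 60, 60));   // sparse field, term in every doc with the field
       System.out.println(choose(100, 60, 25));   // term in only some docs
     }
   }
   ```

   The `maxDoc` check goes first because a term that matches the whole segment also satisfies `docFreq == docCount`, and the match-all path is the cheaper of the two.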


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org