Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/08 17:28:04 UTC

[GitHub] [lucene] gsmiller commented on pull request #11741: DRAFT: Experiment with intersecting TermInSetQuery terms up-front to better estimate cost

gsmiller commented on PR #11741:
URL: https://github.com/apache/lucene/pull/11741#issuecomment-1241017654

   @jpountz thanks for the feedback! If we assume a scenario where we have a `TermInSetQuery` over very selective terms (low docFreqs for each), we'd want to use the index query unless another, significantly more restrictive clause can lead the query. So there's no benefit to using `IndexOrDocValuesQuery` in that scenario, but moving the term lookup to the `scorerSupplier` also shouldn't hurt this case since we have to do the lookup anyway.
   
   On the other hand, with today's implementation, we may not know that the `TermInSetQuery` is very selective (e.g., maybe there are terms in the field, but not in the query, that match a large number of documents). In this case, it may be beneficial to use the index-based query, but—because of our naive cost heuristic—we could end up using a doc-values query because we're significantly over-estimating the cost of the index query.
   
   I think the case where this approach would really hurt us is when the `TermInSetQuery` is not particularly restrictive (to the point that we end up using the doc-values query), yet we still have to pay the up-front cost of looking up terms just to decide we don't want to use the index-based query after all.
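   
   To make the over-estimation case concrete, here is a rough, self-contained sketch in plain Java. It is not Lucene code: the class, the method names, the worst-case bound in `naiveCost`, and the `dvPenaltyFactor` parameter are all assumptions chosen for illustration. It compares a cost estimate built only from field-level statistics with one built from the actual docFreqs of the query terms, and shows how the two can lead to different index-vs-doc-values choices.

```java
import java.util.List;

/** Hypothetical model of the trade-off; none of these names are Lucene APIs. */
final class TermInSetCostSketch {

  /**
   * Assumed worst-case estimate from field-level statistics only: every indexed
   * term has docFreq >= 1, so the query terms can account for at most the
   * remaining postings plus one posting per query term.
   */
  static long naiveCost(long sumDocFreq, long indexedTermCount, long queryTermCount) {
    return queryTermCount + (sumDocFreq - indexedTermCount);
  }

  /** Estimate after looking up every query term up-front: the sum of their docFreqs. */
  static long lookedUpCost(List<Integer> queryTermDocFreqs) {
    long cost = 0;
    for (int docFreq : queryTermDocFreqs) {
      cost += docFreq;
    }
    return cost;
  }

  /**
   * Assumed decision rule: prefer the index query unless its cost, scaled down
   * by a penalty applied to doc values, still exceeds the lead clause's cost.
   */
  static boolean useIndexQuery(long indexCost, long leadCost, long dvPenaltyFactor) {
    return indexCost / dvPenaltyFactor <= leadCost;
  }
}
```

   With a field holding 1,000 terms and 1,000,000 postings, and a query over 10 terms that each match 2 documents, `naiveCost` gives 999,010 while `lookedUpCost` gives 20. Against a lead clause of cost 100 and a penalty factor of 8, the naive estimate picks the doc-values query and the looked-up estimate picks the index query.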


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

