Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/02 00:38:42 UTC

[GitHub] [lucene] gsmiller opened a new issue, #11740: Can we improve cost estimation in TermInSetQuery's ScoreSupplier?

gsmiller opened a new issue, #11740:
URL: https://github.com/apache/lucene/issues/11740

   ### Description
   
   To minimize the up-front cost of creating a `ScorerSupplier`, `TermInSetQuery` doesn't actually intersect its terms with the index, so it has no visibility into the postings length of each term when estimating cost. As a result, it may grossly overestimate the cost.
   
   Can we do better somehow? One thought: are there cases where intersecting the terms up-front is actually justified? Doing so has a cost, but a more accurate cost estimate for the `Scorer` might be useful in some cases.
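
To make the gap concrete, here is a minimal, Lucene-free sketch. It assumes a simplified model in which the cheap estimate sees only field-level statistics, while the "intersected" estimate looks up each query term's document frequency. The class, method names, and numbers are hypothetical illustrations and are not taken from `TermInSetQuery` itself.

```java
import java.util.List;
import java.util.Map;

public class CostEstimateSketch {

    /**
     * Worst-case cost using only field-level statistics (no term seeking).
     * Assumption: each query term matches at least one document, and every
     * posting beyond the first one of each indexed term might belong to a
     * query term.
     */
    static long upperBoundCost(long queryTermCount, long sumDocFreq, long indexedTermCount) {
        long potentialExtra = sumDocFreq - indexedTermCount;
        return queryTermCount + potentialExtra;
    }

    /**
     * Cost after "intersecting" the query terms with the index: the sum of
     * the document frequencies of the terms that actually exist.
     */
    static long intersectedCost(List<String> queryTerms, Map<String, Integer> docFreqByTerm) {
        long cost = 0;
        for (String term : queryTerms) {
            // Terms missing from the index contribute nothing.
            cost += docFreqByTerm.getOrDefault(term, 0);
        }
        return cost;
    }

    public static void main(String[] args) {
        // Hypothetical field: one very common term and two rare ones.
        Map<String, Integer> docFreqByTerm = Map.of("a", 1000, "b", 2, "c", 3);
        List<String> queryTerms = List.of("b", "c", "x"); // "x" is not indexed

        long sumDocFreq = 0;
        for (int docFreq : docFreqByTerm.values()) {
            sumDocFreq += docFreq;
        }

        System.out.println(upperBoundCost(queryTerms.size(), sumDocFreq, docFreqByTerm.size()));
        System.out.println(intersectedCost(queryTerms, docFreqByTerm));
    }
}
```

With these made-up numbers, the statistics-only bound is 1005 while the intersected cost is 5, because the single common term dominates the bound even though the query never asks for it. The trade-off is that the tighter number requires one lookup per query term.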


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on issue #11740: Can we improve cost estimation in TermInSetQuery's ScoreSupplier?

Posted by "gsmiller (via GitHub)" <gi...@apache.org>.
gsmiller commented on issue #11740:
URL: https://github.com/apache/lucene/issues/11740#issuecomment-1458502502

   After experimenting with this some more, I haven't been able to come up with a sensible way of improving on our current cost estimation without actually doing some term seeking. I'm going to close this out for now.




[GitHub] [lucene] gsmiller closed issue #11740: Can we improve cost estimation in TermInSetQuery's ScoreSupplier?

Posted by "gsmiller (via GitHub)" <gi...@apache.org>.
gsmiller closed issue #11740: Can we improve cost estimation in TermInSetQuery's ScoreSupplier?
URL: https://github.com/apache/lucene/issues/11740




[GitHub] [lucene] gsmiller commented on issue #11740: Can we improve cost estimation in TermInSetQuery's ScoreSupplier?

Posted by GitBox <gi...@apache.org>.
gsmiller commented on issue #11740:
URL: https://github.com/apache/lucene/issues/11740#issuecomment-1234950938

   Put up a draft PR to show how we could intersect terms early here: #11741




[GitHub] [lucene] gsmiller commented on issue #11740: Can we improve cost estimation in TermInSetQuery's ScoreSupplier?

Posted by "gsmiller (via GitHub)" <gi...@apache.org>.
gsmiller commented on issue #11740:
URL: https://github.com/apache/lucene/issues/11740#issuecomment-1421245215

   As a different approach, the idea of a "self-optimizing" `TermInSetQuery` is explored in #12089. It works around the problem of trying to provide an up-front cost estimate for use by `IndexOrDocValuesQuery`. There's some history/context there that's relevant to this idea.

