Posted to dev@lucene.apache.org by Mikhail Khludnev <mk...@apache.org> on 2023/02/13 11:47:11 UTC

Re: Maximum score estimation

Hello.
Just FYI: I sketched a small prototype:
https://github.com/mkhludnev/likely/blob/main/src/test/java/org/apache/lucene/contrb/highly/TestLikelyReader.java#L53
It estimates the maximum possible score of a query against an index:
 - it creates a virtual index (LikelyReader), which
 - contains all terms from the original index with the same docCount, and
 - matches all of these terms in the first doc (docnum=0) with the maximum
termFreq (how to estimate that is a separate question).
So, if we search over this LikelyReader, we get a score estimate that the
same query is unlikely to exceed over the original index.
I suppose this might be useful for LTR as a better alternative to the query
score feature.
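
To illustrate the arithmetic behind such an estimate, here is a minimal
standalone sketch. It is not the code of the prototype linked above and it
does not use Lucene at all. It assumes BM25-style scoring with
idf = ln(1 + (N - n + 0.5) / (n + 0.5)) and tf / (tf + k1 * (1 - b + b * dl / avgdl)),
and it assumes that a disjunction simply sums the per-term scores. The names
MaxScoreSketch, TermStats, termUpperBound and estimateMaxScore are
hypothetical, as are all the numbers in main.

```java
import java.util.Arrays;
import java.util.List;

public class MaxScoreSketch {

    /** Hypothetical per-term statistics: docFreq comes from the index, maxTermFreq is assumed. */
    public static final class TermStats {
        final long docFreq;
        final double maxTermFreq;

        public TermStats(long docFreq, double maxTermFreq) {
            this.docFreq = docFreq;
            this.maxTermFreq = maxTermFreq;
        }
    }

    /** BM25-style idf; docCount is the number of docs in the original index. */
    public static double idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Score of one term in the virtual doc, which matches it with the assumed maximum termFreq. */
    public static double termUpperBound(TermStats t, long docCount,
                                        double k1, double b,
                                        double docLen, double avgDocLen) {
        double lengthNorm = k1 * (1 - b + b * docLen / avgDocLen);
        return idf(docCount, t.docFreq) * t.maxTermFreq / (t.maxTermFreq + lengthNorm);
    }

    /** Sum over all query terms, as if a single virtual doc matched every one of them. */
    public static double estimateMaxScore(List<TermStats> terms, long docCount,
                                          double k1, double b,
                                          double docLen, double avgDocLen) {
        double sum = 0;
        for (TermStats t : terms) {
            sum += termUpperBound(t, docCount, k1, b, docLen, avgDocLen);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three query terms with made-up docFreq values and an assumed max termFreq of 5.
        List<TermStats> query = Arrays.asList(
                new TermStats(120, 5), new TermStats(40, 5), new TermStats(15, 5));
        double max = estimateMaxScore(query, 10_000, 1.2, 0.75, 100, 100);
        System.out.println("estimated max score: " + max);
    }
}
```

Comparing the top score actually returned for a query with this estimate
gives a rough idea of how far the best hit is from a document matching
every term.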

On Tue, Dec 6, 2022 at 10:02 AM Mikhail Khludnev <mk...@apache.org> wrote:

> Hello dev!
> Users are interested in the meaning of the absolute value of the score, but
> we always reply that it's just a relative value. The maximum score of the
> matched docs is not an answer.
> Ultimately we need to measure how much sense a query makes in the index.
> e.g. the query [jet OR propulsion OR spider] should be measured as
> nonsense, because the best matching docs have much lower scores than a
> hypothetical (and presumably absent) doc matching [jet AND propulsion AND
> spider].
> Could there be a method that returns the maximum possible score if all
> query terms matched? Something like stubbing postings on a virtual
> all_matching doc with average stats like tf and field length, and then
> kicking the scorers in? It reminds me of something from probabilistic
> retrieval, but not much. Is there anything like this already?
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev