Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2017/12/07 11:35:00 UTC

[jira] [Updated] (LUCENE-8083) Give similarities better values for maxScore

     [ https://issues.apache.org/jira/browse/LUCENE-8083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-8083:
---------------------------------
    Attachment: LUCENE-8083.patch

Here is a patch that improves BM25's maxScore by taking maxFreq into account, and implements maxScore on all SimilarityBase implementations by passing freq=maxFreq and docLen=1 to the score method. I also added new tests that are specific to this maxScore method.
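To illustrate the idea of bounding the score with freq=maxFreq and docLen=1, here is a minimal standalone sketch. It uses the textbook BM25 formula rather than the code in the patch; the class name MaxScoreSketch, the method names and the parameter values are all made up for illustration and are not part of Lucene's API.

```java
// Hypothetical sketch, not Lucene code: textbook BM25 and an upper bound on it.
final class MaxScoreSketch {

    // Textbook BM25 term score. It increases with freq and decreases with docLen
    // (assuming k1 > 0 and 0 <= b <= 1).
    static double bm25(double idf, double freq, double docLen,
                       double avgDocLen, double k1, double b) {
        double norm = k1 * (1 - b + b * docLen / avgDocLen);
        return idf * freq * (k1 + 1) / (freq + norm);
    }

    // Upper bound: plug in the largest frequency and the shortest possible
    // document length (1), the two extremes that maximize the formula above.
    static double maxScore(double idf, double maxFreq,
                           double avgDocLen, double k1, double b) {
        return bm25(idf, maxFreq, 1.0, avgDocLen, k1, b);
    }

    public static void main(String[] args) {
        // Illustrative values only.
        double idf = 2.0, avgDocLen = 50.0, k1 = 1.2, b = 0.75;
        double bound = maxScore(idf, 5, avgDocLen, k1, b);
        double sample = bm25(idf, 3, 40, avgDocLen, k1, b);
        System.out.println("bound=" + bound + " sample=" + sample);
    }
}
```

Because the formula is monotonic in both arguments, no document with freq <= maxFreq and docLen >= 1 can score above the bound, and the bound stays below the frequency-agnostic limit idf * (k1 + 1).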

In practice, this means that the LUCENE-4100 optimizations now work well with similarities whose score saturates quickly as frequency increases, such as all DFR similarities, IBSimilarity with DistributionSPL, AxiomaticF2EXP and AxiomaticF2LOG. They might work well with other similarities too in the future if we start recording the per-term maximum term frequency (or per-field, which would probably be a good start).
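A rough sketch of why saturation matters, assuming the maximum term frequency is not known and a very large value has to stand in for it. The two functions below are simplified stand-ins (a BM25-like saturating curve and a square root), not the actual DFR, IB or Axiomatic formulas, and the class name SaturationSketch is hypothetical.

```java
// Hypothetical sketch: how loose an upper bound gets when maxFreq is unknown.
final class SaturationSketch {

    // Saturating term-frequency component: approaches 1 as freq grows.
    static double saturating(double freq, double k) {
        return freq / (freq + k);
    }

    // Non-saturating component: keeps growing with freq.
    static double unbounded(double freq) {
        return Math.sqrt(freq);
    }

    // Ratio between the bound computed at a stand-in maximum frequency
    // and the value at a frequency that actually occurs.
    static double saturatingLooseness(double actualFreq, double standInMaxFreq, double k) {
        return saturating(standInMaxFreq, k) / saturating(actualFreq, k);
    }

    static double unboundedLooseness(double actualFreq, double standInMaxFreq) {
        return unbounded(standInMaxFreq) / unbounded(actualFreq);
    }

    public static void main(String[] args) {
        double standIn = Integer.MAX_VALUE; // assumption: no recorded maximum frequency
        System.out.println("saturating looseness: " + saturatingLooseness(10, standIn, 1.2));
        System.out.println("unbounded looseness:  " + unboundedLooseness(10, standIn));
    }
}
```

With the saturating curve the bound stays within a small factor of real scores even for a huge stand-in frequency, which is what lets the LUCENE-4100 optimizations skip non-competitive documents. With the non-saturating curve the bound is orders of magnitude too high unless a realistic maximum frequency is available.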

> Give similarities better values for maxScore
> --------------------------------------------
>
>                 Key: LUCENE-8083
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8083
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8083.patch
>
>
> The benefits of LUCENE-4100 largely depend on the quality of the upper bound of the scores that is provided by the similarity.


