Posted to issues@lucene.apache.org by "Mayya Sharipova (Jira)" <ji...@apache.org> on 2021/06/23 13:39:03 UTC

[jira] [Closed] (LUCENE-9725) Allow BM25FQuery to use other similarities

     [ https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayya Sharipova closed LUCENE-9725.
-----------------------------------

Closing after the 8.9.0 release

> Allow BM25FQuery to use other similarities
> ------------------------------------------
>
>                 Key: LUCENE-9725
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9725
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Julie Tibshirani
>            Assignee: Julie Tibshirani
>            Priority: Major
>             Fix For: 8.9
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> From a high level, BM25FQuery works as follows:
> 1. Given a list of fields and weights, it pretends there's a synthetic combined field where all terms have been indexed. It computes new term and collection statistics for this combined field.
> 2. It uses a disjunction iterator and BM25Similarity to score the documents.
> The steps are (1) compute statistics that represent the combined field content, and (2) pass these to a similarity function. There is nothing really specific to BM25Similarity in this approach. In step 2, we could use another similarity, for example BooleanSimilarity or those based on language models like LMDirichletSimilarity. The main restriction is that norms have to be additive (the norm of the combined field must be the sum of the field norms).
> Maybe we could stop hardcoding BM25Similarity in BM25FQuery and instead use the similarity configured on IndexSearcher. We could think of this as providing a sensible default approach to cross-field scoring for many similarities. It's an incremental step towards LUCENE-8711, which would give similarities more fine-grained control over how stats/scores are combined across fields.
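
As an illustration of the two steps in the description, here is a minimal Python sketch, not Lucene code. It assumes that combined statistics are weighted sums of the per-field values (which is what makes the additive-norm restriction matter) and that the document frequency for the combined field is supplied by the caller. The names `FieldStats`, `combine`, `bm25`, `boolean`, and `score` are hypothetical, and the BM25 formula is a simplified textbook form rather than the exact BM25Similarity implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class FieldStats:
    """Per-field statistics for one term in one document (hypothetical type)."""
    weight: float      # per-field weight
    tf: int            # term frequency in this field of the document
    length: int        # field length of the document (the "norm")
    avg_length: float  # average field length across the collection


def combine(fields):
    """Step 1: statistics for the synthetic combined field.

    Assumption: each statistic is the weighted sum of the per-field values,
    so the combined norm is the sum of the (weighted) field norms.
    """
    tf = sum(f.weight * f.tf for f in fields)
    length = sum(f.weight * f.length for f in fields)
    avg_length = sum(f.weight * f.avg_length for f in fields)
    return tf, length, avg_length


def bm25(tf, length, avg_length, doc_freq, doc_count, k1=1.2, b=0.75):
    """Simplified BM25 over the combined statistics."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    norm = k1 * (1 - b + b * length / avg_length)
    return idf * tf / (tf + norm)


def boolean(tf, length, avg_length, doc_freq, doc_count):
    """Boolean-style similarity: 1 if the term matches anywhere, else 0."""
    return 1.0 if tf > 0 else 0.0


def score(fields, doc_freq, doc_count, similarity=bm25):
    """Step 2: pass the combined statistics to whichever similarity is given."""
    tf, length, avg_length = combine(fields)
    return similarity(tf, length, avg_length, doc_freq, doc_count)
```

Because `score` only hands combined statistics to a callable, swapping `bm25` for `boolean` (or any other function with the same signature) needs no change to step 1. This mirrors the proposal to take the similarity from IndexSearcher instead of fixing it to BM25Similarity.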



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
