Posted to dev@lucene.apache.org by "Tom Burton-West (JIRA)" <ji...@apache.org> on 2013/08/13 18:09:47 UTC

[jira] [Updated] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)

     [ https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Burton-West updated LUCENE-5175:
------------------------------------

    Attachment: LUCENE-5175.patch

The patch adds an optional parameter, delta, to lower-bound the tf normalization. Unit tests are also attached.

I still need to add tests of the explanation/scoring for two cases: 1) no norms, and 2) no delta.

If no delta parameter is supplied, the score works out to be equivalent to the regular BM25 formula, but I think it takes an extra step or two to get there. I'll try to run some benchmarks to see whether there is any significant performance impact.
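
For readers following along, here is a minimal sketch of why the score with no delta should match regular BM25. It is not the code in the attached patch: the class and method names (Bm25LSketch, bm25Tf, bm25LTf) are hypothetical, and it assumes the BM25L formulation from the cited article, where tf is first divided by the length normalization and delta is then added before saturation.

```java
public final class Bm25LSketch {

    /** Classic BM25 term-frequency component (idf omitted). */
    static double bm25Tf(double tf, double docLen, double avgDocLen,
                         double k1, double b) {
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        return (k1 + 1.0) * tf / (tf + k1 * lengthNorm);
    }

    /**
     * BM25L-style component (assumed formulation): length-normalize tf,
     * shift it by delta, then apply the usual k1 saturation.
     */
    static double bm25LTf(double tf, double docLen, double avgDocLen,
                          double k1, double b, double delta) {
        if (tf <= 0.0) {
            return 0.0; // a term that does not occur contributes nothing
        }
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        double normalizedTf = tf / lengthNorm;
        double shifted = normalizedTf + delta;
        return (k1 + 1.0) * shifted / (k1 + shifted);
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75, avgDocLen = 100.0;

        // delta = 0: both functions should agree up to rounding error.
        System.out.println(bm25Tf(3.0, 250.0, avgDocLen, k1, b));
        System.out.println(bm25LTf(3.0, 250.0, avgDocLen, k1, b, 0.0));

        // delta > 0: a very long document keeps a non-trivial tf component.
        System.out.println(bm25Tf(1.0, 1.0e6, avgDocLen, k1, b));
        System.out.println(bm25LTf(1.0, 1.0e6, avgDocLen, k1, b, 0.5));
    }
}
```

With delta = 0 the two expressions are algebraically identical, since (k1 + 1) * (tf / norm) / (k1 + tf / norm) equals (k1 + 1) * tf / (tf + k1 * norm); the extra division is the "extra step" on the way to the same score. With delta > 0, the component for any matching term stays above (k1 + 1) * delta / (k1 + delta), however long the document is.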
                
> Add parameter to lower-bound TF normalization for BM25 (for long documents)
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-5175
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5175
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Tom Burton-West
>            Priority: Minor
>         Attachments: LUCENE-5175.patch
>
>
> The article "When Documents Are Very Long, BM25 Fails!" documents a fix for the problem. There was a TODO note in BM25Similarity to add this fix. I will attach a patch that implements the fix shortly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org