Posted to dev@lucene.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/12/01 01:14:00 UTC
[jira] [Commented] (LUCENE-8011) Improve similarity explanations
[ https://issues.apache.org/jira/browse/LUCENE-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273779#comment-16273779 ]
ASF GitHub Bot commented on LUCENE-8011:
----------------------------------------
GitHub user mayya-sharipova opened a pull request:
https://github.com/apache/lucene-solr/pull/280
LUCENE-8011: Improve similarity explanations
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mayya-sharipova/lucene-solr LUCENE-8011-improve-similarity-explanations
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/lucene-solr/pull/280.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #280
----
commit c389c4992b66b5ae750ba7aa5b37937ebedc6615
Author: Mayya Sharipova <ma...@elastic.co>
Date: 2017-12-01T01:03:39Z
LUCENE-8011: Improve similarity explanations
----
> Improve similarity explanations
> -------------------------------
>
> Key: LUCENE-8011
> URL: https://issues.apache.org/jira/browse/LUCENE-8011
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Labels: newdev
>
> LUCENE-7997 improves the BM25 and Classic explanations so that each input and formula is stated explicitly, for example for BM25:
> {noformat}
> product of:
> 2.2 = scaling factor, k1 + 1
> 9.388654 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
> 1.0 = n, number of documents containing term
> 17927.0 = N, total number of documents with field
> 0.9987758 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
> 979.0 = freq, occurrences of term within document
> 1.2 = k1, term saturation parameter
> 0.75 = b, length normalization parameter
> 1.0 = dl, length of field
> 1.0 = avgdl, average length of field
> {noformat}
> Previously it was pretty cryptic and used confusing terminology like docCount/docFreq without explanation:
> {noformat}
> product of:
> 0.016547536 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
> 449.0 = docFreq
> 456.0 = docCount
> 2.1920826 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
> 113659.0 = freq=113658
> 1.2 = parameter k1
> 0.75 = parameter b
> 2300.5593 = avgFieldLength
> 1048600.0 = fieldLength
> {noformat}
> We should fix the other similarities in the same way so that their explanations are more practical too.
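To make the two explanation layouts easier to compare, here is a minimal Java sketch, with no Lucene dependency, that recomputes the numbers in both quoted explanations from the inputs they list. The class and method names (Bm25ExplainSketch, idf, tf, oldTfNorm, score) are hypothetical. This is not Lucene's BM25Similarity: it uses doubles and takes field lengths as given, whereas Lucene computes in float and stores lengths in a lossy encoding, so trailing digits may differ. Because the two quoted formulas differ only by the factor (k1 + 1), the old tfNorm equals the new tf multiplied by the scaling factor that the new layout lists on its own line.

```java
// Hypothetical sketch: recomputes the BM25 numbers printed in the two
// explanations quoted in the issue. It is NOT Lucene's BM25Similarity:
// it uses doubles and takes field lengths as given, whereas Lucene uses
// floats and a lossy length encoding, so trailing digits may differ.
final class Bm25ExplainSketch {

    private Bm25ExplainSketch() {}

    /** idf = log(1 + (N - n + 0.5) / (n + 0.5)), natural log. */
    static double idf(long n, long bigN) {
        return Math.log(1 + (bigN - n + 0.5) / (n + 0.5));
    }

    /** New layout: tf = freq / (freq + k1 * (1 - b + b * dl / avgdl)). */
    static double tf(double freq, double k1, double b, double dl, double avgdl) {
        return freq / (freq + k1 * (1 - b + b * dl / avgdl));
    }

    /** Old layout: tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)). */
    static double oldTfNorm(double freq, double k1, double b,
                            double fieldLength, double avgFieldLength) {
        return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength));
    }

    /** New layout: product of the scaling factor (k1 + 1), idf and tf. */
    static double score(long n, long bigN, double freq, double k1, double b,
                        double dl, double avgdl) {
        return (k1 + 1) * idf(n, bigN) * tf(freq, k1, b, dl, avgdl);
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75;

        // Inputs from the new-style explanation.
        System.out.printf("idf   = %.6f%n", idf(1, 17927));          // issue shows 9.388654
        System.out.printf("tf    = %.7f%n", tf(979, k1, b, 1, 1));    // issue shows 0.9987758
        System.out.printf("score = %.4f%n", score(1, 17927, 979, k1, b, 1, 1));

        // Inputs from the old-style explanation: old tfNorm equals (k1 + 1) * tf.
        double old = oldTfNorm(113658, k1, b, 1048600, 2300.5593);
        double split = (k1 + 1) * tf(113658, k1, b, 1048600, 2300.5593);
        System.out.printf("idf    = %.7f%n", idf(449, 456));          // issue shows 0.016547536
        System.out.printf("tfNorm = %.6f, (k1 + 1) * tf = %.6f%n", old, split); // issue shows 2.1920826
    }
}
```

Running main prints the recomputed values beside comments quoting the issue's numbers. The last line shows the old tfNorm and (k1 + 1) * tf agreeing, which is what lets the new layout report the scaling factor separately without changing the score.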
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)