Posted to dev@lucene.apache.org by "Shayan Tabrizi (JIRA)" <ji...@apache.org> on 2016/10/06 13:42:20 UTC
[jira] [Created] (LUCENE-7478) Wrong Formula in LMDirichletSimilarity
Shayan Tabrizi created LUCENE-7478:
--------------------------------------
Summary: Wrong Formula in LMDirichletSimilarity
Key: LUCENE-7478
URL: https://issues.apache.org/jira/browse/LUCENE-7478
Project: Lucene - Core
Issue Type: Bug
Reporter: Shayan Tabrizi
Priority: Critical
It seems that the formula in LMDirichletSimilarity is wrong, or at least is not the formula given in the C.X. Zhai paper that the class refers to.
The main part of the formula in LMDirichletSimilarity is:
Math.log(1 + freq /
(mu * ((LMStats)stats).getCollectionProbability())) +
Math.log(mu / (docLen + mu))
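The simplification claimed next follows from log(a) + log(b) = log(a*b). A minimal standalone sketch to check it numerically, assuming plain double inputs: the class and method names are hypothetical (not Lucene code), and collectionProb stands in for ((LMStats)stats).getCollectionProbability(), i.e. p(w|C). It models only the quoted expression, nothing else the real class may do.

```java
// Hypothetical standalone check, not part of Lucene.
public class DirichletExpressionCheck {

    // The expression as quoted from LMDirichletSimilarity;
    // collectionProb is an assumed stand-in for p(w|C).
    static double luceneExpression(double freq, double docLen, double mu, double collectionProb) {
        return Math.log(1 + freq / (mu * collectionProb))
                + Math.log(mu / (docLen + mu));
    }

    // The simplified form: log of (mu*p(w|C) + c(w,d)) / (p(w|C) * (|d| + mu)).
    static double simplifiedForm(double freq, double docLen, double mu, double collectionProb) {
        return Math.log((mu * collectionProb + freq) / (collectionProb * (docLen + mu)));
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the report.
        double freq = 3, docLen = 120, mu = 2000, p = 0.0005;
        double a = luceneExpression(freq, docLen, mu, p);
        double b = simplifiedForm(freq, docLen, mu, p);
        System.out.println(a + " " + b + " equal=" + (Math.abs(a - b) < 1e-9));
    }
}
```

main prints the two values side by side; they agree up to floating-point rounding.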
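which is in fact the logarithm of:
(mu*p(w|C)+c(w,d))/(p(w|C)*(|d| + mu))
while the formula in the paper (the Dirichlet-smoothed document model p(w|d)) is:
(mu*p(w|C)+c(w,d))/(|d| + mu)
So the implementation effectively introduces an extra division by p(w|C) into the formula.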
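A minimal sketch comparing the quoted expression with the log of the paper's formula, under the same assumptions as before: hypothetical class and method names (not Lucene code), plain doubles, and collectionProb standing in for p(w|C). The difference between the two comes out as exactly -log p(w|C), which is the extra p(w|C) divisor described above.

```java
// Hypothetical standalone comparison, not part of Lucene.
public class DirichletVsPaper {

    // The expression as quoted from LMDirichletSimilarity (collectionProb ~ p(w|C)).
    static double luceneTerm(double freq, double docLen, double mu, double collectionProb) {
        return Math.log(1 + freq / (mu * collectionProb))
                + Math.log(mu / (docLen + mu));
    }

    // The paper's formula: p(w|d) = (c(w,d) + mu*p(w|C)) / (|d| + mu).
    static double paperProbability(double freq, double docLen, double mu, double collectionProb) {
        return (freq + mu * collectionProb) / (docLen + mu);
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the report.
        double freq = 3, docLen = 120, mu = 2000, p = 0.0005;
        double lucene = luceneTerm(freq, docLen, mu, p);
        double paper = Math.log(paperProbability(freq, docLen, mu, p));
        // lucene - paper should equal -log p(w|C).
        double gap = lucene - paper;
        System.out.println("gap = " + gap + ", -log p(w|C) = " + (-Math.log(p)));
    }
}
```

Note that the gap depends only on the term's collection probability p(w|C), not on c(w,d) or |d|.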
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)