Posted to java-user@lucene.apache.org by "Trieschnigg, R.B. (Dolf)" <r....@ewi.utwente.nl> on 2006/02/16 11:40:41 UTC
BM25 Similarity implementation
Hi,
I would like to implement the Okapi BM25 weighting function using my own Similarity implementation. Unfortunately BM25 requires the document length in the score calculation, which is not provided by the Scorer.
Does anyone know a solution to this problem?
I've tried to find Similarity implementations other than Lucene's default one, but I could not find any. Any suggestions?
Thanks.
Dolf
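For reference, here is a minimal sketch of a per-term BM25 weight in plain Java, showing where the document length enters the calculation. It does not use any Lucene API. The class and method names are made up for illustration, the k1 and b values are only commonly quoted defaults, and the IDF shown is just one of several variants in use.

```java
// Hypothetical helper, not part of Lucene: a per-term BM25 weight.
final class Bm25TermWeight {
    // Commonly quoted defaults (assumption); tune them for your collection.
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** One common IDF variant: log((N - df + 0.5) / (df + 0.5)). */
    public static double idf(long docFreq, long numDocs) {
        return Math.log((numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * BM25 weight of one query term in one document.
     * docLength and avgDocLength are measured in tokens here (assumption).
     */
    public static double termScore(double tf, double docLength, double avgDocLength,
                                   long docFreq, long numDocs) {
        // Length normalisation: equals 1.0 when the document has average length.
        double lengthFactor = 1.0 - B + B * (docLength / avgDocLength);
        return idf(docFreq, numDocs) * (tf * (K1 + 1.0)) / (tf + K1 * lengthFactor);
    }
}
```

With these definitions, a document longer than average gets a larger lengthFactor and therefore a smaller weight for the same term frequency, which is why the scorer needs access to the document length.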
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: BM25 Similarity implementation
Posted by Doug Cutting <cu...@apache.org>.
Trieschnigg, R.B. (Dolf) wrote:
> I would like to implement the Okapi BM25 weighting function using my own Similarity implementation. Unfortunately BM25 requires the document length in the score calculation, which is not provided by the Scorer.
How do you want to measure document length? If the number of tokens is
an acceptable measure, then the norm contains 1/sqrt(numTokens) by
default. You can modify your Similarity.lengthNorm() implementation to
not perform the sqrt, or square the norm.
Doug
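The two options in this reply can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, nothing here calls Lucene, and defaultLengthNorm() merely mirrors the 1/sqrt(numTokens) formula of the default lengthNorm(). In a real index the norm is also multiplied by any index-time boosts and stored in a single byte, so a length recovered this way would only be approximate.

```java
// Hypothetical helper, not part of Lucene: recovering a token count from a length norm.
final class LengthNormSketch {

    /** Mirrors the default formula: 1 / sqrt(numTokens). */
    public static float defaultLengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    /** Option 1: keep the default norm and square it, giving 1 / numTokens, then invert. */
    public static double tokensFromDefaultNorm(float norm) {
        return 1.0 / ((double) norm * norm);
    }

    /** Option 2: a lengthNorm variant without the sqrt, i.e. 1 / numTokens. */
    public static float linearLengthNorm(int numTokens) {
        return 1.0f / numTokens;
    }

    /** Inverts the sqrt-free norm to get the token count back. */
    public static double tokensFromLinearNorm(float norm) {
        return 1.0 / norm;
    }
}
```

Either way, the recovered token count could then be fed into a BM25 length normalisation, together with an average document length that has to be computed separately.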