Posted to java-user@lucene.apache.org by "Trieschnigg, R.B. (Dolf)" <r....@ewi.utwente.nl> on 2006/02/16 11:40:41 UTC

BM25 Similarity implementation

Hi,

I would like to implement the Okapi BM25 weighting function using my own Similarity implementation. Unfortunately BM25 requires the document length in the score calculation, which is not provided by the Scorer.

Does anyone know a solution to this problem?

I've looked for Similarity implementations other than the default one used by Lucene, but I could not find any. Any suggestions?

Thanks.
Dolf

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: BM25 Similarity implementation

Posted by Doug Cutting <cu...@apache.org>.
Trieschnigg, R.B. (Dolf) wrote:
> I would like to implement the Okapi BM25 weighting function using my own Similarity implementation. Unfortunately BM25 requires the document length in the score calculation, which is not provided by the Scorer.

How do you want to measure document length?  If the number of tokens is 
an acceptable measure, then the norm contains 1/sqrt(numTokens) by 
default.  You can modify your Similarity.lengthNorm() implementation to 
not perform the sqrt, or square the norm.

Doug
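
As a rough illustration of the suggestion above, the sketch below recovers an approximate token count from a length norm and feeds it into the textbook Okapi BM25 term score. It is plain Java and does not call Lucene: the class Bm25Sketch, its method names, and the k1 and b parameters are hypothetical names chosen for this example. It assumes the norm is exactly 1/sqrt(numTokens) with no boosts folded in; because Lucene stores norms in a single byte, a length recovered from an index would only be approximate.

```java
// Illustrative sketch only: Bm25Sketch and its methods are hypothetical
// names, not part of the Lucene API.
final class Bm25Sketch {

    // Assumes norm = 1/sqrt(numTokens) with no boosts folded in,
    // so squaring and inverting the norm gives the token count back.
    static double lengthFromNorm(float norm) {
        return 1.0 / ((double) norm * norm);
    }

    // Robertson/Sparck Jones style inverse document frequency.
    static double idf(long docFreq, long numDocs) {
        return Math.log((numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Okapi BM25 score of one term in one document.
    // k1 controls term-frequency saturation, b controls length normalization.
    static double score(double tf, double docLen, double avgDocLen,
                        long docFreq, long numDocs, double k1, double b) {
        double lengthFactor = 1.0 - b + b * (docLen / avgDocLen);
        return idf(docFreq, numDocs) * (tf * (k1 + 1.0)) / (tf + k1 * lengthFactor);
    }
}
```

A call such as Bm25Sketch.score(tf, Bm25Sketch.lengthFromNorm(norm), avgDocLen, docFreq, numDocs, 1.2, 0.75) would then score one term. The average document length is not something the norm provides, so this sketch assumes the caller computes it separately, for example while indexing.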
