Posted to java-user@lucene.apache.org by Eugene Ezekiel <ec...@gmail.com> on 2006/02/05 13:36:04 UTC

Reducing Inflated Similarity Scores

Hi All,

I'm currently using DefaultSimilarity and building my query by appending 
clauses with BooleanQuery.add(). The problem I face is this: given a query 
<t1> <t2> <t3> .... <tn>, where <ti> is a term,
it returns a document that contains just ONE of the terms, say <t1>, and 
nothing else. Surprisingly, the Hits score for this document is 1.0.

OK, I'm quite new to Lucene, so I don't really know how DefaultSimilarity 
works, but from what I gather it is a variation of cosine similarity. 
The cosine measure penalizes a document that matches only a few of the 
query terms, so how can the score be 1.0?
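
To make the expectation concrete, here is a minimal sketch of the plain
cosine measure. It assumes binary term weights (1 if the term is present,
0 otherwise), which is a simplification and not how DefaultSimilarity
weights terms. The class and method names are hypothetical and not part
of Lucene.

```java
public class CosineSketch {

    // Plain cosine similarity between two equal-length term-weight vectors.
    // Returns 0.0 if either vector has zero length.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Assumed example: a four-term query <t1> <t2> <t3> <t4> ...
        double[] query = {1, 1, 1, 1};
        // ... and a document that contains only <t1>.
        double[] doc = {1, 0, 0, 0};
        System.out.println("cosine = " + cosine(query, doc));
    }
}
```

Under these assumptions a document matching one of n query terms gets a
cosine of 1/sqrt(n), which is why a score of 1.0 looks inflated.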

Can anyone tell me what I can tweak to bring it closer to the cosine measure?

Thanks.

Regards,
Eugene

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Reducing Inflated Similarity Scores

Posted by Chris Hostetter <ho...@fucit.org>.
: Ok, I'm quite new to lucene so I don't really know how the Default
: Similarity works but from what I gather it is a variation of the
: cos-similarity. And the cos-measure penalizes extraneous terms
: therefore, how can the score be 1.0?

If you are using the Hits API then the score you are seeing is normalized
such that if the highest score in your results is greater than 1, then all
scores are divided by that highest score.  If you want to see the "true"
score you should look at the score from one of the more advanced search
methods (the ones that return TopDocs).
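
The normalization described above can be sketched in plain Java. This is
an illustration only, not the Hits source: the class and method names are
hypothetical, and the raw scores are made-up values.

```java
public class HitsNormalizationSketch {

    // Hypothetical helper: if the highest raw score is greater than 1,
    // divide every score by that highest score; otherwise leave them alone.
    static float[] normalize(float[] rawScores) {
        float top = 0.0f;
        for (float s : rawScores) {
            top = Math.max(top, s);
        }
        float scale = (top > 1.0f) ? 1.0f / top : 1.0f;
        float[] shown = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            shown[i] = rawScores[i] * scale;
        }
        return shown;
    }

    public static void main(String[] args) {
        // Made-up raw scores; the best hit scores above 1.
        float[] raw = {4.0f, 2.0f, 0.5f};
        float[] shown = normalize(raw);
        for (int i = 0; i < raw.length; i++) {
            System.out.println("raw " + raw[i] + " -> shown " + shown[i]);
        }
    }
}
```

With this kind of scaling the best hit shows 1.0 whenever its raw score
is above 1, however few of the query terms it matched, so a 1.0 from Hits
says nothing about how close the match is in absolute terms.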

: Can anyone tell what I can tweak to bring it more to the cos-measure?

I would start by looking at the Searchable.explain() method to really
understand where your score is coming from.  Then you can look at what
methods you might need to override to get the behavior you desire (if it's
not already working fine once you see the non-normalized score).


-Hoss

