Posted to solr-user@lucene.apache.org by Walter Underwood <wu...@netflix.com> on 2009/06/30 17:20:52 UTC

Scores as percentages

We've had a couple of people ask for scores as percentages. If you really
want this, it should be just barely possible, though it would take some
coding. You calculate the maximum possible score, then report scores
normalized against that.

1. Use a TF scoring formula that has a ceiling, like hyperbolicTf, and
assume the maximum TF score for each term.
2. If you are using length normalization, assume the optimum length.
3. Gather the IDFs for all the terms.
4. Choose the field with the highest query time boost, if you are using
dismax.
5. Finally, calculate the maximum possible relevance score.
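The steps above can be sketched in a few lines of Python. This is an illustration, not Solr or Lucene code: the TF ceiling formula (loosely modeled on SweetSpotSimilarity's hyperbolicTf), the idf-squared weighting (as in classic Lucene TF-IDF), and all parameter values are assumptions chosen for the example.

```python
import math

def hyperbolic_tf(freq, tf_min=0.0, tf_max=2.0, offset=10.0):
    """Step 1: an illustrative TF curve with a hard ceiling.  tanh
    saturates at 1, so the value never exceeds tf_max -- that ceiling
    is the 'maximum TF score' assumed for each term."""
    return tf_min + (tf_max - tf_min) / 2.0 * (1.0 + math.tanh(freq / offset - 1.0))

def max_possible_score(idfs, boost=1.0, tf_max=2.0, optimal_norm=1.0):
    """Step 5: the best score any document could get for this query --
    every term at its TF ceiling (step 1), the optimal length norm
    (step 2), summed over the gathered IDFs (step 3), in the field
    with the highest boost (step 4).  idf is squared, as in classic
    Lucene scoring (query weight times term weight)."""
    return boost * optimal_norm * sum(tf_max * i * i for i in idfs)

def as_percentage(raw_score, idfs, **kw):
    """Report a raw score normalized against the theoretical maximum."""
    return 100.0 * raw_score / max_possible_score(idfs, **kw)

# A real document rarely comes close to the theoretical maximum:
print(round(as_percentage(2.4, [2.0, 4.5]), 1))   # -> 4.9
```

Note how a perfectly relevant-looking raw score of 2.4 normalizes to under 5% here, which is exactly the "distressingly low" effect described below.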

I'm sure I've left something out, but that is the general idea.

Be prepared to have relevant documents with distressingly low percentage
scores, like 5%.

wunder


Re: Scores as percentages

Posted by Chris Hostetter <ho...@fucit.org>.
: We've had a couple of people ask for scores as percentages. If you really
: want this, it should be just barely possible, though it would take some
: coding. You calculate the maximum possible score, then report scores
: normalized against that.

Right ... this idea has come up on the java-user list before, and was 
*linked* to from this wiki page...

   http://wiki.apache.org/lucene-java/ScoresAsPercentages

...but i just updated the wiki to elaborate on the concept and why i 
don't think it's a good idea.  (nutshell: it still suffers from the 
problem of scores not being comparable between different queries (with 
different structures), and the percentage for queryA=>docX can change when 
docZ documents are added or removed, even if those docZ documents don't 
match the query at all.)
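That second problem can be shown with a tiny numeric sketch in Python. The idf formula (1 + ln(N / (df + 1))), the two-term query, and the document frequencies are all assumptions picked only to make the effect visible: a document matching just one of two query terms sees its percentage move when unrelated documents are added, because both IDFs shift, but not proportionally.

```python
import math

def idf(num_docs, doc_freq):
    # Classic-Lucene-style idf (illustrative): 1 + ln(N / (df + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))

def pct_for_docX(num_docs, df_a=100, df_b=10):
    """queryA has two terms, A and B; docX matches only term A (at the
    TF ceiling).  Percentage = docX's score / maximum possible score,
    with idf squared as in classic Lucene scoring."""
    wa = idf(num_docs, df_a) ** 2
    wb = idf(num_docs, df_b) ** 2
    return 100.0 * wa / (wa + wb)

# Same docX, same query -- only non-matching documents were added:
p_small = pct_for_docX(1_000)     # roughly 26%
p_large = pct_for_docX(100_000)   # roughly 38%
```

The percentage for docX climbs from roughly 26% to roughly 38% even though neither docX nor the query changed; only documents that never match the query were added to the index.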

-Hoss