Posted to java-user@lucene.apache.org by jlman <jo...@gmail.com> on 2009/08/22 15:40:11 UTC

Re: How to het the score in percentage

hossman wrote:
> 
> 
> : here ie, in our existing system we are showing the search score in
> : percentage but lucene provides the search score in numbers which is derived
> : from some internal logic. Can anybody give some tips for converting the
> : lucene score to percentage or is there any way to retrieve the score as
> : percentage from lucene search. 
> 
> there is an extremely important and fundamental question you have to 
> answer when you say you want "the score as a percentage" ... 
> 
> 	A percentage of what exactly?
> 
> score values are meaningful only for purposes of comparison between other 
> documents for the exact same query and the exact same index.  when you try 
> to compute a percentage, you are setting up an implicit comparison with 
> scores from other queries.
> 
> -Hoss
> 
> 

There is one situation where comparison is viable: when the input is an
existing document (i.e., using the mlt function or doing a simple query using
a document's title/body). In such cases, the score of the document to itself
(which will hopefully be the max score in the result set) is the scaling
factor. With this approach we can answer the question "are docs A and B more
similar than docs C and D".
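
As a rough illustration of the scaling described above, here is a minimal
sketch in plain Java. It does not call Lucene at all: the class name, the
method name and every score value are made up for the example, and it assumes
you have already obtained both the raw hit score and the source document's
score against its own query.

```java
// Hypothetical helper, not part of Lucene: scales a raw score by the
// score the source document gets when matched against itself.
class SelfScoreScaling {

    // Returns rawScore as a fraction of selfScore. The result can exceed
    // 1.0 if some other document happens to outscore the source document.
    static double scaleBySelfScore(double rawScore, double selfScore) {
        if (selfScore <= 0.0) {
            throw new IllegalArgumentException("selfScore must be positive");
        }
        return rawScore / selfScore;
    }

    public static void main(String[] args) {
        // Made-up scores: A queried against the index, then C.
        double aSelf = 8.0, aToB = 6.0;
        double cSelf = 4.0, cToD = 2.0;

        double ab = scaleBySelfScore(aToB, aSelf); // 0.75
        double cd = scaleBySelfScore(cToD, cSelf); // 0.5

        System.out.println("A-B more similar than C-D: " + (ab > cd));
    }
}
```

Because each ratio is taken against its own query's self-score, the two
numbers can at least be put side by side, which the raw scores 6.0 and 2.0
cannot.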

This may even be the approach used by carrot for clustering, though I
haven't looked into how it generates its similarity matrix. (Note: it's
also possible that the scores between two docs aren't symmetric, meaning
A can score as more similar to B than B does to A.)

Perhaps treating each query as a document would allow lucene to return the
max score possible for that query (the match to itself), and then scale
documents from there. Yes, there are lots of challenges to actually doing
this since you wouldn't want to actually add a temporary doc to the index.

I know this topic usually morphs into assessing if a percentage-match is
useful. While I agree that scaled/normalized scores are prone to misuse, we
need a way to know if there are any good results, not just what the best
results are. One use case is when users submit content similar to existing
content and you'd like to alert them to the near-duplicate before
proceeding. Obviously you only want to prompt them if there are close
matches, and currently lucene only offers a way to get the most similar
docs, not a way to determine if any are actually similar.
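
To make the near-duplicate use case concrete, here is a small sketch in plain
Java. Everything in it is an assumption for illustration: the class and method
names are hypothetical, the 0.8 threshold is arbitrary, and the reference
self-score is simply passed in, since how to obtain it for content that is not
yet indexed is exactly the open problem mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: flags hits whose score is
// close to a reference "match to itself" score.
class NearDuplicateCheck {

    // Returns the positions of hits whose score / selfScore >= threshold.
    static List<Integer> nearDuplicates(double selfScore, double[] hitScores,
                                        double threshold) {
        List<Integer> result = new ArrayList<>();
        if (selfScore <= 0.0) {
            return result; // no usable reference score, so flag nothing
        }
        for (int i = 0; i < hitScores.length; i++) {
            if (hitScores[i] / selfScore >= threshold) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up scores for the top three hits of a submitted text.
        double[] hits = {9.2, 4.1, 1.3};
        List<Integer> close = nearDuplicates(10.0, hits, 0.8);
        if (!close.isEmpty()) {
            System.out.println("Possible near-duplicates at positions " + close);
        }
    }
}
```

With these numbers only the first hit passes the threshold, so the user would
be prompted once; with a top score of 4.1 nobody would be prompted at all,
even though a "best" result still exists.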

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: How to het the score in percentage

Posted by Erick Erickson <er...@gmail.com>.

I'm not saying that calculating percentages is a bad thing *within a
query*. You're absolutely right, users want some clue about how "good" a
match various items are.

But trying to compare percentages (scores) between *two different queries*
and then trying to infer that there is some "better fit" based on that data
is where problems creep in.....

FWIW
Erick

On Sat, Aug 22, 2009 at 10:01 AM, Shashi Kant <sh...@gmail.com> wrote:

> Chris & Erick's arguments are persuasive; however, we do live in an
> imperfect world. Most of our users want to see the relative importance
> of a result vis-a-vis the rest....
>
> Relative Importance (%) = (d - dmin)/(dmax-dmin) * 100
>
> Where dmax is the highest Lucene score (score of top result) and dmin
> is the least (the score of the last result) and d = current score.
>
> This would work for any n results.
>
> While this might be technically 'meh', we took a simple normalization
> approach to Lucene scores, and it helped the users gauge the relative
> importance and relate to the results better. At the end of the day,
> isn't that what matters most?

Re: How to het the score in percentage

Posted by Shashi Kant <sh...@gmail.com>.

Chris & Erick's arguments are persuasive; however, we do live in an
imperfect world. Most of our users want to see the relative importance
of a result vis-a-vis the rest....

Relative Importance (%) = (d - dmin)/(dmax-dmin) * 100

Where dmax is the highest Lucene score (score of top result) and dmin
is the least (the score of the last result) and d = current score.

This would work for any n results.
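
The formula above can be sketched in plain Java as follows. The class and
method names are made up for the example, the scores are invented, and the
handling of the case dmax == dmin (a single result, or all scores equal) is an
assumption added here to avoid dividing by zero; the formula itself does not
say what to do then.

```java
// Hypothetical helper, not part of Lucene: min-max normalization of the
// scores in one result set, as in the Relative Importance formula.
class RelativeImportance {

    // Maps each score d to (d - dmin) / (dmax - dmin) * 100.
    static double[] toPercentages(double[] scores) {
        double[] out = new double[scores.length];
        if (scores.length == 0) {
            return out;
        }
        double dmin = scores[0];
        double dmax = scores[0];
        for (double s : scores) {
            dmin = Math.min(dmin, s);
            dmax = Math.max(dmax, s);
        }
        double range = dmax - dmin;
        for (int i = 0; i < scores.length; i++) {
            // Assumption: when all scores are equal, report 100 for each.
            out[i] = (range == 0.0) ? 100.0 : (scores[i] - dmin) / range * 100.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] pct = toPercentages(new double[] {5.0, 3.0, 1.0});
        for (double p : pct) {
            System.out.println(p); // 100.0, 50.0, 0.0
        }
    }
}
```

Note that with this scheme the top hit is always 100% and the last hit is
always 0%, so the numbers only describe position within one result set.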

While this might be technically 'meh', we took a simple normalization
approach to Lucene scores, and it helped the users gauge the relative
importance and relate to the results better. At the end of the day,
isn't that what matters most?
