Posted to java-user@lucene.apache.org by Dmitry Serebrennikov <dm...@earthlink.net> on 2002/10/15 04:16:55 UTC

Are score values always between 0 and 1?

Greetings,

I know that the FAQ says that they are, but in at least one instance in 
my index it appears to be equal to 1.94something. Are the scores 
guaranteed to be between 0 and 1, and if not, what would it take to make 
them such?

Thanks.
Dmitry.



--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


RE: Are score values always between 0 and 1?

Posted by Alex Murzaku <li...@lissus.com>.
Because results are sorted by score, you only need to look at the first
hit to set the score normalizer. The following code is from Hits.java:

    // Scale factor: stays 1 unless the top hit scores above 1.
    float scoreNorm = 1.0f;
    if (length > 0 && scoreDocs[0].score > 1.0f)
      scoreNorm = 1.0f / scoreDocs[0].score;

    // Cache the next batch of hits with their scores scaled.
    int end = scoreDocs.length < length ? scoreDocs.length : length;
    for (int i = hitDocs.size(); i < end; i++)
      hitDocs.addElement(new HitDoc(scoreDocs[i].score * scoreNorm,
                                    scoreDocs[i].doc));

Since the top score is scaled down to 1 whenever it exceeds 1, scoreNorm
keeps all scores returned through Hits between 0 and 1.
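
For readers who want to try the idea outside of Lucene, here is a minimal,
self-contained sketch of the same scaling. The names ScoreNormalizer and
normalize are made up for illustration and are not part of Lucene, and the
input is assumed to be sorted by descending score already, as scoreDocs is
in Hits.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not a Lucene class.
class ScoreNormalizer {

    /**
     * Scales a list of scores so that the top score is at most 1.
     * Assumes sortedScores is in descending order, so element 0 is the maximum.
     */
    static float[] normalize(float[] sortedScores) {
        float scoreNorm = 1.0f;
        if (sortedScores.length > 0 && sortedScores[0] > 1.0f) {
            scoreNorm = 1.0f / sortedScores[0];
        }
        float[] result = new float[sortedScores.length];
        for (int i = 0; i < sortedScores.length; i++) {
            result[i] = sortedScores[i] * scoreNorm;
        }
        return result;
    }

    public static void main(String[] args) {
        // 1.94 is the out-of-range score reported at the start of the thread.
        float[] raw = {1.94f, 0.97f, 0.5f};
        System.out.println(Arrays.toString(normalize(raw)));
    }
}
```

Only a result list whose top score exceeds 1 is rescaled, so the relative
order within one list is preserved, but scaled scores from different queries
are not directly comparable.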

-----Original Message-----
From: Dmitry Serebrennikov [mailto:dmitrys@earthlink.net] 
Sent: Tuesday, October 15, 2002 4:16 AM
To: Lucene Users List
Subject: Re: Are score values always between 0 and 1?


Ype Kingma wrote:

>On Tuesday 15 October 2002 04:16, Dmitry Serebrennikov wrote:
>  
>
>>Greetings,
>>
>>I know that the FAQ says that they are, but in at least one instance 
>>in my index it appears to be equal to 1.94something. Are the scores 
>>guaranteed to be between 0 and 1, and if not, what would it take to 
>>make them such?
>>    
>>
>
>Division by the highest score perhaps?
>I did this for a short while, but then I removed it again because
>information is lost by the division. However, it is mainly a matter of
>presentation to users, so you could let them be your guide in this.
>
>Regards,
>Ype
>
>
>  
>
Well, the problem is that I don't know what the highest score might be 
until I run into one that is higher than the one I thought was the 
highest until then... I'm trying to use this to make results from one 
searcher always come before those from another in a MultiSearcher, but I 
need to know the upper bound on the scores to get this to work.





Re: Are score values always between 0 and 1?

Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
Ype Kingma wrote:

>On Tuesday 15 October 2002 04:16, Dmitry Serebrennikov wrote:
>  
>
>>Greetings,
>>
>>I know that the FAQ says that they are, but in at least one instance in
>>my index it appears to be equal to 1.94something. Are the scores
>>guaranteed to be between 0 and 1, and if not, what would it take to make
>>them such?
>>    
>>
>
>Division by the highest score perhaps?
>I did this for a short while, but then I removed it again because information 
>is lost by the division. However, it is mainly a matter of presentation to 
>users, so you could let them be your guide in this.
>
>Regards,
>Ype
>
>
>  
>
Well, the problem is that I don't know what the highest score might be 
until I run into one that is higher than the one I thought was the 
highest until then... I'm trying to use this to make results from one 
searcher always come before those from another in a MultiSearcher, but I 
need to know the upper bound on the scores to get this to work.
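
If the scores from each searcher can be assumed to lie between 0 and 1 (for
example after a per-result-list scaling by the top score), one possible way
to force one searcher's results ahead of another's is to add a fixed offset
per searcher. The following is only a sketch under that assumption; the
names TieredScore, tieredScore, searcherRank and searcherCount are
hypothetical and are not part of the MultiSearcher API.

```java
// Hypothetical helper for illustration only; not a Lucene class.
class TieredScore {

    /**
     * Maps a score in [0, 1] into a band reserved for its searcher.
     * searcherRank 0 is the searcher whose results must come first.
     * Bands are 2 wide so that the lowest score of a preferred searcher
     * is still strictly above the highest score of the next one.
     */
    static float tieredScore(float normalizedScore, int searcherRank, int searcherCount) {
        if (normalizedScore < 0.0f || normalizedScore > 1.0f) {
            throw new IllegalArgumentException("score must be in [0, 1]: " + normalizedScore);
        }
        if (searcherRank < 0 || searcherRank >= searcherCount) {
            throw new IllegalArgumentException("bad searcher rank: " + searcherRank);
        }
        int tier = searcherCount - 1 - searcherRank;
        return tier * 2.0f + normalizedScore;
    }

    public static void main(String[] args) {
        // A weak hit from the preferred searcher versus a strong hit from the other one.
        System.out.println(tieredScore(0.1f, 0, 2));
        System.out.println(tieredScore(0.9f, 1, 2));
    }
}
```

Sorting the merged results by the returned value in descending order then
keeps each searcher's hits together, in rank order. This only works when the
upper bound of 1 really holds for the scores being fed in.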





Re: Are score values always between 0 and 1?

Posted by Ype Kingma <yk...@xs4all.nl>.
On Tuesday 15 October 2002 04:16, Dmitry Serebrennikov wrote:
> Greetings,
>
> I know that the FAQ says that they are, but in at least one instance in
> my index it appears to be equal to 1.94something. Are the scores
> guaranteed to be between 0 and 1, and if not, what would it take to make
> them such?

Division by the highest score perhaps?
I did this for a short while, but then I removed it again because information 
is lost by the division. However, it is mainly a matter of presentation to 
users, so you could let them be your guide in this.

Regards,
Ype




Re: Are score values always between 0 and 1?

Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
Doug, thanks for a quick response.
If I understand it correctly, the answer to my next question is "no", 
but still:
    Are the weights at least bounded, or can they potentially be any float?
    Also, when the new field and document boosts are incorporated, how
    does this change the picture?

Thanks again.
Dmitry.

Doug Cutting wrote:

> Dmitry Serebrennikov wrote:
>
>> I know that the FAQ says that they are, but in at least one instance 
>> in my index it appears to be equal to 1.94something. Are the scores 
>> guaranteed to be between 0 and 1
>
>
> No.
>
>> and if not, what would it take to make
>> them such?
>
>
> A different Similarity implementation.
>
> To do this right you need to divide each document's score by the 
> square root of the sum of all of the document's term weights.  This is 
> hard to do, since the term weights depend on each term's document 
> frequency and hence change when documents are added and deleted from 
> the index.  Thus this denominator would have to be recomputed for 
> every document each time the index changes.  Or you could use term 
> weights that don't depend on document frequency, or ...
>
> Doug
>
>
>
>






Re: Are score values always between 0 and 1?

Posted by Doug Cutting <cu...@lucene.com>.
Dmitry Serebrennikov wrote:
> I know that the FAQ says that they are, but in at least one instance in 
> my index it appears to be equal to 1.94something. Are the scores 
> guaranteed to be between 0 and 1

No.

> and if not, what would it take to make
> them such?

A different Similarity implementation.

To do this right you need to divide each document's score by the square 
root of the sum of all of the document's term weights.  This is hard to 
do, since the term weights depend on each term's document frequency and 
hence change when documents are added and deleted from the index.  Thus 
this denominator would have to be recomputed for every document each 
time the index changes.  Or you could use term weights that don't depend 
on document frequency, or ...

Doug
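
To make the point about the changing denominator concrete, here is a small
sketch. It is not Lucene code: the class LengthNormSketch and its methods
are hypothetical, the term weight is assumed to be tf * idf with
idf = 1 + ln(numDocs / (docFreq + 1)) chosen for illustration, and the
sketch squares each weight before summing (the usual Euclidean length),
which is an assumption about the intended reading of "sum of all of the
document's term weights".

```java
// Hypothetical sketch for illustration only; not a Lucene class.
class LengthNormSketch {

    /** Assumed idf formula: grows as a term appears in fewer documents. */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /**
     * Euclidean length of one document's term-weight vector.
     * termFreqs[i] is the count of term i in this document and
     * docFreqs[i] is the number of documents in the index containing term i.
     */
    static double documentNorm(int[] termFreqs, int[] docFreqs, int numDocs) {
        double sumOfSquares = 0.0;
        for (int i = 0; i < termFreqs.length; i++) {
            double weight = termFreqs[i] * idf(docFreqs[i], numDocs);
            sumOfSquares += weight * weight;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        int[] termFreqs = {3, 1};          // this document never changes

        // Index with 100 documents.
        double before = documentNorm(termFreqs, new int[] {10, 2}, 100);

        // One document containing the first term is added elsewhere in the index.
        double after = documentNorm(termFreqs, new int[] {11, 2}, 101);

        System.out.println("norm before: " + before);
        System.out.println("norm after:  " + after);
    }
}
```

The document itself is untouched between the two calls, yet its norm
differs, which is why the denominator would have to be recomputed for every
document whenever the index changes.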



