Posted to java-user@lucene.apache.org by xing jiang <gi...@gmail.com> on 2006/01/27 08:16:10 UTC

How does the lucene normalize the score?

Hi,

I want to know how Lucene normalizes the score. I see that the Hits class has
a function to get each document's score, but I don't know how Lucene
calculates the normalized score. "Lucene in Action" only describes it as the
"normalized score of the nth top-scoring document".
--
Regards

Jiang Xing

Re: How does the lucene normalize the score?

Posted by Chris Hostetter <ho...@fucit.org>.
: ...but this means that the scores are not comparable across queries,
: because a hit with a score of '0.7' from one query is not necessarily as
: 'good' as a '0.7' from another query. That would only hold if the original,
: unnormalized top score was less than 1.0.

Scores are not comparable between different queries, regardless of whether
the scores from one query are normalized or not.  This is mentioned in the
FAQ...

http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-912c1f237bb00259185353182948e5935f0c2f03



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How does the lucene normalize the score?

Posted by du...@web.de.
...but this means that the scores are not comparable across queries,
because a hit with a score of '0.7' from one query is not necessarily as
'good' as a '0.7' from another query. That would only hold if the original,
unnormalized top score was less than 1.0.
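
To make the '0.7' point concrete, here is a minimal sketch. It is not Lucene
code: the class and method names are hypothetical and the raw scores are
made up. It only applies the rule described in this thread (scale by
1/maxScore when the top raw score of a query is greater than 1.0).

```java
public class ScoreComparabilitySketch {

    // Score as it would be shown for one hit, following the rule described in
    // this thread: scale by 1/maxScore only when the query's top raw score
    // is greater than 1.0. (Hypothetical helper, not part of Lucene.)
    static float displayedScore(float rawScore, float maxRawScore) {
        float scoreNorm = maxRawScore > 1.0f ? 1.0f / maxRawScore : 1.0f;
        return rawScore * scoreNorm;
    }

    public static void main(String[] args) {
        // Query A: top raw score is 0.7, so nothing is rescaled.
        float a = displayedScore(0.7f, 0.7f);
        // Query B: top raw score is 2.0, so a raw score of 1.4 is scaled down.
        float b = displayedScore(1.4f, 2.0f);
        System.out.println(a + " vs " + b); // prints "0.7 vs 0.7"
    }
}
```

Both hits are displayed as 0.7, although one is the best hit of its query with
a raw score of 0.7 and the other is a lower-ranked hit with a raw score of 1.4.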

Is this really a feasible way to normalize similarity values, especially
the distinction based on the top score? Can one really say whether a normalization
is meaningful or not depending on the top score value?


I took a further look, and it seems that the score values inside the explanations
are not normalized. We need normalized similarity values (e.g. in the range [0..1]) that
are comparable across queries. As it stands, we have two score values:

1. a normalized one from the Hits class, without cross-query comparability
2. an unnormalized one from IndexSearcher.explain(..), with cross-query comparability

I am a little bit confused now. Does this mean that the default similarity implementation is
not adequate for this kind of problem?

best regards,

Chris



-- 
______________________________________________________________________

Christian Reuschling, Dipl.-Ing.(BA)
Software Engineer

Knowledge Management Department
German Research Center for Artificial Intelligence DFKI GmbH
Erwin-Schrödinger-Straße 57, D-67663 Kaiserslautern, Germany

Phone: +49.631.205-3441
mailto:reuschling@dfki.de  http://www.dfki.uni-kl.de/~reuschling/
______________________________________________________________________




Chris Lamprecht wrote:
> It takes the highest scoring document, if greater than 1.0, and
> divides every hit's score by this number, leaving them all <= 1.0. 
> Actually, I just looked at the code, and it actually does this by
> taking 1/maxScore and then multiplying this by each score (equivalent
> results in the end, maybe more efficient(?)).  See the method
> getMoreDocs() in Hits.java (org.apache.lucene.search.Hits):
> 
> [...]
>     float scoreNorm = 1.0f;
> 
>     if (length > 0 && topDocs.getMaxScore() > 1.0f) {
>       scoreNorm = 1.0f / topDocs.getMaxScore();
>     }
> 
>     int end = scoreDocs.length < length ? scoreDocs.length : length;
>     for (int i = hitDocs.size(); i < end; i++) {
>       hitDocs.addElement(new HitDoc(scoreDocs[i].score * scoreNorm,
>                                     scoreDocs[i].doc));
>     }
> 
> 
> 


Re: How does the lucene normalize the score?

Posted by xing jiang <gi...@gmail.com>.
hi,

thank you for your help.




--
Regards

Jiang Xing

Re: How does the lucene normalize the score?

Posted by Yonik Seeley <ys...@gmail.com>.
On 1/27/06, Chris Lamprecht <cl...@gmail.com> wrote:
> Actually, I just looked at the code, and it actually does this by
> taking 1/maxScore and then multiplying this by each score (equivalent
> results in the end, maybe more efficient(?)).

Very much so... fdiv commonly takes 20 to 40 clock cycles, depending
on precision.  fmul commonly takes 3 clock cycles.   Same thing holds
with integer multiplication and division.

If you don't want normalized scores, use the expert-level search
methods that return TopDocs or TopFieldDocs.

-Yonik



Re: How does the lucene normalize the score?

Posted by Chris Lamprecht <cl...@gmail.com>.
It takes the score of the highest-scoring document and, if it is greater
than 1.0, divides every hit's score by it, leaving them all <= 1.0.
Looking at the code, it actually does this by computing 1/maxScore once
and then multiplying each score by it (equivalent results in the end,
maybe more efficient(?)).  See the method getMoreDocs() in Hits.java
(org.apache.lucene.search.Hits):

[...]
    float scoreNorm = 1.0f;

    if (length > 0 && topDocs.getMaxScore() > 1.0f) {
      scoreNorm = 1.0f / topDocs.getMaxScore();
    }

    int end = scoreDocs.length < length ? scoreDocs.length : length;
    for (int i = hitDocs.size(); i < end; i++) {
      hitDocs.addElement(new HitDoc(scoreDocs[i].score * scoreNorm,
                                    scoreDocs[i].doc));
    }
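
For anyone who wants to try the rule outside of Lucene, here is a minimal,
self-contained sketch of the same logic. The class name, the method name and
the sample scores are made up for illustration; only the scoreNorm
calculation follows the excerpt above.

```java
import java.util.Arrays;

public class HitsNormalizationSketch {

    // Mirrors the excerpt: scale only when the top raw score exceeds 1.0.
    // (Hypothetical helper, not part of Lucene.)
    static float[] normalize(float[] rawScores, float maxScore) {
        float scoreNorm = 1.0f;
        if (rawScores.length > 0 && maxScore > 1.0f) {
            scoreNorm = 1.0f / maxScore; // one division, reused for every hit
        }
        float[] normalized = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            normalized[i] = rawScores[i] * scoreNorm;
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Top score above 1.0: every score is scaled by 1/4.
        System.out.println(Arrays.toString(
                normalize(new float[] {4.0f, 2.0f, 1.0f}, 4.0f))); // [1.0, 0.5, 0.25]
        // Top score at or below 1.0: scores are left untouched.
        System.out.println(Arrays.toString(
                normalize(new float[] {0.5f, 0.25f}, 0.5f)));      // [0.5, 0.25]
    }
}
```

Note that in the second case the best hit keeps its raw score of 0.5, so the
top hit of a query is not necessarily reported as 1.0.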


