Posted to dev@lucene.apache.org by Terry Steichen <te...@net-frame.com> on 2004/03/22 18:04:45 UTC

Sorting Messes up Scores

When you use the new sorting features, the relevance scores are no longer
normalized to a maximum of 1.0.  (In a recent test, most scores ranged up to
3.0 or so.)  As Tim suggests below, I'd like to know if fixing this is
important to others.  (It definitely is to me.)  If so, I'll submit it as a bug.

Regards,

Terry

----- Original Message -----
From: <tj...@apache.org>
To: "Terry Steichen" <te...@net-frame.com>
Sent: Monday, March 22, 2004 11:38 AM
Subject: Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/search FieldSortedHitQueue.java


> Terry,
>
> Yes - that's correct - it's quite possible the scores will have values
> greater than 1.0 when sorted.  It's something that was just kind of
> ignored, figuring that when the results are sorted by something other
> than score, having normalized scores probably isn't so important.
>
> If it's a concern, please feel free to raise it on the dev list.
>
> Tim
>
>
> > I've looked more closely at the Sorting code and have a concern but I'm
> > not smart enough to tell whether it's real or not.
> >
> > When the Hits class collects returned hits, it then normalizes the
> > score.  However, in doing this, it assumes that the returned hits (in the
> > form of a TopDocs class) are ordered by score.  So it takes first item
> > (index of 0 in the array) in the returned hits and uses this as the
> > normalization factor.
> >
> > When you introduce the sorting, what the Hits class gets back is not
> > TopDocs, but TopFieldDocs, which has already been sorted in some order
> > other than score.  Hence, the built-in assumption of Hits (that the first
> > document in the array is the highest score and appropriate to use for
> > normalization) no longer holds.  Consequently the normalization will be
> > anything but normalized.
> >
> > Again, I emphasize my technical limitations, but does this make sense to
> > you?
> >
> > Regards,
> >
> > Terry
> >
> > PS: BTW, it appears that, if I compile your code under 1.4, it runs just
> > fine under 1.3.1 (providing the regex lib references are removed, as per
> > your patch).
>
>
>
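
The concern in the quoted message can be illustrated with a small standalone
sketch.  This is not the Lucene source: the class and method names below
(ScoreNormalizationSketch, normalizeByFirst, normalizeByMax) and the sample
scores are hypothetical.  It assumes only what the message describes, namely
that normalization divides every score by the score of the hit at index 0, and
contrasts that with dividing by the true maximum score.

```java
import java.util.Arrays;

// Simplified, hypothetical model of the normalization issue discussed in
// this thread. It does not use or reproduce any Lucene classes.
public class ScoreNormalizationSketch {

    // Divides every score by the score of the first hit. This only yields
    // values in [0, 1] if the hits are ordered by descending score.
    public static float[] normalizeByFirst(float[] scores) {
        float[] out = Arrays.copyOf(scores, scores.length);
        if (out.length == 0 || out[0] <= 0.0f) {
            return out; // nothing to normalize against
        }
        float norm = out[0];
        for (int i = 0; i < out.length; i++) {
            out[i] = out[i] / norm;
        }
        return out;
    }

    // Divides every score by the largest score, independent of hit order.
    public static float[] normalizeByMax(float[] scores) {
        float[] out = Arrays.copyOf(scores, scores.length);
        float max = 0.0f;
        for (float s : out) {
            max = Math.max(max, s);
        }
        if (max <= 0.0f) {
            return out; // nothing to normalize against
        }
        for (int i = 0; i < out.length; i++) {
            out[i] = out[i] / max;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up raw scores for hits sorted by some field other than score,
        // so the first hit is not the best-scoring one.
        float[] sortedByField = {0.5f, 1.5f, 1.0f};

        // Some normalized scores exceed 1.0 here.
        System.out.println(Arrays.toString(normalizeByFirst(sortedByField)));

        // All normalized scores stay at or below 1.0 here.
        System.out.println(Arrays.toString(normalizeByMax(sortedByField)));
    }
}
```

Normalizing by the maximum needs a pass over the collected scores (or tracking
the maximum while hits are gathered), which is one possible direction for a
fix rather than a description of what was eventually committed.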


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Sorting Messes up Scores

Posted by Jamie M <ja...@yahoo.com>.
Yes, scores are important to me too, even when the
results aren't sorted by score.

jamie



