Posted to dev@lucene.apache.org by Yonik Seeley <yo...@lucidimagination.com> on 2009/05/26 15:48:10 UTC

valid scores for sorting

I'm attempting to switch Solr to use the new Collector framework to
get per-segment sorting and have been hitting some issues.
The latest is a function query log(val) which produces both NaN and
-Infinity values, which kill the TopScoreDocCollector (invalid docids
are produced).

results = {org.apache.lucene.search.ScoreDoc[7]@2039}
[0] = {org.apache.lucene.search.ScoreDoc@2042}"doc=0 score=2.0"
[1] = {org.apache.lucene.search.ScoreDoc@2043}"doc=4 score=1.39794"
[2] = {org.apache.lucene.search.ScoreDoc@2044}"doc=3 score=1.0"
[3] = {org.apache.lucene.search.ScoreDoc@2045}"doc=5 score=0.69897"
[4] = {org.apache.lucene.search.ScoreDoc@2046}"doc=1 score=-2000000.0"
[5] = {org.apache.lucene.search.ScoreDoc@2047}"doc=2147483647 score=-Infinity"
[6] = {org.apache.lucene.search.ScoreDoc@2048}"doc=2147483647 score=-Infinity"

So either we need to clarify the valid values for score() or we need
to change how the queue does comparisons so that this works again.

-Yonik
http://www.lucidimagination.com
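
For readers following along, here is a minimal sketch of how a log(val) function can yield the NaN and -Infinity scores described in the report above. The class name LogScoreDemo and the helper logScore are hypothetical, and base 10 is an assumption inferred from the scores in the dump; this is not Solr's function query code.

```java
// Illustrative only: shows how a log function produces the special float
// values from the report and how they behave under ordinary comparisons.
final class LogScoreDemo {

    // Hypothetical stand-in for a log(val) function query. Base 10 is an
    // assumption (log10(100) = 2.0 matches the top score in the dump).
    static float logScore(float val) {
        return (float) Math.log10(val);
    }

    public static void main(String[] args) {
        float[] vals = {100f, 25f, 10f, 5f, 0f, -1f};
        for (float v : vals) {
            System.out.println("val=" + v + " score=" + logScore(v));
        }

        float negInf = logScore(0f);  // log of zero is -Infinity
        float nan = logScore(-1f);    // log of a negative number is NaN

        // Every ordered comparison involving NaN evaluates to false, so a
        // NaN score is neither "less than" nor "greater than" any other score.
        System.out.println("nan < 1.0f  : " + (nan < 1.0f));
        System.out.println("nan >= 1.0f : " + (nan >= 1.0f));

        // -Infinity is equal to itself, so it can never be strictly greater
        // than another -Infinity value.
        System.out.println("negInf > negInf : " + (negInf > negInf));
    }
}
```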

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: valid scores for sorting

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Tue, May 26, 2009 at 2:41 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Tue, May 26, 2009 at 2:22 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> Hmm -- IndexSearcher tries to detect when SortComparatorSource is
>> used, and drive the search with the toplevel reader, so that code is
>> not supposed to be reached.  Do you remember what tickled it?
>
> Solr's search code is now using the IndexSearcher.search(query,
> luceneFilter, collector) method.
> Since it doesn't pass a Sort (it's part of the collector instead), I
> guess this logic is bypassed.
> From a back compat point of view, this is fine of course since
> "Collector" didn't previously exist.

Ahhh OK, phew :)

Mike



Re: valid scores for sorting

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, May 26, 2009 at 2:22 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> Hmm -- IndexSearcher tries to detect when SortComparatorSource is
> used, and drive the search with the toplevel reader, so that code is
> not supposed to be reached.  Do you remember what tickled it?

Solr's search code is now using the IndexSearcher.search(query,
luceneFilter, collector) method.
Since it doesn't pass a Sort (it's part of the collector instead), I
guess this logic is bypassed.
From a back compat point of view, this is fine of course since
"Collector" didn't previously exist.

-Yonik
http://www.lucidimagination.com



Re: valid scores for sorting

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Tue, May 26, 2009 at 1:58 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:

>>>> What other issues are you hitting?
>>>
>>> I hit an NPE when using old-style sort comparators.
>>
>> Do you have the traceback (or remember the gist)?
>
> I remember it...
>    case SortField.CUSTOM:
>      assert factory == null && comparatorSource != null;
>      return comparatorSource.newComparator(field, numHits, sortPos, reversed);
>
> comparatorSource was null.

Hmm -- IndexSearcher tries to detect when SortComparatorSource is
used, and drive the search with the toplevel reader, so that code is
not supposed to be reached.  Do you remember what tickled it?

>>> The main thing is that I didn't anticipate having to rewrite all the
>>> custom sort comparators Solr has.
>>
>> Because they were loading the FieldCache entry for the entire
>> MultiReader (vs normal field sorting which'd load per segment)?
>
> I think that could be it... I didn't trouble myself too long analyzing
> it (or the NPE above) - I just figured we should bite the bullet and
> start using non-deprecated classes.

OK

>> How many custom sort comparators does Solr have?
>
> I guess it's only 3 or 4... plus some code to use them to correctly
> merge results in distributed search.

OK

>> But are you now migrating away from Solr's private PQ?  (Else you
>> wouldn't have hit problems w/ Lucene's new TSDC).
>
> Yep.

Super!

> BTW, Kudos to everyone who worked on the new Collector classes... esp
> TopFieldCollector.create() - nice powerful stuff, easy to chain with
> other collectors, etc.

Shai did all the hard work ;)  (And, still is...)

Mike



Re: valid scores for sorting

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, May 26, 2009 at 1:44 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> On Tue, May 26, 2009 at 12:29 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Tue, May 26, 2009 at 12:16 PM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>> What other issues are you hitting?
>>
>> I hit an NPE when using old-style sort comparators.
>
> Do you have the traceback (or remember the gist)?

I remember it...
    case SortField.CUSTOM:
      assert factory == null && comparatorSource != null;
      return comparatorSource.newComparator(field, numHits, sortPos, reversed);

comparatorSource was null.


>> The main thing is that I didn't anticipate having to rewrite all the
>> custom sort comparators Solr has.
>
> Because they were loading the FieldCache entry for the entire
> MultiReader (vs normal field sorting which'd load per segment)?

I think that could be it... I didn't trouble myself too long analyzing
it (or the NPE above) - I just figured we should bite the bullet and
start using non-deprecated classes.

> How many custom sort comparators does Solr have?

I guess it's only 3 or 4... plus some code to use them to correctly
merge results in distributed search.

[...]
> But are you now migrating away from Solr's private PQ?  (Else you
> wouldn't have hit problems w/ Lucene's new TSDC).

Yep.
BTW, Kudos to everyone who worked on the new Collector classes... esp
TopFieldCollector.create() - nice powerful stuff, easy to chain with
other collectors, etc.



Re: valid scores for sorting

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Tue, May 26, 2009 at 12:29 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Tue, May 26, 2009 at 12:16 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> What other issues are you hitting?
>
> I hit an NPE when using old-style sort comparators.

Do you have the traceback (or remember the gist)?

> The main thing is that I didn't anticipate having to rewrite all the
> custom sort comparators Solr has.

Because they were loading the FieldCache entry for the entire
MultiReader (vs normal field sorting which'd load per segment)?

How many custom sort comparators does Solr have?

> There are multiple test cases still failing... not sure at this point
> if they are all related to custom sort comparators or not.
>
>> Hmmm... in the past (TopDocCollector in 2.4.x), hits with such scores
>> (NaN, -Inf, in fact any non-positive score) were skipped entirely.
>
> OK... I never actually checked if they were included or not  - Solr
> used a priority queue directly due to historical Lucene limitations.

But are you now migrating away from Solr's private PQ?  (Else you
wouldn't have hit problems w/ Lucene's new TSDC).

Mike



Re: valid scores for sorting

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK I'll commit these changes...

Mike

On Tue, May 26, 2009 at 1:34 PM, Shai Erera <se...@gmail.com> wrote:
> I also think we should state that for TSDC, those scores are invalid.
>
> In fact, we also changed TopDocs to return maxScore=NaN if max-score is not
> tracked, so I think in a sense we've already said that NaN is not a valid
> value.
>
> But anyway, given that I don't believe those scores are common, and the
> enhancements we could do to TSDC (pre-populating the queue with those
> sentinels) I think we should state it for TSDC, and if someone does need to
> use those scores, he can write his own version of TSDC, which does not
> pre-populate anything, or uses different sentinels.
>
> I don't think this warrants an issue though. Just a small addition to TSDC's
> javadoc and to CHANGES.
>
> Shai
>
> On Tue, May 26, 2009 at 7:30 PM, Yonik Seeley <yo...@lucidimagination.com>
> wrote:
>>
>> FYI, the upgrading work is going on in
>> https://issues.apache.org/jira/browse/SOLR-1111
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>>
>> On Tue, May 26, 2009 at 12:29 PM, Yonik Seeley
>> <yo...@lucidimagination.com> wrote:
>> > On Tue, May 26, 2009 at 12:16 PM, Michael McCandless
>> > <lu...@mikemccandless.com> wrote:
>> >> What other issues are you hitting?
>> >
>> > I hit an NPE when using old-style sort comparators.
>> > The main thing is that I didn't anticipate having to rewrite all the
>> > custom sort comparators Solr has.
>> > There are multiple test cases still failing... not sure at this point
>> > if they are all related to custom sort comparators or not.
>> >
>> >> Hmmm... in the past (TopDocCollector in 2.4.x), hits with such scores
>> >> (NaN, -Inf, in fact any non-positive score) were skipped entirely.
>> >
>> > OK... I never actually checked if they were included or not  - Solr
>> > used a priority queue directly due to historical Lucene limitations.
>> >
>> > -Yonik
>> > http://www.lucidimagination.com
>> >



Re: valid scores for sorting

Posted by Shai Erera <se...@gmail.com>.
I also think we should state that for TSDC, those scores are invalid.

In fact, we also changed TopDocs to return maxScore=NaN if max-score is not
tracked, so I think in a sense we've already said that NaN is not a valid
value.

But anyway, given that I don't believe those scores are common, and the
enhancements we could do to TSDC (pre-populating the queue with those
sentinels) I think we should state it for TSDC, and if someone does need to
use those scores, he can write his own version of TSDC, which does not
pre-populate anything, or uses different sentinels.

I don't think this warrants an issue though. Just a small addition to TSDC's
javadoc and to CHANGES.

Shai

On Tue, May 26, 2009 at 7:30 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> FYI, the upgrading work is going on in
> https://issues.apache.org/jira/browse/SOLR-1111
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
> On Tue, May 26, 2009 at 12:29 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
> > On Tue, May 26, 2009 at 12:16 PM, Michael McCandless
> > <lu...@mikemccandless.com> wrote:
> >> What other issues are you hitting?
> >
> > I hit an NPE when using old-style sort comparators.
> > The main thing is that I didn't anticipate having to rewrite all the
> > custom sort comparators Solr has.
> > There are multiple test cases still failing... not sure at this point
> > if they are all related to custom sort comparators or not.
> >
> >> Hmmm... in the past (TopDocCollector in 2.4.x), hits with such scores
> >> (NaN, -Inf, in fact any non-positive score) were skipped entirely.
> >
> > OK... I never actually checked if they were included or not  - Solr
> > used a priority queue directly due to historical Lucene limitations.
> >
> > -Yonik
> > http://www.lucidimagination.com
> >

Re: valid scores for sorting

Posted by Yonik Seeley <yo...@lucidimagination.com>.
FYI, the upgrading work is going on in
https://issues.apache.org/jira/browse/SOLR-1111

-Yonik
http://www.lucidimagination.com



On Tue, May 26, 2009 at 12:29 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Tue, May 26, 2009 at 12:16 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> What other issues are you hitting?
>
> I hit an NPE when using old-style sort comparators.
> The main thing is that I didn't anticipate having to rewrite all the
> custom sort comparators Solr has.
> There are multiple test cases still failing... not sure at this point
> if they are all related to custom sort comparators or not.
>
>> Hmmm... in the past (TopDocCollector in 2.4.x), hits with such scores
>> (NaN, -Inf, in fact any non-positive score) were skipped entirely.
>
> OK... I never actually checked if they were included or not  - Solr
> used a priority queue directly due to historical Lucene limitations.
>
> -Yonik
> http://www.lucidimagination.com
>



Re: valid scores for sorting

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, May 26, 2009 at 12:16 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> What other issues are you hitting?

I hit an NPE when using old-style sort comparators.
The main thing is that I didn't anticipate having to rewrite all the
custom sort comparators Solr has.
There are multiple test cases still failing... not sure at this point
if they are all related to custom sort comparators or not.

> Hmmm... in the past (TopDocCollector in 2.4.x), hits with such scores
> (NaN, -Inf, in fact any non-positive score) were skipped entirely.

OK... I never actually checked if they were included or not  - Solr
used a priority queue directly due to historical Lucene limitations.

-Yonik
http://www.lucidimagination.com



Re: valid scores for sorting

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Tue, May 26, 2009 at 9:48 AM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> I'm attempting to switch Solr to use the new Collector framework to
> get per-segment sorting and have been hitting some issues.

What other issues are you hitting?

> The latest is a function query log(val) which produces both NaN and
> -Infinity values, which kill the TopScoreDocCollector (invalid docids
> are produced).
>
>> Is NEG_INF a valid score for you?
>
> It was.... people are free to create whatever functions they want.  We
> never explicitly spelled out what happens when functions return -Inf
> or NaN, but everything still worked.  Now we actually lose documents
> and they are replaced with invalid docids.

Hmmm... in the past (TopDocCollector in 2.4.x), hits with such scores
(NaN, -Inf, in fact any non-positive score) were skipped entirely.
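
As a rough illustration of the 2.4.x behavior described here, assuming the old collector simply required a strictly positive score before accepting a hit (the class and method names are hypothetical, not Lucene's):

```java
// Simplified model of a "skip non-positive scores" check. This is a sketch
// based on the description in this thread, not the actual Lucene source.
final class PositiveScoreFilter {

    // Returns true only for scores strictly greater than zero. NaN fails
    // the test because every ordered comparison with NaN is false.
    static boolean accepted(float score) {
        return score > 0.0f;
    }

    public static void main(String[] args) {
        float[] scores = {2.0f, 0.0f, -2000000.0f, Float.NEGATIVE_INFINITY, Float.NaN};
        for (float s : scores) {
            System.out.println("score=" + s + " accepted=" + accepted(s));
        }
    }
}
```

Under such a check, NaN and -Infinity hits would be dropped silently rather than reaching the queue, which is consistent with "everything still worked" from the caller's point of view.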

> To work around it, I've temporarily checked the score for NaN or
> -Infinity and replaced it with -MaxVal.  I guess that will become
> permanent if we decide that those are not valid scores for scorers to
> produce.

I think we should state that NaN/Inf/-Inf are not valid scores?

Mike



Re: valid scores for sorting

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, May 26, 2009 at 9:52 AM, Shai Erera <se...@gmail.com> wrote:
> We've decided in LUCENE-1575 to pre-populate HitQueue with sentinel values with
> score = Float.NEGATIVE_INFINITY, as we assumed these scores will not be produced. TSDC
> instantiates HitQueue with pre-filling turned on.
>
> Is NEG_INF a valid score for you?

It was.... people are free to create whatever functions they want.  We
never explicitly spelled out what happens when functions return -Inf
or NaN, but everything still worked.  Now we actually lose documents
and they are replaced with invalid docids.

To work around it, I've temporarily checked the score for NaN or
-Infinity and replaced it with -MaxVal.  I guess that will become
permanent if we decide that those are not valid scores for scorers to
produce.

-Yonik
http://www.lucidimagination.com
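
The workaround described above might look roughly like the following sketch. ScoreClamp and sanitize are hypothetical names, and "-MaxVal" is read here as -Float.MAX_VALUE, which is an assumption.

```java
// Sketch of clamping scores that the collector cannot handle.
final class ScoreClamp {

    // Replaces NaN and -Infinity with the most negative finite float so the
    // hit still sorts last but compares normally against other scores.
    // Mapping to -Float.MAX_VALUE is an assumed reading of "-MaxVal".
    static float sanitize(float score) {
        if (Float.isNaN(score) || score == Float.NEGATIVE_INFINITY) {
            return -Float.MAX_VALUE;
        }
        return score;
    }

    public static void main(String[] args) {
        float[] raw = {2.0f, Float.NaN, Float.NEGATIVE_INFINITY, -2000000.0f};
        for (float s : raw) {
            System.out.println(s + " -> " + sanitize(s));
        }
    }
}
```

Note that Float.isNaN is needed for the NaN case, since `score == Float.NaN` is always false.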



Re: valid scores for sorting

Posted by Shai Erera <se...@gmail.com>.
We've decided in LUCENE-1575 to pre-populate HitQueue with sentinel values with
score = Float.NEGATIVE_INFINITY, as we assumed these scores will not be produced. TSDC
instantiates HitQueue with pre-filling turned on.

Is NEG_INF a valid score for you?
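
A simplified model of the pre-filled queue, as a sketch only: it uses java.util.PriorityQueue rather than Lucene's HitQueue, and the names SentinelQueueSketch, Hit and topN are hypothetical. It assumes a collector that skips any hit whose score is not strictly greater than the current queue bottom, which is one way sentinels (doc=Integer.MAX_VALUE, score=-Infinity) could survive into the results. NaN scores are deliberately left out of this model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of a top-N collector whose queue is pre-filled with sentinels.
final class SentinelQueueSketch {

    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Collects the top n hits (n must be at least 1); the array index is
    // used as the doc id.
    static List<Hit> topN(float[] scores, int n) {
        // Min-heap on score: the least competitive entry sits at the head.
        PriorityQueue<Hit> pq =
            new PriorityQueue<>(n, (a, b) -> Float.compare(a.score, b.score));

        // Pre-fill with sentinels so the queue is always "full".
        for (int i = 0; i < n; i++) {
            pq.add(new Hit(Integer.MAX_VALUE, Float.NEGATIVE_INFINITY));
        }

        for (int doc = 0; doc < scores.length; doc++) {
            float score = scores[doc];
            // Assumed competitiveness check: a -Infinity score is never
            // strictly greater than a sentinel, so the hit is dropped.
            if (score <= pq.peek().score) {
                continue;
            }
            pq.poll();
            pq.add(new Hit(doc, score));
        }

        // Return hits ordered from best to worst score.
        List<Hit> out = new ArrayList<>(pq);
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }

    public static void main(String[] args) {
        float[] scores = {2.0f, Float.NEGATIVE_INFINITY, 1.0f};
        for (Hit h : topN(scores, 3)) {
            System.out.println("doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

In this model the document with a -Infinity score is lost and a leftover sentinel takes its slot, so a caller would need to either avoid such scores or trim sentinels from the result.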

On Tue, May 26, 2009 at 4:48 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> I'm attempting to switch Solr to use the new Collector framework to
> get per-segment sorting and have been hitting some issues.
> The latest is a function query log(val) which produces both NaN and
> -Infinity values, which kill the TopScoreDocCollector (invalid docids
> are produced).
>
> results = {org.apache.lucene.search.ScoreDoc[7]@2039}
> [0] = {org.apache.lucene.search.ScoreDoc@2042}"doc=0 score=2.0"
> [1] = {org.apache.lucene.search.ScoreDoc@2043}"doc=4 score=1.39794"
> [2] = {org.apache.lucene.search.ScoreDoc@2044}"doc=3 score=1.0"
> [3] = {org.apache.lucene.search.ScoreDoc@2045}"doc=5 score=0.69897"
> [4] = {org.apache.lucene.search.ScoreDoc@2046}"doc=1 score=-2000000.0"
> [5] = {org.apache.lucene.search.ScoreDoc@2047}"doc=2147483647 score=-Infinity"
> [6] = {org.apache.lucene.search.ScoreDoc@2048}"doc=2147483647 score=-Infinity"
>
> So either we need to clarify the valid values for score() or we need
> to change how the queue does comparisons so that this works again.
>
> -Yonik
> http://www.lucidimagination.com