Posted to general@lucene.apache.org by BrianK <br...@bonton.com> on 2011/09/06 19:49:28 UTC

SOLR Sorting algorithm

We are running a SOLR query and are specifying a custom sort field so that our
results are sorted by that field rather than by the default score. For the
most part, the results are sorted by our field, but SOLR also appears to be
sorting them by some other field or algorithm, and it's not the score
field. Our documents are populated from a database table, and when we run a
similar query/sort against the database we don't get the same sort sequence
as SOLR, even though the sort is on the same field in both systems.
IMPORTANT NOTE: the sort field/results field is not unique. The search
results in question all have the same value (1 in this case), yet they
always come out in the same order.

Can someone explain, or point me in the right direction to determine, how SOLR
sorts results beyond the field specified in our query string?

Example Query: q=Kitchen Products&sort=sortSequence asc

Example Results:
name: Product 1
sortSequence: 1
score: 1.52221

name: Product 5
sortSequence: 1
score: 1.52221

name: Product 3
sortSequence: 1
score: 1.53112

name: Product 2
sortSequence: 2
score: 1.51112

etc.

Are there hidden fields, like document date, creation date, or other fields
that are not visible, that might be factored into a sort?

--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Sorting-algorithm-tp3314295p3314295.html
Sent from the Lucene - General mailing list archive at Nabble.com.

Re: SOLR Sorting algorithm

Posted by BrianK <br...@bonton.com>.
Thank you for the update; this helps us a lot in explaining why our
products don't appear exactly as our customers expected under our current
sort criteria.


Re: SOLR Sorting algorithm

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Sep 6, 2011 at 5:49 PM, Ted Dunning <te...@gmail.com> wrote:
> Mostly these would preserve ordering, I expect.
> (not a guarantee by any stretch)

In the past, ordering was always preserved.
But now the default merge policy (TieredMergePolicy) freely selects
the best segments to merge, even if they are non-adjacent (and that
reorders ids with respect to other ids, unlike purging deletes).  I
even had to rewrite some tests that started failing when this change
went into effect, due to past assumptions about docids.
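To make the reordering concrete, here is a simplified Python model (an illustration under stated assumptions, not Lucene's actual code). It assumes that a document's internal id is its segment's starting offset ("docBase") plus its position within that segment, that a merge concatenates documents in the order the merged segments are picked, and that the merged segment takes the slot of the first segment it replaces. The `merge` helper is hypothetical. Merging the first and last of three segments moves the last segment's document ahead of the middle one:

```python
def doc_ids(segments):
    """Assign each document an internal id: docBase + position in segment."""
    ids = {}
    doc_base = 0
    for segment in segments:
        for position, name in enumerate(segment):
            ids[name] = doc_base + position
        doc_base += len(segment)
    return ids


def merge(segments, picked):
    """Hypothetical merge of the segments at indexes `picked` (may be non-adjacent).

    Simplifying assumptions: documents are concatenated in `picked` order,
    and the merged segment takes the slot of the earliest picked segment.
    """
    merged = [doc for i in picked for doc in segments[i]]
    first = min(picked)
    result = []
    for i, segment in enumerate(segments):
        if i == first:
            result.append(merged)
        elif i not in picked:
            result.append(segment)
    return result


segments = [["a1", "a2"], ["b1"], ["c1"]]
before = doc_ids(segments)
after = doc_ids(merge(segments, [0, 2]))  # non-adjacent: first and last segments

print(sorted(before, key=before.get))  # ['a1', 'a2', 'b1', 'c1']
print(sorted(after, key=after.get))    # ['a1', 'a2', 'c1', 'b1']
```

Any sort that falls back to the internal id for ties would therefore return c1 before b1 after this merge, even though no document changed.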

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference


> On Tue, Sep 6, 2011 at 2:00 PM, Yonik Seeley <yo...@lucidimagination.com>
> wrote:
>>
>> It's also transient, in that it can change across commit calls (either by
>> deleted documents being squeezed out, or by non-adjacent segments being
>> merged).
>
>

Re: SOLR Sorting algorithm

Posted by Ted Dunning <te...@gmail.com>.
Mostly these would preserve ordering, I expect.

(not a guarantee by any stretch)

On Tue, Sep 6, 2011 at 2:00 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> It's also transient, in that it can change across commit calls (either by
> deleted documents being squeezed out, or by non-adjacent segments being
> merged).
>

Re: SOLR Sorting algorithm

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Sep 6, 2011 at 4:48 PM, BrianK <br...@bonton.com> wrote:
> By "internal document id", you are referring to a field that is not visible to
> us. We have an id field; I assume this is not the "document id" field you
> are talking about. Assuming the document id is not available to us, is it
> sorted ascending or descending, and is the document id simply a
> sequential number assigned as documents are loaded/indexed by Solr?

Correct - it's not a field, but just the internal index or "ord" into
the internal data structures.
It's also transient, in that it can change across commit calls (either by
deleted documents being squeezed out, or by non-adjacent segments being merged).
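The transience can be illustrated with a toy Python model (a sketch under simplifying assumptions, not Lucene's implementation). Ids are again taken as a segment's starting offset plus the document's position in it, and `purge_deletes` is a hypothetical helper standing in for a merge that squeezes deleted documents out. Every document after a purged one gets a new, smaller id, although relative order is preserved:

```python
def doc_ids(segments):
    """Assign each document an internal id: docBase + position in segment."""
    ids = {}
    doc_base = 0
    for segment in segments:
        for position, name in enumerate(segment):
            ids[name] = doc_base + position
        doc_base += len(segment)
    return ids


def purge_deletes(segments, deleted):
    """Hypothetical helper: drop deleted docs, as a merge squeezing them out would.

    Toy model only: in Lucene a deleted document keeps occupying its id
    until a merge rewrites the segment without it.
    """
    return [[doc for doc in segment if doc not in deleted] for segment in segments]


segments = [["a", "b", "c"], ["d", "e"]]
before = doc_ids(segments)
after = doc_ids(purge_deletes(segments, {"b"}))

print(before["d"], after["d"])       # 3 2
print(sorted(after, key=after.get))  # ['a', 'c', 'd', 'e']
```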

-Yonik

Re: SOLR Sorting algorithm

Posted by BrianK <br...@bonton.com>.
By "internal document id", you are referring to a field that is not visible to
us. We have an id field; I assume this is not the "document id" field you
are talking about. Assuming the document id is not available to us, is it
sorted ascending or descending, and is the document id simply a
sequential number assigned as documents are loaded/indexed by Solr?


Re: SOLR Sorting algorithm

Posted by Yonik Seeley <yo...@lucidimagination.com>.
When sorting, ties are broken by the internal document id.  This gives
us a stable (if somewhat arbitrary) sort ordering.
If you want score to be the tiebreaker, you can specify it as the
secondary sort.
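The tie-breaking rule can be sketched in Python using the example results from the original question. The `doc_id` values are hypothetical, picked so that the first sort reproduces the ordering BrianK saw; Solr does not return them. As far as I know, Lucene breaks field-sort ties by ascending internal id (lower id first). The second function adds score as a secondary sort, which corresponds to the Solr parameter `sort=sortSequence asc,score desc`:

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class Hit:
    name: str
    sort_sequence: int
    score: float
    doc_id: int  # hypothetical internal id; not a stored field Solr returns


# Example results from the question, in arbitrary input order.
hits = [
    Hit("Product 3", 1, 1.53112, 7),
    Hit("Product 2", 2, 1.51112, 2),
    Hit("Product 5", 1, 1.52221, 4),
    Hit("Product 1", 1, 1.52221, 0),
]


def sort_field_only(hits):
    # sort=sortSequence asc: equal sortSequence values fall back to doc id.
    return sorted(hits, key=lambda h: (h.sort_sequence, h.doc_id))


def sort_field_then_score(hits):
    # sort=sortSequence asc,score desc: score breaks ties, doc id comes last.
    return sorted(hits, key=lambda h: (h.sort_sequence, -h.score, h.doc_id))


print([h.name for h in sort_field_only(hits)])
# ['Product 1', 'Product 5', 'Product 3', 'Product 2']
print([h.name for h in sort_field_then_score(hits)])
# ['Product 3', 'Product 1', 'Product 5', 'Product 2']

# Request parameters with score as the secondary sort, URL-encoded.
params = urlencode({"q": "Kitchen Products", "sort": "sortSequence asc,score desc"})
print(params)
# q=Kitchen+Products&sort=sortSequence+asc%2Cscore+desc
```

Product 1 and Product 5 still tie on both sortSequence and score, so their relative order is again decided by the internal id. Adding a unique field as the last sort key (e.g. `sort=sortSequence asc,score desc,id asc`, assuming the id field is sortable) would make the ordering independent of internal ids.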

-Yonik
