Posted to solr-user@lucene.apache.org by Brad Dewar <bd...@stfx.ca> on 2010/08/17 22:10:07 UTC

sort order of "missing" items

When items are sorted, are all the docs with the sort field missing considered "tied" in terms of their sort order, or are they "indeterminate", or do they have some arbitrary order imposed on them (e.g. _docid_)?

For example, would "b" be considered as part of the sort in the following query, or would all the missing 'a' fields be in some kind of order already, thus making the sort algorithm never check the 'b' field?

/select/?q=-a:[* TO *]&sort=a asc,b asc

And would sortMissingLast / sortMissingFirst affect the answer to that question?

I've been seeing weird behaviour in my index with queries (a little) like this one, but I haven't pinpointed the problem yet.

Brad



RE: sort order of "missing" items

Posted by Brad Dewar <bd...@stfx.ca>.
Just to close this thread:

Missing values are sorted as though equal to each other, as you would expect, and ties are broken only after all explicit sort criteria are evaluated.
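As a rough illustration of that conclusion (a sketch in plain Python, not Solr code; the documents and the helper name sort_key are made up for the example), the docs missing "a" all get the same key for "a", so the comparison falls through to "b", and only then to docid:

```python
# Hypothetical documents: a docid, an optional "a" value, and a "b" value.
docs = [
    {"docid": 0, "a": None, "b": "pear"},
    {"docid": 1, "a": "x", "b": "zebra"},
    {"docid": 2, "a": None, "b": "apple"},
    {"docid": 3, "a": None, "b": "apple"},
]


def sort_key(doc):
    """Key for 'sort=a asc,b asc' with docid as the final tie-breaker."""
    a = doc["a"]
    # Every missing "a" maps to the same key, which compares below any
    # present string (assumed default behaviour for an ascending string sort).
    a_key = (0, "") if a is None else (1, a)
    return (a_key, doc["b"], doc["docid"])


ordered = [d["docid"] for d in sorted(docs, key=sort_key)]
print(ordered)
```

Docs 2 and 3 tie on both "a" and "b", so docid decides between them; doc 0 follows because its "b" is larger, and doc 1 comes last because it actually has a value for "a".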

In my specific case, the problem was that the application was querying field "a" but sorting by a copyField of "a", whose contents were not necessarily equivalent. So when "a" was missing, I expected a sort by "b", but instead got a sort by "a-prime", then "b".

D'oh!

Brad




-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
Sent: August-18-10 4:47 PM
To: solr-user@lucene.apache.org
Subject: Re: sort order of "missing" items

On Tue, Aug 17, 2010 at 4:10 PM, Brad Dewar <bd...@stfx.ca> wrote:
> When items are sorted, are all the docs with the sort field missing considered "tied" in terms of their sort order, or are they "indeterminate", or do they have some arbitrary order imposed on them (e.g. _docid_)?

If it's a numeric field, it sorts as if the value were 0.
If it's a string field, a missing value is less than all other values.
All docs with a missing value are tied, and all ties (whether the
value is missing or not) are broken by docid.
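The defaults described above can be modelled with a small Python sketch. This is only an illustration of the described ordering, not how Lucene implements it; default_key, sort_docs, and the sample documents are hypothetical names and data.

```python
def default_key(value, kind):
    """Comparison key for one field value under the described defaults."""
    if kind == "numeric":
        # A missing numeric value sorts as if it were 0.
        return 0 if value is None else value
    # A missing string value sorts below every present string.
    return (0, "") if value is None else (1, value)


def sort_docs(docs, field, kind, descending=False):
    """docs is a list of (docid, fields) pairs; returns docids in sort order."""
    # Order by docid first; the stable sort below then keeps docid order
    # among tied keys, in both ascending and descending sorts.
    by_docid = sorted(docs, key=lambda d: d[0])
    ranked = sorted(
        by_docid,
        key=lambda d: default_key(d[1].get(field), kind),
        reverse=descending,
    )
    return [docid for docid, _ in ranked]


prices = [(0, {"price": 5}), (1, {}), (2, {"price": -3}), (3, {"price": 0}), (4, {})]
names = [(0, {"name": "b"}), (1, {}), (2, {"name": "a"}), (3, {})]

print(sort_docs(prices, "price", "numeric"))
print(sort_docs(names, "name", "string"))
print(sort_docs(names, "name", "string", descending=True))
```

In the numeric case, docs 1 and 4 (missing) are indistinguishable from doc 3 (an explicit 0), which is one reason a missing numeric field can produce surprising orderings.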

The "string" field type in the Solr example schema has
sortMissingLast="true" set, so documents missing the value sort after
documents with the value, regardless of sort order.  Here's the blurb
from the example schema:

    <!-- The optional sortMissingLast and sortMissingFirst attributes are
         currently supported on types that are sorted internally as strings.
         This includes "string","boolean","sint","slong","sfloat","sdouble","pdate"
       - If sortMissingLast="true", then a sort on this field will cause documents
         without the field to come after documents with the field,
         regardless of the requested sort order (asc or desc).
       - If sortMissingFirst="true", then a sort on this field will cause documents
         without the field to come before documents with the field,
         regardless of the requested sort order.
       - If sortMissingLast="false" and sortMissingFirst="false" (the default),
         then default lucene sorting will be used which places docs without the
         field first in an ascending sort and last in a descending sort.
    -->
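
The three cases in that schema comment can be sketched in Python as follows. This models only the documented ordering, under the assumption that present values compare as plain strings; sort_with_missing and its parameters are hypothetical names, not part of any Solr API.

```python
def sort_with_missing(docs, field, descending=False,
                      missing_last=False, missing_first=False):
    """docs is a list of (docid, fields) pairs; returns docids in sort order."""
    by_docid = sorted(docs, key=lambda d: d[0])
    present = [d for d in by_docid if d[1].get(field) is not None]
    missing = [d for d in by_docid if d[1].get(field) is None]
    # Only docs that have the field are affected by the requested direction.
    present.sort(key=lambda d: d[1][field], reverse=descending)

    if missing_last:
        ordered = present + missing   # after, regardless of asc or desc
    elif missing_first:
        ordered = missing + present   # before, regardless of asc or desc
    elif descending:
        ordered = present + missing   # default: last in a descending sort
    else:
        ordered = missing + present   # default: first in an ascending sort
    return [docid for docid, _ in ordered]


names = [(0, {"name": "b"}), (1, {}), (2, {"name": "a"}), (3, {})]

print(sort_with_missing(names, "name", missing_last=True))
print(sort_with_missing(names, "name", descending=True, missing_first=True))
```

The docs without the field keep their docid order among themselves in every case, since they are all tied on this field.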

> For example, would "b" be considered as part of the sort in the following query, or would all the missing 'a' fields be in some kind of order already, thus making the sort algorithm never check the 'b' field?
>
> /select/?q=-a:[* TO *]&sort=a asc,b asc
>
> And would sortMissingLast / sortMissingFirst affect the answer to that question?
>
> I've been seeing weird behaviour in my index with queries (a little) like this one, but I haven't pinpointed the problem yet.

Are you using Solr 1.4?  There was a bug with sortMissingLast/sortMissingFirst.
https://issues.apache.org/jira/browse/SOLR-1777

-Yonik
http://www.lucidimagination.com
