Posted to solr-user@lucene.apache.org by Jamie Johnson <je...@gmail.com> on 2012/02/10 17:44:33 UTC

(Old) SolrCloud Date Sorting issue

Was there a recent fix for date sorting issues in SolrCloud?  On my
cluster I have a date field, and when I sort on it across the cluster
the results come back in the wrong order.  Executing the following
query, I get:

solr/select?distrib=true&q=paul&sort=datetime_dt%20desc&fl=datetime_dt

<result name="response" numFound="46619" start="0">
  <doc>
    <date name="datetime_dt">2009-10-31T16:48:10Z</date>
  </doc>
  <doc>
    <date name="datetime_dt">2009-10-30T20:52:23Z</date>
  </doc>
  <doc>
    <date name="datetime_dt">2009-10-27T03:28:35Z</date>
  </doc>
  <doc>
    <date name="datetime_dt">2009-10-30T00:47:11Z</date>
  </doc>
...

If distrib is set to false, i.e.:
solr/select?distrib=false&q=paul&sort=datetime_dt%20desc&fl=datetime_dt

<result name="response" numFound="7726" start="0">
  <doc>
    <date name="datetime_dt">2009-10-26T04:39:51Z</date>
  </doc>
  <doc>
    <date name="datetime_dt">2009-10-24T23:24:30Z</date>
  </doc>
  <doc>
    <date name="datetime_dt">2009-10-24T10:53:58Z</date>
  </doc>
  <doc>
    <date name="datetime_dt">2009-10-23T19:14:01Z</date>
  </doc>
  <doc>
    <date name="datetime_dt">2009-10-19T03:15:24Z</date>
  </doc>

Again, I have not noticed this on trunk, but I'm working with a much
smaller data set there, so it's tough to say for sure right now.

Re: (Old) SolrCloud Date Sorting issue

Posted by Jamie Johnson <je...@gmail.com>.
After doing some copying, I came up with the following:

    boolean fsv = req.getParams().getBool(ResponseBuilder.FIELD_SORT_VALUES, false);
    if (fsv) {
        NamedList sortVals = (NamedList) rsp.getValues().get("sort_values");
        Sort sort = searcher.weightSort(rb.getSortSpec().getSort());
        SortField[] sortFields = sort == null
                ? new SortField[]{SortField.FIELD_SCORE} : sort.getSort();
        for (SortField sortField : sortFields) {
            String fieldname = sortField.getField();
            ArrayList<Object> list = (ArrayList<Object>) sortVals.get(fieldname);
            for (int index = 0; index < removedDocs.length; index++)
                list.remove(removedDocs[index]);
        }
    }

This seems to have worked.  I need to do more testing, but I don't
understand why it worked.  What exactly is this doing?
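
For readers following along: the snippet above drops, from every
per-field sort value list, the slots that belong to the removed docs, so
that position i in the doc list and position i in each sort value list
refer to the same document again.  The thread does not say what
removedDocs contains.  The sketch below assumes it is an array of
distinct positions in the doc list; the class name SortValueSlots and
the method removeSlots are hypothetical and are not part of Solr.  It
removes positions highest-first, because removing in ascending order
shifts the later positions left and can delete the wrong slot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not Solr API: keeps one sort-value list aligned
// with a doc list after some docs have been removed by position.
class SortValueSlots {

    /**
     * Removes the given positions from one sort-value list.
     * Assumes the positions are distinct and within range.
     * Positions are processed highest-first so that an earlier removal
     * does not shift the slots that still have to be removed.
     */
    static void removeSlots(List<Object> sortValues, int[] removedPositions) {
        int[] positions = removedPositions.clone();
        Arrays.sort(positions);
        for (int i = positions.length - 1; i >= 0; i--) {
            // The argument is an int, so this calls List.remove(int index),
            // not List.remove(Object).
            sortValues.remove(positions[i]);
        }
    }

    public static void main(String[] args) {
        List<Object> values =
                new ArrayList<Object>(Arrays.asList("d0", "d1", "d2", "d3", "d4"));
        removeSlots(values, new int[] {1, 3});
        System.out.println(values);
    }
}
```

If removedDocs were an Integer[] rather than an int[], the call
list.remove(removedDocs[index]) in the original snippet would resolve to
List.remove(Object) and delete by value instead of by position, so the
element type is worth checking.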

On Fri, Feb 10, 2012 at 3:12 PM, Jamie Johnson <je...@gmail.com> wrote:
> I'd like to look at the pseudo fields you're talking about (don't
> really understand it right now), but need to get something working in
> the short term.  How do I go about removing these from the sort
> values?
>
> On Fri, Feb 10, 2012 at 3:06 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Fri, Feb 10, 2012 at 2:48 PM, Jamie Johnson <je...@gmail.com> wrote:
>>> So looking at query component it appears to sort the entire doc list
>>> at the end of process, my component is defined after this query so the
>>> doclist that I get should be sorted, right?  To me this should mean
>>> that I can remove items from this list and shift everything left as
>>> needed and it should work fine, but this isn't what appears to be
>>> happening.  For queries that are not distributed I don't see this
>>> issue, only for distributed queries.
>>
>> The document lists from the shards are merged by looking at the sort values.
>> Those are looked up by position in a different part of the response
>> (generated by fsv=true).
>> If you just mess with the doclists, those sort values will no longer
>> "line up" (doc #5 won't correspond to fsv slot #5).
>>
>> Short solution: if you remove a doc, remove that slot from all of the
>> sort values
>>
>> Better solution: We have pseudo-fields now... we should add sort
>> values directly to the documents so this type of parallel structure is
>> no longer needed.
>>
>> -Yonik
>> lucidimagination.com

Re: (Old) SolrCloud Date Sorting issue

Posted by Jamie Johnson <je...@gmail.com>.
I'd like to look at the pseudo-fields you're talking about (I don't
really understand them right now), but I need to get something working
in the short term.  How do I go about removing these from the sort
values?

On Fri, Feb 10, 2012 at 3:06 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Fri, Feb 10, 2012 at 2:48 PM, Jamie Johnson <je...@gmail.com> wrote:
>> So looking at query component it appears to sort the entire doc list
>> at the end of process, my component is defined after this query so the
>> doclist that I get should be sorted, right?  To me this should mean
>> that I can remove items from this list and shift everything left as
>> needed and it should work fine, but this isn't what appears to be
>> happening.  For queries that are not distributed I don't see this
>> issue, only for distributed queries.
>
> The document lists from the shards are merged by looking at the sort values.
> Those are looked up by position in a different part of the response
> (generated by fsv=true).
> If you just mess with the doclists, those sort values will no longer
> "line up" (doc #5 won't correspond to fsv slot #5).
>
> Short solution: if you remove a doc, remove that slot from all of the
> sort values
>
> Better solution: We have pseudo-fields now... we should add sort
> values directly to the documents so this type of parallel structure is
> no longer needed.
>
> -Yonik
> lucidimagination.com

Re: (Old) SolrCloud Date Sorting issue

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Feb 10, 2012 at 2:48 PM, Jamie Johnson <je...@gmail.com> wrote:
> So looking at query component it appears to sort the entire doc list
> at the end of process, my component is defined after this query so the
> doclist that I get should be sorted, right?  To me this should mean
> that I can remove items from this list and shift everything left as
> needed and it should work fine, but this isn't what appears to be
> happening.  For queries that are not distributed I don't see this
> issue, only for distributed queries.

The document lists from the shards are merged by looking at the sort values.
Those are looked up by position in a different part of the response
(generated by fsv=true).
If you just mess with the doclists, those sort values will no longer
"line up" (doc #5 won't correspond to fsv slot #5).

Short solution: if you remove a doc, remove the slot at that position
from all of the sort values.

Better solution: We have pseudo-fields now... we should add sort
values directly to the documents so this type of parallel structure is
no longer needed.

-Yonik
lucidimagination.com
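
The misalignment described above can be illustrated with a small
stand-alone sketch.  This is not Solr's merge code: the class
FsvMergeSketch, the ShardReply type, the doc ids, and the two-shard data
are all hypothetical, and the sort values are ISO-8601 UTC strings,
which compare chronologically as plain strings.  It only shows what
happens when each doc is paired with the sort value at the same
position and one shard's doc list has been edited without touching its
sort values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a position-aligned merge; not Solr API.
class FsvMergeSketch {

    /** One shard's reply: doc ids plus a parallel list of sort values. */
    static final class ShardReply {
        final List<String> docIds;
        final List<String> sortValues;

        ShardReply(List<String> docIds, List<String> sortValues) {
            this.docIds = docIds;
            this.sortValues = sortValues;
        }
    }

    /** Merges replies newest-first, pairing doc i with sort value i. */
    static List<String> mergeDesc(List<ShardReply> replies) {
        List<String[]> pairs = new ArrayList<>();
        for (ShardReply r : replies) {
            for (int i = 0; i < r.docIds.size(); i++) {
                pairs.add(new String[] {r.sortValues.get(i), r.docIds.get(i)});
            }
        }
        pairs.sort((x, y) -> y[0].compareTo(x[0])); // descending by sort value
        List<String> merged = new ArrayList<>();
        for (String[] p : pairs) {
            merged.add(p[1]);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> shardAValues = Arrays.asList(
                "2009-10-31T16:48:10Z", "2009-10-30T20:52:23Z", "2009-10-27T03:28:35Z");
        ShardReply shardB = new ShardReply(
                Arrays.asList("b1", "b2"),
                Arrays.asList("2009-10-30T00:47:11Z", "2009-10-26T04:39:51Z"));

        // Aligned: every doc still has its own sort value.
        ShardReply aligned = new ShardReply(Arrays.asList("a1", "a2", "a3"), shardAValues);
        System.out.println(mergeDesc(Arrays.asList(aligned, shardB)));

        // Misaligned: a2 was removed from the doc list only, so a3 is now
        // paired with a2's sort value and is merged ahead of b1.
        ShardReply misaligned = new ShardReply(Arrays.asList("a1", "a3"), shardAValues);
        System.out.println(mergeDesc(Arrays.asList(misaligned, shardB)));
    }
}
```

In the misaligned case a3 (really dated 2009-10-27) is placed before b1
(2009-10-30), which is the same kind of out-of-order result reported at
the start of this thread.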

Re: (Old) SolrCloud Date Sorting issue

Posted by Jamie Johnson <je...@gmail.com>.
Looking at the query component, it appears to sort the entire doc list
at the end of process.  My component is defined after the query
component, so the doc list that I get should already be sorted, right?
To me this means that I can remove items from this list and shift
everything left as needed and it should work fine, but that isn't what
appears to be happening.  I don't see this issue for non-distributed
queries, only for distributed ones.


On Fri, Feb 10, 2012 at 2:23 PM, Jamie Johnson <je...@gmail.com> wrote:
> It looks like everything works fine without my custom component, which
> is good for Solr, bad for me.  The custom component does some
> additional authorization processing to remove docs that the user does
> not have access to.  To do this we're iterating through
> responseBuilder.getResults().docList and removing any documents that
> the user should not be able to see.  Removing bad items works fine,
> but sorting isn't quite right after doing this.  At this point is the
> docList completely sorted, or is there an optimization inside solr
> which only sorts the top X documents?  I'm grabbing at straws here
> because for the life of me I can't figure out what is causing this.
>
>
> I'm doing all of the filtering inside of the process method in my
> custom SearchComponent.
>
> On Fri, Feb 10, 2012 at 12:41 PM, Jamie Johnson <je...@gmail.com> wrote:
>> This is an snapshot of the solrcloud branch from somewhere between a
>> year and 6 months ago (can't really remember off hand) with some
>> custom components, I'm thinking that the custom components may be
>> messing something up.  I'm removing them now to test this without
>> those to make sure that the issue is on my end, will report shortly.
>>
>> On Fri, Feb 10, 2012 at 12:16 PM, Yonik Seeley
>> <yo...@lucidimagination.com> wrote:
>>> On Fri, Feb 10, 2012 at 11:44 AM, Jamie Johnson <je...@gmail.com> wrote:
>>>> Was there a fix recently to address sorting issues for Dates in solr
>>>> cloud?  On my cluster I have a date field which when I sort across the
>>>> cluster I get incorrect order executing the following query I get
>>>
>>> Yikes!  There haven't been any fixes recently that I know of.
>>> What version of Solr is this?
>>>
>>> -Yonik
>>> lucidimagination.com

Re: (Old) SolrCloud Date Sorting issue

Posted by Jamie Johnson <je...@gmail.com>.
It looks like everything works fine without my custom component, which
is good for Solr, bad for me.  The custom component does some
additional authorization processing to remove docs that the user does
not have access to.  To do this we're iterating through
responseBuilder.getResults().docList and removing any documents that
the user should not be able to see.  Removing bad items works fine,
but sorting isn't quite right after doing this.  At this point is the
docList completely sorted, or is there an optimization inside Solr
which only sorts the top X documents?  I'm grasping at straws here
because for the life of me I can't figure out what is causing this.


I'm doing all of the filtering inside of the process method in my
custom SearchComponent.

On Fri, Feb 10, 2012 at 12:41 PM, Jamie Johnson <je...@gmail.com> wrote:
> This is an snapshot of the solrcloud branch from somewhere between a
> year and 6 months ago (can't really remember off hand) with some
> custom components, I'm thinking that the custom components may be
> messing something up.  I'm removing them now to test this without
> those to make sure that the issue is on my end, will report shortly.
>
> On Fri, Feb 10, 2012 at 12:16 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Fri, Feb 10, 2012 at 11:44 AM, Jamie Johnson <je...@gmail.com> wrote:
>>> Was there a fix recently to address sorting issues for Dates in solr
>>> cloud?  On my cluster I have a date field which when I sort across the
>>> cluster I get incorrect order executing the following query I get
>>
>> Yikes!  There haven't been any fixes recently that I know of.
>> What version of Solr is this?
>>
>> -Yonik
>> lucidimagination.com

Re: (Old) SolrCloud Date Sorting issue

Posted by Jamie Johnson <je...@gmail.com>.
This is a snapshot of the solrcloud branch from somewhere between 6
months and a year ago (can't really remember offhand) with some
custom components.  I suspect the custom components may be messing
something up.  I'm removing them now to test without them and confirm
whether the issue is on my end; will report back shortly.

On Fri, Feb 10, 2012 at 12:16 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Fri, Feb 10, 2012 at 11:44 AM, Jamie Johnson <je...@gmail.com> wrote:
>> Was there a fix recently to address sorting issues for Dates in solr
>> cloud?  On my cluster I have a date field which when I sort across the
>> cluster I get incorrect order executing the following query I get
>
> Yikes!  There haven't been any fixes recently that I know of.
> What version of Solr is this?
>
> -Yonik
> lucidimagination.com

Re: (Old) SolrCloud Date Sorting issue

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Feb 10, 2012 at 11:44 AM, Jamie Johnson <je...@gmail.com> wrote:
> Was there a fix recently to address sorting issues for Dates in solr
> cloud?  On my cluster I have a date field which when I sort across the
> cluster I get incorrect order executing the following query I get

Yikes!  There haven't been any fixes recently that I know of.
What version of Solr is this?

-Yonik
lucidimagination.com