Posted to dev@lucene.apache.org by "J.B. Langston (JIRA)" <ji...@apache.org> on 2014/03/19 15:46:57 UTC

[jira] [Comment Edited] (SOLR-5878) Solr returns duplicates when using distributed search with group.format=simple

    [ https://issues.apache.org/jira/browse/SOLR-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940522#comment-13940522 ] 

J.B. Langston edited comment on SOLR-5878 at 3/19/14 2:46 PM:
--------------------------------------------------------------

Sorry for not following protocol. Do you want me to move to the list now or continue here since it's already open?

I may have misstated the problem here. The duplicates aren't the problem; rather, the problem is that Solr ignores the rows parameter when sharding and group.format=simple are used at the same time.  You'll notice that there is a rows=5 param in the URL, but the output contains 16 documents.  This prevents the use of the rows and start params to page through the data.
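
To make the mismatch easy to check, here is a minimal sketch that counts the docs in a grouped response and compares the count with the requested rows. It assumes the response has the same XML shape as the output quoted in the issue description; the function name and the shortened sample response are only illustrative, not part of Solr.

```python
import xml.etree.ElementTree as ET


def count_grouped_docs(response_xml):
    """Count <doc> elements in the doclist of a group.format=simple response."""
    root = ET.fromstring(response_xml)
    # Path mirrors the quoted output: grouped -> <group field> -> doclist -> doc
    docs = root.findall("./lst[@name='grouped']/lst/result[@name='doclist']/doc")
    return len(docs)


# Shortened, hand-written sample in the same shape as the quoted output (assumption).
SAMPLE = """<response>
<lst name="grouped">
  <lst name="cont_stub">
    <int name="matches">56</int>
    <result name="doclist" numFound="56" start="0" maxScore="1.0">
      <doc><str name="cont_stub">e60eb0f9-bce7-4da9-819c-b356dfc1c4f7</str></doc>
      <doc><str name="cont_stub">e60eb0f9-bce7-4da9-819c-b356dfc1c4f7</str></doc>
      <doc><str name="cont_stub">faf0a7ea-4252-4eda-990a-4fcc6b5e63e3</str></doc>
    </result>
  </lst>
</lst>
</response>"""

requested_rows = 2  # stands in for rows=5 in the real query
returned = count_grouped_docs(SAMPLE)
if returned > requested_rows:
    print(f"rows={requested_rows} requested, but {returned} docs returned")
```

Run against the full response from the issue, the same count would be 16 for rows=5.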

You're right about the cont_stub field not being the unique key. id is the unique key, and indeed there are multiple documents with the same value for cont_stub and different values for the unique key.  I was filing this on behalf of a customer, and as I was reproducing it, I noticed the duplicates and got distracted by those. Sorry for the confusion; I can update the description to reflect the true problem if you like, or I can ask on the mailing list before continuing here.


was (Author: jblangston@datastax.com):
Sorry for not following protocol. Do you want me to move to the list now or continue here since it's already open?

I may have misstated the problem here. The duplicates aren't the problem; rather that it ignores the rows parameter when using sharding and group.format=simple at the same time.  You'll notice that there is a rows=5 param in the url, but the output and there are 16 documents returned.  This prevents the use of rows and start params to page through the data.

You're right about the cont_stub field not being the unique key. id is the unique key and indeed there are multiple documents with the same value for cont_stub and different values for the unique key.  I was filing this on behalf of a customer and as I was reproducing it, I noticed the duplicates and got distracted by those. Sorry for the confusion; I can update the description to reflect the true problem if you like, or I ask on the mailing list before continuing here.

> Solr returns duplicates when using distributed search with group.format=simple
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-5878
>                 URL: https://issues.apache.org/jira/browse/SOLR-5878
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.6
>            Reporter: J.B. Langston
>
> Solr returns duplicate documents when group.format=simple is supplied on a distributed search. This does not happen on the standard group format or when not using distributed search. 
> For example:
> {code}
> http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=*%3A*&fq=evt_stub%3A(452deed8-c3a2-49a8-878d-8356da315e6a)&start=0&rows=5&fl=cont_stub&wt=xml&indent=true&group=true&group.field=cont_stub&group.format=simple&group.limit=1000
> {code}
> Returns:
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>   <int name="status">0</int>
>   <int name="QTime">253</int>
> </lst>
> <lst name="grouped">
>   <lst name="cont_stub">
>     <int name="matches">56</int>
>     <result name="doclist" numFound="56" start="0" maxScore="1.0">
>       <doc>
>         <str name="cont_stub">e60eb0f9-bce7-4da9-819c-b356dfc1c4f7</str></doc>
>       <doc>
>         <str name="cont_stub">e60eb0f9-bce7-4da9-819c-b356dfc1c4f7</str></doc>
>       <doc>
>         <str name="cont_stub">e60eb0f9-bce7-4da9-819c-b356dfc1c4f7</str></doc>
>       <doc>
>         <str name="cont_stub">faf0a7ea-4252-4eda-990a-4fcc6b5e63e3</str></doc>
>       <doc>
>         <str name="cont_stub">faf0a7ea-4252-4eda-990a-4fcc6b5e63e3</str></doc>
>       <doc>
>         <str name="cont_stub">faf0a7ea-4252-4eda-990a-4fcc6b5e63e3</str></doc>
>       <doc>
>         <str name="cont_stub">dd94ec0b-f171-441d-8fb8-af6a22ebf168</str></doc>
>       <doc>
>         <str name="cont_stub">dd94ec0b-f171-441d-8fb8-af6a22ebf168</str></doc>
>       <doc>
>         <str name="cont_stub">dd94ec0b-f171-441d-8fb8-af6a22ebf168</str></doc>
>       <doc>
>         <str name="cont_stub">feede138-2fe4-4742-ac63-e7cecfd86c81</str></doc>
>       <doc>
>         <str name="cont_stub">feede138-2fe4-4742-ac63-e7cecfd86c81</str></doc>
>       <doc>
>         <str name="cont_stub">feede138-2fe4-4742-ac63-e7cecfd86c81</str></doc>
>       <doc>
>         <str name="cont_stub">86944a90-033d-4676-9ac3-b59744fc52a5</str></doc>
>       <doc>
>         <str name="cont_stub">86944a90-033d-4676-9ac3-b59744fc52a5</str></doc>
>       <doc>
>         <str name="cont_stub">86944a90-033d-4676-9ac3-b59744fc52a5</str></doc>
>       <doc>
>         <str name="cont_stub">86944a90-033d-4676-9ac3-b59744fc52a5</str></doc>
>     </result>
>   </lst>
> </lst>
> </response>
> {code}
> It should only return 5 documents.  Removing the distributed search and searching on either core will return the requested number of rows. Removing group.format=simple will also return the requested number of rows.
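
The three cases described above (distributed with group.format=simple, single core, and distributed without group.format=simple) can be generated side by side with a small helper. This is only a sketch: the parameter values are copied from the example query in the description, while `build_url` and its flags are hypothetical names introduced for illustration.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/core0/select"
SHARDS = "localhost:8983/solr/core0,localhost:8983/solr/core1"


def build_url(distributed=True, simple_format=True, rows=5):
    """Build the example query, optionally dropping shards or group.format=simple."""
    params = [
        ("q", "*:*"),
        ("fq", "evt_stub:(452deed8-c3a2-49a8-878d-8356da315e6a)"),
        ("start", "0"),
        ("rows", str(rows)),
        ("fl", "cont_stub"),
        ("wt", "xml"),
        ("indent", "true"),
        ("group", "true"),
        ("group.field", "cont_stub"),
        ("group.limit", "1000"),
    ]
    if distributed:
        params.append(("shards", SHARDS))
    if simple_format:
        params.append(("group.format", "simple"))
    # urlencode percent-encodes the values, so the URLs can be pasted as-is
    return BASE + "?" + urlencode(params)


print(build_url())                      # reported case: rows is ignored
print(build_url(distributed=False))     # single core: rows is honored
print(build_url(simple_format=False))   # standard group format: rows is honored
```

Comparing the document counts of the three responses isolates the combination that triggers the problem.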


