Posted to solr-user@lucene.apache.org by cmti95035 <cm...@gmail.com> on 2017/02/09 04:35:52 UTC

inconsistent result count when doing paging

Hi,

I noticed in our production environment that the returned result count is
inconsistent when doing paging.

For example, for a certain query, the first page (start = 0, rows = 30)
reported a "numFound" of 3402, while the 2nd and 3rd pages (start = 30 and
start = 60) reported 3378 and 3361, respectively. A sample query looks like
the following:
q:TMCN:(美丽 OR ?美丽 OR 美丽? OR 丽美)
raw query parameters:
fl=*&start=60&rows=30&shards=172.10.10.3:9080/solr/tm01,172.10.10.3:9080/solr/tm02,172.10.10.3:9080/solr/tm03,172.10.10.3:9080/solr/tm04,172.10.10.3:9080/solr/tm05,172.10.10.3:9080/solr/tm06,172.10.10.3:9080/solr/tm07,172.10.10.3:9080/solr/tm08,172.10.10.3:9080/solr/tm09,172.10.10.3:9080/solr/tm10,172.10.10.3:9080/solr/tm11,172.10.10.3:9080/solr/tm12,172.10.10.3:9080/solr/tm13,172.10.10.3:9080/solr/tm14,172.10.10.3:9080/solr/tm15,172.10.10.3:9080/solr/tm16,172.10.10.3:9080/solr/tm17,172.10.10.3:9080/solr/tm18,172.10.10.3:9080/solr/tm19,172.10.10.3:9080/solr/tm20,172.10.10.3:9080/solr/tm21,172.10.10.3:9080/solr/tm22,172.10.10.3:9080/solr/tm23,172.10.10.3:9080/solr/tm24,172.10.10.3:9080/solr/tm25,172.10.10.3:9080/solr/tm26,172.10.10.3:9080/solr/tm27,172.10.10.3:9080/solr/tm28,172.10.10.3:9080/solr/tm29,172.10.10.3:9080/solr/tm30,172.10.10.3:9080/solr/tm31,172.10.10.3:9080/solr/tm32,172.10.10.3:9080/solr/tm33,172.10.10.3:9080/solr/tm34,172.10.10.3:9080/solr/tm35,172.10.10.3:9080/solr/tm36,172.10.10.3:9080/solr/tm37,172.10.10.3:9080/solr/tm38,172.10.10.3:9080/solr/tm39,172.10.10.3:9080/solr/tm40,172.10.10.3:9080/solr/tm41,172.10.10.3:9080/solr/tm42,172.10.10.3:9080/solr/tm43,172.10.10.3:9080/solr/tm44,172.10.10.3:9080/solr/tm45&facet=true&facet.missing=false&facet.field=intCls&facet.field=appDate&facet.field=TMStatus

The query was against multiple shards at a time. From a limited number of
tries, I noticed that the returned count is consistent when fewer than 5
shards are queried.

Please help!

Thanks,

James



--
View this message in context: http://lucene.472066.n3.nabble.com/inconsistent-result-count-when-doing-paging-tp4319427.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: inconsistent result count when doing paging

Posted by cmti95035 <cm...@gmail.com>.
Thanks Shawn! I will double-check to make sure the uniqueKey values are
really unique across all shards.




Re: inconsistent result count when doing paging

Posted by Shawn Heisey <ap...@elyograg.org>.
On 2/8/2017 9:35 PM, cmti95035 wrote:
> I noticed in our production environment that the returned result count is
> inconsistent when doing paging.
>
> For example, for a certain query, for the first page (start = 0, rows = 30),
> the corresponding "numFound" is 3402; and then it returned 3378, 3361 for
> the 2nd and 3rd page, respectively (start = 30, 60 respectively). A sample
> query looks like the following:
> q:TMCN:(美丽 OR ?美丽 OR 美丽? OR 丽美)
> raw query parameters:
> fl=*&start=60&rows=30&shards=172.10.10.3:9080/solr/tm01,172.10.10.3:9080
<snip>
> /solr/tm44,172.10.10.3:9080/solr/tm45&facet=true&facet.missing=false&facet.field=intCls&facet.field=appDate&facet.field=TMStatus
>
> The query was against multiple shards at a time. With limited tries I
> noticed that the return count is consistent if the number of shards are less
> than 5. 

When a distributed search returns different numFound values on different
requests for the same query, it almost always means that your uniqueKey
field is not unique between the different shards -- you have documents
using the same uniqueKey value in more than one shard.
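
One way to confirm this is to pull the uniqueKey values out of every core
(for example, a q=*:* query per core with fl set to the uniqueKey field,
paging through the results) and look for values that show up in more than
one core. The Python sketch below assumes the IDs have already been
collected into lists; the function name and the sample data are
hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def find_cross_shard_duplicates(shard_ids: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return every uniqueKey value found in more than one shard,
    mapped to the sorted names of the shards that contain it."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for shard, ids in shard_ids.items():
        for doc_id in ids:
            seen[doc_id].add(shard)
    return {
        doc_id: sorted(shards)
        for doc_id, shards in seen.items()
        if len(shards) > 1  # present on two or more shards
    }


if __name__ == "__main__":
    # Hypothetical exported IDs; real values would come from each core.
    exported = {
        "tm01": ["A1", "A2", "A3"],
        "tm02": ["B1", "A2"],
        "tm03": ["C1", "A3", "A2"],
    }
    for doc_id, shards in sorted(find_cross_shard_duplicates(exported).items()):
        print(doc_id, shards)
```

With the sample data this reports A2 (on tm01, tm02 and tm03) and A3 (on
tm01 and tm03). An empty result means no uniqueKey value is shared between
cores.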

The reason you see different counts comes down to two things: which
shards get their results back to the coordinating node first, so one
query may encounter a different number of duplicate documents than a
subsequent one, and the fact that Solr removes duplicates from the
combined results before calculating the total.  Probably when you reduce
the number of shards, you are removing the shards that contain the
duplicate documents from the list, so the problem doesn't happen.
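
To make the duplicate-removal effect concrete, here is a toy Python model
of the merge step. It assumes the coordinating node adds up each shard's
own hit count, that each shard returns only its top start+rows documents,
and that the coordinator subtracts one for every duplicate uniqueKey it
encounters among the documents it fetched. This is a simplification for
illustration, not Solr's source code, and the shard names, scores and IDs
are made up.

```python
from typing import Dict, List, Tuple

Doc = Tuple[float, str]  # (score, uniqueKey value)


def merged_num_found(shards: Dict[str, List[Doc]], start: int, rows: int) -> int:
    """Toy model of the numFound a coordinating node reports.

    Each shard contributes its own hit count but returns only its top
    start+rows documents. Shards are processed in dict order, standing in
    for the order their responses arrived; the first copy of a uniqueKey
    wins and every later copy lowers the total by one.
    """
    num_found = 0
    seen = set()
    for docs in shards.values():
        num_found += len(docs)  # the shard's local numFound
        top = sorted(docs, key=lambda d: d[0], reverse=True)[:start + rows]
        for _score, doc_id in top:
            if doc_id in seen:
                num_found -= 1  # duplicate uniqueKey: dropped from the total
            else:
                seen.add(doc_id)
    return num_found


if __name__ == "__main__":
    # Hypothetical data: shard s2 holds three IDs that also exist on s1.
    s1 = [(10.0 - i, f"d{i}") for i in range(10)]
    s2 = [(9.5 - i, f"e{i}") for i in range(7)] + [(2.5, "d7"), (1.5, "d8"), (0.5, "d9")]
    shards = {"s1": s1, "s2": s2}
    for start in (0, 6, 9):
        print(start, merged_num_found(shards, start, rows=3))
```

With these numbers the model prints 20, 18 and 17 for start = 0, 6 and 9.
The deeper the page, the more of each shard's list the coordinator sees,
so more duplicates are caught and the total shrinks, a pattern similar to
the 3402, 3378, 3361 sequence in the original report.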

It is *critical* that the uniqueKey field remains unique across the
entire distributed index.  Using SolrCloud with *fully* automatic
document routing will typically ensure that everything is unique across
the entire collection, but in other situations, making sure this happens
will be up to you.
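
The reason automatic routing helps is that the target shard is computed
from the uniqueKey itself, so re-indexing the same ID always lands on the
same shard and replaces the earlier copy instead of creating a second one.
SolrCloud's compositeId router does this with a MurmurHash3 hash and
per-shard hash ranges; the sketch below swaps in a plain CRC32-modulo
scheme as a simplified stand-in. The in-memory "shards", the route() and
index() helpers, and the "id" field name are all hypothetical.

```python
import zlib
from typing import Dict, List


def route(doc_id: str, num_shards: int) -> int:
    """Pick a shard deterministically from the uniqueKey value.

    Simplified stand-in: CRC32 modulo shard count (SolrCloud's real
    compositeId router uses MurmurHash3 over hash ranges instead).
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index(shards: List[Dict[str, dict]], doc: dict, key: str = "id") -> None:
    """Add or overwrite a document on the shard chosen by route().

    The "id" default is a hypothetical uniqueKey field name.
    """
    target = shards[route(doc[key], len(shards))]
    target[doc[key]] = doc  # same ID -> same shard -> overwrite, never duplicate


if __name__ == "__main__":
    shards: List[Dict[str, dict]] = [{} for _ in range(5)]
    index(shards, {"id": "TM-1", "v": 1})
    index(shards, {"id": "TM-1", "v": 2})  # re-index the same ID
    print(sum(len(s) for s in shards))  # one copy in total
```

Routing documents by hand (for example, choosing a core per batch) loses
this guarantee, which is why uniqueness then has to be enforced by the
indexing code.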

Thanks,
Shawn