Posted to dev@lucene.apache.org by "Tomás Fernández Löbbe (JIRA)" <ji...@apache.org> on 2014/07/29 23:25:42 UTC
[jira] [Commented] (SOLR-6299) Facet count on facet queries returns different results if #shards > 1
[ https://issues.apache.org/jira/browse/SOLR-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078396#comment-14078396 ]
Tomás Fernández Löbbe commented on SOLR-6299:
---------------------------------------------
I think the issue must be with the combination of grouping and facet queries. Grouping already gives you incorrect group counts unless you make sure that all documents of a group fall in the same shard.
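To make the point above concrete, here is a minimal Java sketch (Java 16+ for the record type). It assumes, without checking the Solr source, that the coordinating node merges grouped facet-query counts by simply summing the per-shard values. It uses the five documents from the report below with a hypothetical assignment to two shards; the class, record and method names are illustrative only.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GroupedFacetSketch {

    // Minimal stand-in for a Solr document: only the id, the group field (hotel_s1)
    // and the field used by the facet query (airport_s1).
    record Doc(int id, String hotel, String airport) {}

    // Grouped facet-query count on a single index: the number of distinct groups
    // (hotel_s1 values) that contain at least one doc matching airport_s1:<airport>.
    static int groupedFacetCount(List<Doc> docs, String airport) {
        Set<String> groups = new HashSet<>();
        for (Doc d : docs) {
            if (d.airport().equals(airport)) {
                groups.add(d.hotel());
            }
        }
        return groups.size();
    }

    // ASSUMED distributed merge: add up the per-shard grouped counts.
    static int summedShardCount(List<List<Doc>> shards, String airport) {
        int total = 0;
        for (List<Doc> shard : shards) {
            total += groupedFacetCount(shard, airport);
        }
        return total;
    }

    public static void main(String[] args) {
        Doc d2000 = new Doc(2000, "a", "ams");
        Doc d2001 = new Doc(2001, "a", "dus");
        Doc d2002 = new Doc(2002, "b", "ams");
        Doc d2003 = new Doc(2003, "b", "ams");
        Doc d2004 = new Doc(2004, "b", "ams");

        List<Doc> all = List.of(d2000, d2001, d2002, d2003, d2004);
        // Hypothetical split: group "b" has docs on both shards.
        List<List<Doc>> split = List.of(
                List.of(d2000, d2003),
                List.of(d2001, d2002, d2004));
        // Hypothetical routing that keeps every doc of a group on the same shard.
        List<List<Doc>> byGroup = List.of(
                List.of(d2000, d2001),
                List.of(d2002, d2003, d2004));

        System.out.println(groupedFacetCount(all, "ams"));    // 2
        System.out.println(summedShardCount(split, "ams"));   // 3
        System.out.println(summedShardCount(byGroup, "ams")); // 2
    }
}
```

With the hypothetical split, group `b` is counted once on each shard, giving 3 instead of 2, which matches the symptom in the report. Keeping every document of a group on one shard (in SolrCloud, for example, with the compositeId router's `shardKey!docId` ID prefix) makes the summed count agree with the single-index count in this sketch.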
> Facet count on facet queries returns different results if #shards > 1
> ---------------------------------------------------------------------
>
> Key: SOLR-6299
> URL: https://issues.apache.org/jira/browse/SOLR-6299
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.0
> Reporter: Vamsee Yarlagadda
> Labels: faceting
>
> I am trying to compute facet counts on facet queries, and it looks like I get different counts when I use more than one shard in the SolrCloud cluster.
> Here is the upstream unit test:
> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/request/SimpleFacetsTest.java#L173
> Setup:
> * Ingested 5 Solr docs:
> {code}
> {
> "responseHeader": {
> "status": 0,
> "QTime": 22,
> "params": {
> "indent": "true",
> "q": "*:*",
> "_": "1406346687337",
> "wt": "json"
> }
> },
> "response": {
> "numFound": 5,
> "start": 0,
> "maxScore": 1,
> "docs": [
> {
> "id": 2004,
> "range_facet_l": [
> 2004
> ],
> "hotel_s1": "b",
> "airport_s1": "ams",
> "duration_i1": 5,
> "_version_": 1474661321774465000,
> "timestamp": "2014-07-26T03:50:27.975Z",
> "multiDefault": [
> "muLti-Default"
> ],
> "intDefault": 42
> },
> {
> "id": 2000,
> "range_facet_l": [
> 2000
> ],
> "hotel_s1": "a",
> "airport_s1": "ams",
> "duration_i1": 5,
> "_version_": 1474661323604230100,
> "timestamp": "2014-07-26T03:50:29.734Z",
> "multiDefault": [
> "muLti-Default"
> ],
> "intDefault": 42
> },
> {
> "id": 2003,
> "range_facet_l": [
> 2003
> ],
> "hotel_s1": "b",
> "airport_s1": "ams",
> "duration_i1": 5,
> "_version_": 1474661326312702000,
> "timestamp": "2014-07-26T03:50:32.317Z",
> "multiDefault": [
> "muLti-Default"
> ],
> "intDefault": 42
> },
> {
> "id": 2001,
> "range_facet_l": [
> 2001
> ],
> "hotel_s1": "a",
> "airport_s1": "dus",
> "duration_i1": 10,
> "_version_": 1474661326389248000,
> "timestamp": "2014-07-26T03:50:32.375Z",
> "multiDefault": [
> "muLti-Default"
> ],
> "intDefault": 42
> },
> {
> "id": 2002,
> "range_facet_l": [
> 2002
> ],
> "hotel_s1": "b",
> "airport_s1": "ams",
> "duration_i1": 10,
> "_version_": 1474661326464745500,
> "timestamp": "2014-07-26T03:50:32.446Z",
> "multiDefault": [
> "muLti-Default"
> ],
> "intDefault": 42
> }
> ]
> }
> }
> {code}
> Here is the query being run, as test code:
> {code}
> assertQ(
>     req(
>         "q", "*:*",
>         "fq", "id:[2000 TO 2004]",
>         "group", "true",
>         "group.facet", "true",
>         "group.field", "hotel_s1",
>         "facet", "true",
>         "facet.limit", facetLimit, // facetLimit is a variable defined earlier in the test (not shown)
>         "facet.query", "airport_s1:ams"
>     ),
>     "//lst[@name='facet_queries']/int[@name='airport_s1:ams'][.='2']"
> );
> {code}
> and as the corresponding HTTP request:
> {code}
> $ curl "http://localhost:8983/solr/collection1/select?facet=true&facet.query=airport_s1%3Aams&q=*%3A*&facet.limit=-100&group.field=hotel_s1&group=true&group.facet=true&fq=id%3A%5B2000+TO+2004%5D&indent=true&wt=xml"
> {code}
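The curl request carries the same parameters as the test, URL-encoded into a query string. A small sketch using the JDK's `java.net.URLEncoder` (Java 10+ for the `Charset` overload); `SelectUrl` and `toQueryString` are illustrative names, `-100` stands in for the test's `facetLimit`, and the parameter order here differs from the curl line.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class SelectUrl {

    // Encode each name=value pair with form encoding (':' -> %3A, '[' -> %5B, space -> '+').
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Same parameters as the assertQ(...) call; facet.limit uses the -100 from the curl line.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("fq", "id:[2000 TO 2004]");
        params.put("group", "true");
        params.put("group.facet", "true");
        params.put("group.field", "hotel_s1");
        params.put("facet", "true");
        params.put("facet.limit", "-100");
        params.put("facet.query", "airport_s1:ams");
        System.out.println("http://localhost:8983/solr/collection1/select?" + toQueryString(params));
    }
}
```

Running it produces the same encodings that appear in the curl command, e.g. `q=*%3A*`, `fq=id%3A%5B2000+TO+2004%5D` and `facet.query=airport_s1%3Aams`.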
> Now, if I issue the query on a *1*-shard system, it works as expected:
> {code}
> $ curl "http://localhost:8983/solr/collection1/select?facet=true&facet.query=airport_s1%3Aams&q=*%3A*&facet.limit=-100&group.field=hotel_s1&group=true&group.facet=true&fq=id%3A%5B2000+TO+2004%5D&indent=true&wt=xml"
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">17</int>
> <lst name="params">
> <str name="facet">true</str>
> <str name="indent">true</str>
> <str name="facet.query">airport_s1:ams</str>
> <str name="q">*:*</str>
> <str name="facet.limit">-100</str>
> <str name="group.field">hotel_s1</str>
> <str name="group">true</str>
> <str name="wt">xml</str>
> <str name="fq">id:[2000 TO 2004]</str>
> <str name="group.facet">true</str>
> </lst>
> </lst>
> <lst name="grouped">
> <lst name="hotel_s1">
> <int name="matches">5</int>
> <arr name="groups">
> <lst>
> <str name="groupValue">a</str>
> <result name="doclist" numFound="2" start="0">
> <doc>
> <int name="id">2001</int>
> <arr name="range_facet_l">
> <long>2001</long>
> </arr>
> <str name="hotel_s1">a</str>
> <str name="airport_s1">dus</str>
> <int name="duration_i1">10</int>
> <long name="_version_">1474989437819551744</long>
> <date name="timestamp">2014-07-29T18:45:43.819Z</date>
> <arr name="multiDefault">
> <str>muLti-Default</str>
> </arr>
> <int name="intDefault">42</int></doc>
> </result>
> </lst>
> <lst>
> <str name="groupValue">b</str>
> <result name="doclist" numFound="3" start="0">
> <doc>
> <int name="id">2003</int>
> <arr name="range_facet_l">
> <long>2003</long>
> </arr>
> <str name="hotel_s1">b</str>
> <str name="airport_s1">ams</str>
> <int name="duration_i1">5</int>
> <long name="_version_">1474989439611568128</long>
> <date name="timestamp">2014-07-29T18:45:45.528Z</date>
> <arr name="multiDefault">
> <str>muLti-Default</str>
> </arr>
> <int name="intDefault">42</int></doc>
> </result>
> </lst>
> </arr>
> </lst>
> </lst>
> <lst name="facet_counts">
> <lst name="facet_queries">
> <int name="airport_s1:ams">2</int>
> </lst>
> <lst name="facet_fields"/>
> <lst name="facet_dates"/>
> <lst name="facet_ranges"/>
> </lst>
> </response>
> {code}
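The test's assertion is an XPath over the XML response. A minimal sketch using only the JDK's `javax.xml` APIs (the class and method names are illustrative, not part of Solr's test framework) that reads the `airport_s1:ams` facet-query count out of a response like the one above; on the 1-shard response it should yield 2, and on the 2-shard response below, 3. The facet query string is pasted into the XPath unescaped, so this sketch assumes it contains no single quotes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class FacetQueryCount {

    // Evaluates the XPath from the assertQ(...) call, minus its [.='2'] predicate,
    // and returns the count reported under facet_queries for the given facet.query.
    static int facetQueryCount(String responseXml, String facetQuery) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8)));
            String expr = "//lst[@name='facet_queries']/int[@name='" + facetQuery + "']";
            String value = XPathFactory.newInstance().newXPath().evaluate(expr, doc);
            return Integer.parseInt(value.trim());
        } catch (Exception e) {
            // Covers malformed XML, a bad XPath, or a missing/non-numeric count.
            throw new IllegalArgumentException("could not read facet count for " + facetQuery, e);
        }
    }

    public static void main(String[] args) {
        // Trimmed-down facet_counts section of the 1-shard response shown above.
        String xml = "<response><lst name=\"facet_counts\"><lst name=\"facet_queries\">"
                + "<int name=\"airport_s1:ams\">2</int></lst></lst></response>";
        System.out.println(facetQueryCount(xml, "airport_s1:ams")); // 2
    }
}
```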
> If I run the same query on a 2-shard system, however, the facet count is *3* instead of *2*.
> Solr result on the 2-shard cluster:
> {code}
> [systest@search-testing-c5-1 search]$ curl "http://localhost:8983/solr/collection1/select?facet=true&facet.query=airport_s1%3Aams&q=*%3A*&facet.limit=-100&group.field=hotel_s1&group=true&group.facet=true&fq=id%3A%5B2000+TO+2004%5D&indent=true&wt=xml"
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">69</int>
> <lst name="params">
> <str name="facet">true</str>
> <str name="indent">true</str>
> <str name="facet.query">airport_s1:ams</str>
> <str name="q">*:*</str>
> <str name="facet.limit">-100</str>
> <str name="group.field">hotel_s1</str>
> <str name="group">true</str>
> <str name="wt">xml</str>
> <str name="fq">id:[2000 TO 2004]</str>
> <str name="group.facet">true</str>
> </lst>
> </lst>
> <lst name="grouped">
> <lst name="hotel_s1">
> <int name="matches">5</int>
> <arr name="groups">
> <lst>
> <str name="groupValue">b</str>
> <result name="doclist" numFound="3" start="0" maxScore="1.0">
> <doc>
> <int name="id">2002</int>
> <arr name="range_facet_l">
> <long>2002</long>
> </arr>
> <str name="hotel_s1">b</str>
> <str name="airport_s1">ams</str>
> <int name="duration_i1">10</int>
> <long name="_version_">1474661326464745472</long>
> <date name="timestamp">2014-07-26T03:50:32.446Z</date>
> <arr name="multiDefault">
> <str>muLti-Default</str>
> </arr>
> <int name="intDefault">42</int></doc>
> </result>
> </lst>
> <lst>
> <str name="groupValue">a</str>
> <result name="doclist" numFound="2" start="0" maxScore="1.0">
> <doc>
> <int name="id">2001</int>
> <arr name="range_facet_l">
> <long>2001</long>
> </arr>
> <str name="hotel_s1">a</str>
> <str name="airport_s1">dus</str>
> <int name="duration_i1">10</int>
> <long name="_version_">1474661326389248000</long>
> <date name="timestamp">2014-07-26T03:50:32.375Z</date>
> <arr name="multiDefault">
> <str>muLti-Default</str>
> </arr>
> <int name="intDefault">42</int></doc>
> </result>
> </lst>
> </arr>
> </lst>
> </lst>
> <lst name="facet_counts">
> <lst name="facet_queries">
> <int name="airport_s1:ams">3</int>
> </lst>
> <lst name="facet_fields"/>
> <lst name="facet_dates"/>
> <lst name="facet_ranges"/>
> </lst>
> </response>
> {code}
> To reproduce this, run the above test against a system with more than one shard; the Solr response will differ.
--
This message was sent by Atlassian JIRA
(v6.2#6252)