Posted to dev@lucene.apache.org by "Tomás Fernández Löbbe (JIRA)" <ji...@apache.org> on 2014/07/29 23:25:42 UTC

[jira] [Commented] (SOLR-6299) Facet count on facet queries returns different results if #shards > 1

    [ https://issues.apache.org/jira/browse/SOLR-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078396#comment-14078396 ] 

Tomás Fernández Löbbe commented on SOLR-6299:
---------------------------------------------

I think the issue must be with the combination of grouping and facet queries. Grouping already gives you incorrect group counts unless you make sure that all docs of a group fall on the same shard.
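
To make that concrete, here is a minimal sketch in plain Java (not Solr code) using the five docs from the issue. It rests on two assumptions that the issue does not confirm: that each shard counts the distinct hotel_s1 groups matching the facet query locally and the coordinator simply adds those counts up, and that the docs are placed on the shards as shown. The class, method names, and shard placements are all hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only; none of these names are Solr APIs.
class GroupFacetSketch {

    // One document: id, grouping field (hotel_s1), and faceted field (airport_s1).
    static final class Doc {
        final int id;
        final String hotel;
        final String airport;

        Doc(int id, String hotel, String airport) {
            this.id = id;
            this.hotel = hotel;
            this.airport = airport;
        }
    }

    // Number of distinct hotel_s1 groups that contain at least one doc with the given airport.
    static int distinctGroups(List<Doc> docs, String airport) {
        Set<String> groups = new HashSet<>();
        for (Doc d : docs) {
            if (d.airport.equals(airport)) {
                groups.add(d.hotel);
            }
        }
        return groups.size();
    }

    // ASSUMED merge step: the coordinator sums the per-shard distinct-group counts.
    static int summedAcrossShards(List<List<Doc>> shards, String airport) {
        int total = 0;
        for (List<Doc> shard : shards) {
            total += distinctGroups(shard, airport);
        }
        return total;
    }

    // The five docs from the issue, all on one shard.
    static List<Doc> allDocs() {
        return Arrays.asList(
            new Doc(2000, "a", "ams"), new Doc(2001, "a", "dus"),
            new Doc(2002, "b", "ams"), new Doc(2003, "b", "ams"),
            new Doc(2004, "b", "ams"));
    }

    // HYPOTHETICAL placement: group "b" is split across both shards.
    static List<List<Doc>> splitShards() {
        return Arrays.asList(
            Arrays.asList(new Doc(2000, "a", "ams"), new Doc(2002, "b", "ams")),
            Arrays.asList(new Doc(2001, "a", "dus"), new Doc(2003, "b", "ams"),
                          new Doc(2004, "b", "ams")));
    }

    // HYPOTHETICAL placement: every group lives entirely on one shard.
    static List<List<Doc>> colocatedShards() {
        return Arrays.asList(
            Arrays.asList(new Doc(2000, "a", "ams"), new Doc(2001, "a", "dus")),
            Arrays.asList(new Doc(2002, "b", "ams"), new Doc(2003, "b", "ams"),
                          new Doc(2004, "b", "ams")));
    }

    public static void main(String[] args) {
        System.out.println(distinctGroups(allDocs(), "ams"));                // prints 2
        System.out.println(summedAcrossShards(splitShards(), "ams"));        // prints 3
        System.out.println(summedAcrossShards(colocatedShards(), "ams"));    // prints 2
    }
}
```

Under these assumptions, group "b" is counted once on each shard when it is split, which would produce the 3 reported in the issue, while co-locating each group on a single shard gives 2 again.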

> Facet count on facet queries returns different results if #shards > 1
> ---------------------------------------------------------------------
>
>                 Key: SOLR-6299
>                 URL: https://issues.apache.org/jira/browse/SOLR-6299
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.0
>            Reporter: Vamsee Yarlagadda
>              Labels: faceting
>
> I am trying to run some facet counts on facet queries, and it looks like I am getting different counts if I use more than one shard in the SolrCloud cluster.
> Here is the upstream unit test:
> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/request/SimpleFacetsTest.java#L173
> Setup:
> * Ingested 5 Solr docs.
> {code}
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 22,
>     "params": {
>       "indent": "true",
>       "q": "*:*",
>       "_": "1406346687337",
>       "wt": "json"
>     }
>   },
>   "response": {
>     "numFound": 5,
>     "start": 0,
>     "maxScore": 1,
>     "docs": [
>       {
>         "id": 2004,
>         "range_facet_l": [
>           2004
>         ],
>         "hotel_s1": "b",
>         "airport_s1": "ams",
>         "duration_i1": 5,
>         "_version_": 1474661321774465000,
>         "timestamp": "2014-07-26T03:50:27.975Z",
>         "multiDefault": [
>           "muLti-Default"
>         ],
>         "intDefault": 42
>       },
>       {
>         "id": 2000,
>         "range_facet_l": [
>           2000
>         ],
>         "hotel_s1": "a",
>         "airport_s1": "ams",
>         "duration_i1": 5,
>         "_version_": 1474661323604230100,
>         "timestamp": "2014-07-26T03:50:29.734Z",
>         "multiDefault": [
>           "muLti-Default"
>         ],
>         "intDefault": 42
>       },
>       {
>         "id": 2003,
>         "range_facet_l": [
>           2003
>         ],
>         "hotel_s1": "b",
>         "airport_s1": "ams",
>         "duration_i1": 5,
>         "_version_": 1474661326312702000,
>         "timestamp": "2014-07-26T03:50:32.317Z",
>         "multiDefault": [
>           "muLti-Default"
>         ],
>         "intDefault": 42
>       },
>       {
>         "id": 2001,
>         "range_facet_l": [
>           2001
>         ],
>         "hotel_s1": "a",
>         "airport_s1": "dus",
>         "duration_i1": 10,
>         "_version_": 1474661326389248000,
>         "timestamp": "2014-07-26T03:50:32.375Z",
>         "multiDefault": [
>           "muLti-Default"
>         ],
>         "intDefault": 42
>       },
>       {
>         "id": 2002,
>         "range_facet_l": [
>           2002
>         ],
>         "hotel_s1": "b",
>         "airport_s1": "ams",
>         "duration_i1": 10,
>         "_version_": 1474661326464745500,
>         "timestamp": "2014-07-26T03:50:32.446Z",
>         "multiDefault": [
>           "muLti-Default"
>         ],
>         "intDefault": 42
>       }
>     ]
>   }
> }
> {code}
> Here is the query being run:
> {code}
> Test code:
>     assertQ(
>         req(
>             "q", "*:*",
>             "fq", "id:[2000 TO 2004]",
>             "group", "true",
>             "group.facet", "true",
>             "group.field", "hotel_s1",
>             "facet", "true",
>             "facet.limit", facetLimit,
>             "facet.query", "airport_s1:ams"
>         ),
>         "//lst[@name='facet_queries']/int[@name='airport_s1:ams'][.='2']"
>     );
> $ curl  "http://localhost:8983/solr/collection1/select?facet=true&facet.query=airport_s1%3Aams&q=*%3A*&facet.limit=-100&group.field=hotel_s1&group=true&group.facet=true&fq=id%3A%5B2000+TO+2004%5D&indent=true&wt=xml" 
> {code}
> Now, if I issue the query on a *1*-shard system, it works as expected:
> {code}
> $ curl  "http://localhost:8983/solr/collection1/select?facet=true&facet.query=airport_s1%3Aams&q=*%3A*&facet.limit=-100&group.field=hotel_s1&group=true&group.facet=true&fq=id%3A%5B2000+TO+2004%5D&indent=true&wt=xml" 
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>   <int name="status">0</int>
>   <int name="QTime">17</int>
>   <lst name="params">
>     <str name="facet">true</str>
>     <str name="indent">true</str>
>     <str name="facet.query">airport_s1:ams</str>
>     <str name="q">*:*</str>
>     <str name="facet.limit">-100</str>
>     <str name="group.field">hotel_s1</str>
>     <str name="group">true</str>
>     <str name="wt">xml</str>
>     <str name="fq">id:[2000 TO 2004]</str>
>     <str name="group.facet">true</str>
>   </lst>
> </lst>
> <lst name="grouped">
>   <lst name="hotel_s1">
>     <int name="matches">5</int>
>     <arr name="groups">
>       <lst>
>         <str name="groupValue">a</str>
>         <result name="doclist" numFound="2" start="0">
>           <doc>
>             <int name="id">2001</int>
>             <arr name="range_facet_l">
>               <long>2001</long>
>             </arr>
>             <str name="hotel_s1">a</str>
>             <str name="airport_s1">dus</str>
>             <int name="duration_i1">10</int>
>             <long name="_version_">1474989437819551744</long>
>             <date name="timestamp">2014-07-29T18:45:43.819Z</date>
>             <arr name="multiDefault">
>               <str>muLti-Default</str>
>             </arr>
>             <int name="intDefault">42</int></doc>
>         </result>
>       </lst>
>       <lst>
>         <str name="groupValue">b</str>
>         <result name="doclist" numFound="3" start="0">
>           <doc>
>             <int name="id">2003</int>
>             <arr name="range_facet_l">
>               <long>2003</long>
>             </arr>
>             <str name="hotel_s1">b</str>
>             <str name="airport_s1">ams</str>
>             <int name="duration_i1">5</int>
>             <long name="_version_">1474989439611568128</long>
>             <date name="timestamp">2014-07-29T18:45:45.528Z</date>
>             <arr name="multiDefault">
>               <str>muLti-Default</str>
>             </arr>
>             <int name="intDefault">42</int></doc>
>         </result>
>       </lst>
>     </arr>
>   </lst>
> </lst>
> <lst name="facet_counts">
>   <lst name="facet_queries">
>     <int name="airport_s1:ams">2</int>
>   </lst>
>   <lst name="facet_fields"/>
>   <lst name="facet_dates"/>
>   <lst name="facet_ranges"/>
> </lst>
> </response>
> {code}
> Now, if I run the same query on a 2-shard system, I see a facet count of *3* instead of *2*.
> Solr result on the 2-shard cluster:
> {code}
> [systest@search-testing-c5-1 search]$ curl  "http://localhost:8983/solr/collection1/select?facet=true&facet.query=airport_s1%3Aams&q=*%3A*&facet.limit=-100&group.field=hotel_s1&group=true&group.facet=true&fq=id%3A%5B2000+TO+2004%5D&indent=true&wt=xml" 
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>   <int name="status">0</int>
>   <int name="QTime">69</int>
>   <lst name="params">
>     <str name="facet">true</str>
>     <str name="indent">true</str>
>     <str name="facet.query">airport_s1:ams</str>
>     <str name="q">*:*</str>
>     <str name="facet.limit">-100</str>
>     <str name="group.field">hotel_s1</str>
>     <str name="group">true</str>
>     <str name="wt">xml</str>
>     <str name="fq">id:[2000 TO 2004]</str>
>     <str name="group.facet">true</str>
>   </lst>
> </lst>
> <lst name="grouped">
>   <lst name="hotel_s1">
>     <int name="matches">5</int>
>     <arr name="groups">
>       <lst>
>         <str name="groupValue">b</str>
>         <result name="doclist" numFound="3" start="0" maxScore="1.0">
>           <doc>
>             <int name="id">2002</int>
>             <arr name="range_facet_l">
>               <long>2002</long>
>             </arr>
>             <str name="hotel_s1">b</str>
>             <str name="airport_s1">ams</str>
>             <int name="duration_i1">10</int>
>             <long name="_version_">1474661326464745472</long>
>             <date name="timestamp">2014-07-26T03:50:32.446Z</date>
>             <arr name="multiDefault">
>               <str>muLti-Default</str>
>             </arr>
>             <int name="intDefault">42</int></doc>
>         </result>
>       </lst>
>       <lst>
>         <str name="groupValue">a</str>
>         <result name="doclist" numFound="2" start="0" maxScore="1.0">
>           <doc>
>             <int name="id">2001</int>
>             <arr name="range_facet_l">
>               <long>2001</long>
>             </arr>
>             <str name="hotel_s1">a</str>
>             <str name="airport_s1">dus</str>
>             <int name="duration_i1">10</int>
>             <long name="_version_">1474661326389248000</long>
>             <date name="timestamp">2014-07-26T03:50:32.375Z</date>
>             <arr name="multiDefault">
>               <str>muLti-Default</str>
>             </arr>
>             <int name="intDefault">42</int></doc>
>         </result>
>       </lst>
>     </arr>
>   </lst>
> </lst>
> <lst name="facet_counts">
>   <lst name="facet_queries">
>     <int name="airport_s1:ams">3</int>
>   </lst>
>   <lst name="facet_fields"/>
>   <lst name="facet_dates"/>
>   <lst name="facet_ranges"/>
> </lst>
> </response>
> {code} 
> To reproduce this, simply run the above test on a system with more than one shard; the Solr response will be different.


