You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Shawn Heisey (JIRA)" <ji...@apache.org> on 2014/08/07 17:30:11 UTC

[jira] [Comment Edited] (SOLR-6314) Multi-threaded facet counts differ when SolrCloud has >1 shard

    [ https://issues.apache.org/jira/browse/SOLR-6314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089341#comment-14089341 ] 

Shawn Heisey edited comment on SOLR-6314 at 8/7/14 3:28 PM:
------------------------------------------------------------

I can't imagine any situations where having the f1 key in the results twice (with the same info) is useful, so I would argue that the non-distributed query should behave like the distributed query and only return it once.  If someone has a reasonable use case for the current behavior, then you can forget what I said and make distributed work like non-distributed ... but hopefully with an optimization in both instances so that it only calculates the results once.

One thing that a user might do is include two *nearly* identical keys like this, either for migration purposes, or to work around web developers who are completely clueless:

{code}
facet.field=f1&facet.field={!key=foo}f1
{code}

This *does* work properly in Solr 4.9 with a distributed facet -- the original and re-keyed facets are both returned, with identical info.  I don't know if it runs the facet twice or re-uses the info that is already calculated.



was (Author: elyograg):
I can't imagine any situations where having the f1 key in the results twice (with the same info) is useful, so I would argue that the non-distributed query should behave like the distributed query and only return it once.  If someone has a reasonable use case for the current behavior, then you can forget what I said and make distributed work like non-distributed ... but hopefully with an optimization in both instances so that it only calculates the results once.

One thing that a user might do is include two *nearly* identical keys like this, either for migration purposes, or to work around web developers who are completely clueless:

facet.field=f1
facet.field={!key=foo}f1

This *does* work properly in Solr 4.9 with a distributed facet -- the original and re-keyed facets are both returned, with identical info.  I don't know if it runs the facet twice or re-uses the info that is already calculated.


> Multi-threaded facet counts differ when SolrCloud has >1 shard
> --------------------------------------------------------------
>
>                 Key: SOLR-6314
>                 URL: https://issues.apache.org/jira/browse/SOLR-6314
>             Project: Solr
>          Issue Type: Bug
>          Components: SearchComponents - other, SolrCloud
>    Affects Versions: 5.0
>            Reporter: Vamsee Yarlagadda
>            Assignee: Erick Erickson
>
> I am trying to work with multi-threaded faceting on SolrCloud and in the process i was hit by some issues.
> I am currently running the below upstream test on different SolrCloud configurations and i am getting a different result set per configuration.
> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/request/TestFaceting.java#L654
> Setup:
> - *Indexed 50 docs into SolrCloud.*
> - *If the SolrCloud has only 1 shard, the facet field query has the below output (which matches with the expected upstream test output - # facet fields ~ 50).*
> {code}
> $ curl  "http://localhost:8983/solr/collection1/select?facet=true&fl=id&indent=true&q=id%3A*&facet.limit=-1&facet.threads=1000&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&rows=1&wt=xml"
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>   <int name="status">0</int>
>   <int name="QTime">21</int>
>   <lst name="params">
>     <str name="facet">true</str>
>     <str name="fl">id</str>
>     <str name="indent">true</str>
>     <str name="q">id:*</str>
>     <str name="facet.limit">-1</str>
>     <str name="facet.threads">1000</str>
>     <arr name="facet.field">
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>     </arr>
>     <str name="wt">xml</str>
>     <str name="rows">1</str>
>   </lst>
> </lst>
> <result name="response" numFound="50" start="0">
>   <doc>
>     <float name="id">0.0</float></doc>
> </result>
> <lst name="facet_counts">
>   <lst name="facet_queries"/>
>   <lst name="facet_fields">
>     <lst name="f0_ws">
>       <int name="zero_1">25</int>
>       <int name="zero_2">25</int>
>     </lst>
>     <lst name="f0_ws">
>       <int name="zero_1">25</int>
>       <int name="zero_2">25</int>
>     </lst>
>     <lst name="f0_ws">
>       <int name="zero_1">25</int>
>       <int name="zero_2">25</int>
>     </lst>
>     <lst name="f0_ws">
>       <int name="zero_1">25</int>
>       <int name="zero_2">25</int>
>     </lst>
>     <lst name="f0_ws">
>       <int name="zero_1">25</int>
>       <int name="zero_2">25</int>
>     </lst>
>     <lst name="f1_ws">
>       <int name="one_1">33</int>
>       <int name="one_3">17</int>
>     </lst>
>     <lst name="f1_ws">
>       <int name="one_1">33</int>
>       <int name="one_3">17</int>
>     </lst>
>     <lst name="f1_ws">
>       <int name="one_1">33</int>
>       <int name="one_3">17</int>
>     </lst>
>     <lst name="f1_ws">
>       <int name="one_1">33</int>
>       <int name="one_3">17</int>
>     </lst>
>     <lst name="f1_ws">
>       <int name="one_1">33</int>
>       <int name="one_3">17</int>
>     </lst>
>     <lst name="f2_ws">
>       <int name="two_1">37</int>
>       <int name="two_4">13</int>
>     </lst>
>     <lst name="f2_ws">
>       <int name="two_1">37</int>
>       <int name="two_4">13</int>
>     </lst>
>     <lst name="f2_ws">
>       <int name="two_1">37</int>
>       <int name="two_4">13</int>
>     </lst>
>     <lst name="f2_ws">
>       <int name="two_1">37</int>
>       <int name="two_4">13</int>
>     </lst>
>     <lst name="f2_ws">
>       <int name="two_1">37</int>
>       <int name="two_4">13</int>
>     </lst>
>     <lst name="f3_ws">
>       <int name="three_1">40</int>
>       <int name="three_5">10</int>
>     </lst>
>     <lst name="f3_ws">
>       <int name="three_1">40</int>
>       <int name="three_5">10</int>
>     </lst>
>     <lst name="f3_ws">
>       <int name="three_1">40</int>
>       <int name="three_5">10</int>
>     </lst>
>     <lst name="f3_ws">
>       <int name="three_1">40</int>
>       <int name="three_5">10</int>
>     </lst>
>     <lst name="f3_ws">
>       <int name="three_1">40</int>
>       <int name="three_5">10</int>
>     </lst>
>     <lst name="f4_ws">
>       <int name="four_1">41</int>
>       <int name="four_6">9</int>
>     </lst>
>     <lst name="f4_ws">
>       <int name="four_1">41</int>
>       <int name="four_6">9</int>
>     </lst>
>     <lst name="f4_ws">
>       <int name="four_1">41</int>
>       <int name="four_6">9</int>
>     </lst>
>     <lst name="f4_ws">
>       <int name="four_1">41</int>
>       <int name="four_6">9</int>
>     </lst>
>     <lst name="f4_ws">
>       <int name="four_1">41</int>
>       <int name="four_6">9</int>
>     </lst>
>     <lst name="f5_ws">
>       <int name="five_1">42</int>
>       <int name="five_7">8</int>
>     </lst>
>     <lst name="f5_ws">
>       <int name="five_1">42</int>
>       <int name="five_7">8</int>
>     </lst>
>     <lst name="f5_ws">
>       <int name="five_1">42</int>
>       <int name="five_7">8</int>
>     </lst>
>     <lst name="f5_ws">
>       <int name="five_1">42</int>
>       <int name="five_7">8</int>
>     </lst>
>     <lst name="f5_ws">
>       <int name="five_1">42</int>
>       <int name="five_7">8</int>
>     </lst>
>     <lst name="f6_ws">
>       <int name="six_1">43</int>
>       <int name="six_8">7</int>
>     </lst>
>     <lst name="f6_ws">
>       <int name="six_1">43</int>
>       <int name="six_8">7</int>
>     </lst>
>     <lst name="f6_ws">
>       <int name="six_1">43</int>
>       <int name="six_8">7</int>
>     </lst>
>     <lst name="f6_ws">
>       <int name="six_1">43</int>
>       <int name="six_8">7</int>
>     </lst>
>     <lst name="f6_ws">
>       <int name="six_1">43</int>
>       <int name="six_8">7</int>
>     </lst>
>     <lst name="f7_ws">
>       <int name="seven_1">44</int>
>       <int name="seven_9">6</int>
>     </lst>
>     <lst name="f7_ws">
>       <int name="seven_1">44</int>
>       <int name="seven_9">6</int>
>     </lst>
>     <lst name="f7_ws">
>       <int name="seven_1">44</int>
>       <int name="seven_9">6</int>
>     </lst>
>     <lst name="f7_ws">
>       <int name="seven_1">44</int>
>       <int name="seven_9">6</int>
>     </lst>
>     <lst name="f7_ws">
>       <int name="seven_1">44</int>
>       <int name="seven_9">6</int>
>     </lst>
>     <lst name="f8_ws">
>       <int name="eight_1">45</int>
>       <int name="eight_10">5</int>
>     </lst>
>     <lst name="f8_ws">
>       <int name="eight_1">45</int>
>       <int name="eight_10">5</int>
>     </lst>
>     <lst name="f8_ws">
>       <int name="eight_1">45</int>
>       <int name="eight_10">5</int>
>     </lst>
>     <lst name="f8_ws">
>       <int name="eight_1">45</int>
>       <int name="eight_10">5</int>
>     </lst>
>     <lst name="f8_ws">
>       <int name="eight_1">45</int>
>       <int name="eight_10">5</int>
>     </lst>
>     <lst name="f9_ws">
>       <int name="nine_1">45</int>
>       <int name="nine_11">5</int>
>     </lst>
>     <lst name="f9_ws">
>       <int name="nine_1">45</int>
>       <int name="nine_11">5</int>
>     </lst>
>     <lst name="f9_ws">
>       <int name="nine_1">45</int>
>       <int name="nine_11">5</int>
>     </lst>
>     <lst name="f9_ws">
>       <int name="nine_1">45</int>
>       <int name="nine_11">5</int>
>     </lst>
>     <lst name="f9_ws">
>       <int name="nine_1">45</int>
>       <int name="nine_11">5</int>
>     </lst>
>   </lst>
>   <lst name="facet_dates"/>
>   <lst name="facet_ranges"/>
> </lst>
> </response> 
> {code}
> - *Now, if a create a new collection with 2 shards (>1 shard SolrCloud), the same above query results in a different output. (# facet fields ~ 10 ;  Expected 50)*
> {code}
> $ curl  "http://localhost:8983/solr/collection1/select?facet=true&fl=id&indent=true&q=id%3A*&facet.limit=-1&facet.threads=1000&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&rows=1&wt=xml"
>  
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>   <int name="status">0</int>
>   <int name="QTime">31</int>
>   <lst name="params">
>     <str name="facet">true</str>
>     <str name="fl">id</str>
>     <str name="indent">true</str>
>     <str name="q">id:*</str>
>     <str name="facet.limit">-1</str>
>     <str name="facet.threads">1000</str>
>     <arr name="facet.field">
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>     </arr>
>     <str name="wt">xml</str>
>     <str name="rows">1</str>
>   </lst>
> </lst>
> <result name="response" numFound="50" start="0" maxScore="1.0">
>   <doc>
>     <float name="id">2.0</float></doc>
> </result>
> <lst name="facet_counts">
>   <lst name="facet_queries"/>
>   <lst name="facet_fields">
>     <lst name="f0_ws">
>       <int name="zero_1">25</int>
>       <int name="zero_2">25</int>
>     </lst>
>     <lst name="f1_ws">
>       <int name="one_1">33</int>
>       <int name="one_3">17</int>
>     </lst>
>     <lst name="f2_ws">
>       <int name="two_1">37</int>
>       <int name="two_4">13</int>
>     </lst>
>     <lst name="f3_ws">
>       <int name="three_1">40</int>
>       <int name="three_5">10</int>
>     </lst>
>     <lst name="f4_ws">
>       <int name="four_1">41</int>
>       <int name="four_6">9</int>
>     </lst>
>     <lst name="f5_ws">
>       <int name="five_1">42</int>
>       <int name="five_7">8</int>
>     </lst>
>     <lst name="f6_ws">
>       <int name="six_1">43</int>
>       <int name="six_8">7</int>
>     </lst>
>     <lst name="f7_ws">
>       <int name="seven_1">44</int>
>       <int name="seven_9">6</int>
>     </lst>
>     <lst name="f8_ws">
>       <int name="eight_1">45</int>
>       <int name="eight_10">5</int>
>     </lst>
>     <lst name="f9_ws">
>       <int name="nine_1">45</int>
>       <int name="nine_11">5</int>
>     </lst>
>   </lst>
>   <lst name="facet_dates"/>
>   <lst name="facet_ranges"/>
> </lst>
> </response>
> {code}
> This behavior is quite strange as it is being dependent on the number of shards in SolrCloud. It would be great if someone can shed some light on this?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org