Posted to solr-user@lucene.apache.org by Ronald Matamoros <RM...@searchtechnologies.com> on 2014/05/27 17:25:04 UTC

SolrCloud: facet range option f.<field>.facet.mincount=1 omits buckets on response

Good afternoon,

Is the f.<field>.facet.mincount option supported on a distributed search?
Under SolrCloud, I am seeing some buckets omitted when using the option "f.<field>.facet.mincount=1".
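Per-field facet parameters take the form f.<field>.<parameter>, so f.price.facet.mincount overrides facet.mincount for the price field only. Below is a minimal Python sketch that assembles the range-facet request used in the reproduction steps later in this thread. The helper name build_range_facet_url is hypothetical. The host, port, collection and parameter values come from those steps.

```python
from urllib.parse import urlencode

def build_range_facet_url(base, field, start, end, gap, mincount=None):
    """Build a Solr select URL with a per-field range facet (hypothetical helper)."""
    params = [
        ("q", "*:*"),
        ("fl", "id," + field),
        ("sort", "id asc"),
        ("facet", "true"),
        ("facet.range", field),
        # Per-field parameters use the f.<field>.<param> prefix.
        (f"f.{field}.facet.range.start", str(start)),
        (f"f.{field}.facet.range.end", str(end)),
        (f"f.{field}.facet.range.gap", str(gap)),
        (f"f.{field}.facet.range.other", "all"),
        (f"f.{field}.facet.range.include", "upper"),
    ]
    if mincount is not None:
        # Overrides facet.mincount for this one field only.
        params.append((f"f.{field}.facet.mincount", str(mincount)))
    return base + "/select?" + urlencode(params)

url = build_range_facet_url(
    "http://localhost:8983/solr/collection1", "price", 0, 1000, 50, mincount=1
)
print(url)
```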

The Solr logs do not indicate any error or warning during execution.
The debug=true option and increasing the log level for the FacetComponent do not provide any hints about the behaviour.

I reproduced the issue on both Solr 4.5.1 and 4.8.1.
I have attached a PDF that provides additional details and steps to reproduce the behaviour using the out-of-the-box Solr distribution.

Any insight or recommendation to tackle this situation is much appreciated.

Example:

      Removing the f.<field>.facet.mincount=1 option gives the expected list of buckets for the 6 documents matched.

        <lst name="facet_ranges">
         <lst name="price">
           <lst name="counts">
             <int name="0.0">0</int>
             <int name="50.0">1</int>
             <int name="100.0">0</int>
             <int name="150.0">3</int>
             <int name="200.0">0</int>
             <int name="250.0">1</int>
             <int name="300.0">0</int>
             <int name="350.0">0</int>
             <int name="400.0">0</int>
             <int name="450.0">0</int>
             <int name="500.0">0</int>
             <int name="550.0">0</int>
             <int name="600.0">0</int>
             <int name="650.0">0</int>
             <int name="700.0">0</int>
             <int name="750.0">1</int>
             <int name="800.0">0</int>
             <int name="850.0">0</int>
             <int name="900.0">0</int>
             <int name="950.0">0</int>
           </lst>
           <float name="gap">50.0</float>
           <float name="start">0.0</float>
           <float name="end">1000.0</float>
           <int name="before">0</int>
           <int name="after">0</int>
           <int name="between">2</int>
         </lst>
       </lst>
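Each bucket label is the lower bound of a gap-wide range between start and end. The reproduction query later in the thread sets f.price.facet.range.include=upper. Under that setting every range includes its upper bound but not its lower bound. A price of exactly 200.00 is therefore counted in the 150.0 bucket, and 800.00 in the 750.0 bucket. The Python sketch below reproduces that assignment for the three prices in the mem2.xml excerpt. The function names are hypothetical. The sketch ignores the before/after/between counts and the other include options (lower, edge, outer).

```python
import math

def bucket_label(value, start, end, gap):
    """Lower bound of the (lower, upper] range holding value, or None if outside (start, end]."""
    if value <= start or value > end:
        return None
    # Ranges are (start + k*gap, start + (k+1)*gap]; ceil() finds the upper-bound index.
    k = math.ceil((value - start) / gap) - 1
    return float(start + k * gap)

def range_counts(values, start, end, gap):
    """Count values per bucket, mimicking include=upper (integer start/end/gap assumed)."""
    counts = {float(b): 0 for b in range(start, end, gap)}
    for v in values:
        label = bucket_label(v, start, end, gap)
        if label is not None:
            counts[label] += 1
    return counts

prices = [190, 200.00, 800.00]  # prices from the mem2.xml excerpt
counts = range_counts(prices, 0, 1000, 50)
print({k: c for k, c in counts.items() if c})  # {150.0: 2, 750.0: 1}
```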

      Using the f.<field>.facet.mincount=1 option removes the zero-count buckets, but it also omits bucket <int name="250.0">:

        <lst name="facet_ranges">
          <lst name="price">
            <lst name="counts">
              <int name="50.0">1</int>
              <int name="150.0">3</int>
              <int name="750.0">1</int>
            </lst>
            <float name="gap">50.0</float>
            <float name="start">0.0</float>
            <float name="end">1000.0</float>
            <int name="before">0</int>
            <int name="after">0</int>
            <int name="between">4</int>
          </lst>
        </lst>
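The request without mincount returns the expected buckets. One possible interim workaround follows from that, though it is an assumption and not a fix confirmed in this thread: leave f.<field>.facet.mincount off and drop low-count buckets on the client. Below is a Python sketch using the standard library's xml.etree.ElementTree. The sample is a trimmed copy of the unfiltered counts shown earlier, with most zero buckets removed. filter_range_buckets is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def filter_range_buckets(xml_text, field, mincount=1):
    """Apply mincount on the client, after Solr has merged the shard responses."""
    root = ET.fromstring(xml_text)
    path = f".//lst[@name='facet_ranges']/lst[@name='{field}']/lst[@name='counts']"
    counts = root.find(path)
    if counts is None:
        return {}
    return {
        el.get("name"): int(el.text)
        for el in counts.findall("int")
        if int(el.text) >= mincount
    }

# Trimmed copy of the unfiltered response (most zero-count buckets removed).
sample = """<response>
  <lst name="facet_counts">
    <lst name="facet_ranges">
      <lst name="price">
        <lst name="counts">
          <int name="0.0">0</int>
          <int name="50.0">1</int>
          <int name="100.0">0</int>
          <int name="150.0">3</int>
          <int name="200.0">0</int>
          <int name="250.0">1</int>
          <int name="750.0">1</int>
        </lst>
      </lst>
    </lst>
  </lst>
</response>"""

print(filter_range_buckets(sample, "price"))
# {'50.0': 1, '150.0': 3, '250.0': 1, '750.0': 1}
```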

     Refreshing the query with the browser's F5 key renders a different bucket list
     (you may need to refresh several times):

        <lst name="facet_ranges">
          <lst name="price">
            <lst name="counts">
              <int name="150.0">3</int>
              <int name="250.0">1</int>
            </lst>
            <float name="gap">50.0</float>
            <float name="start">0.0</float>
            <float name="end">1000.0</float>
            <int name="before">0</int>
            <int name="after">0</int>
            <int name="between">2</int>
          </lst>
        </lst>
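The bucket list changes between refreshes, so the merged result appears to depend on which shard responds first. One plausible mechanism is offered here as an assumption, not a confirmed diagnosis. Suppose each shard applies mincount locally. The shards then return bucket lists of different lengths. A merge that adds counts by list position instead of by bucket label would misalign buckets or drop them. The Python sketch below contrasts the two merge strategies. The per-shard lists are invented for illustration, and neither function is Solr's actual code.

```python
def merge_by_position(first, second):
    """Naive merge: add counts by index and keep the first shard's labels."""
    merged = list(first)
    # Only positions present in both lists are combined; extra buckets are lost.
    for i in range(min(len(first), len(second))):
        label, count = merged[i]
        merged[i] = (label, count + second[i][1])
    return merged

def merge_by_label(shard_lists, mincount=1):
    """Sum counts per bucket label across shards, then apply mincount once."""
    totals = {}
    for buckets in shard_lists:
        for label, count in buckets:
            totals[label] = totals.get(label, 0) + count
    # Sort labels numerically so the output keeps range order.
    return [(label, totals[label]) for label in sorted(totals, key=float)
            if totals[label] >= mincount]

# Invented per-shard results after each shard has applied mincount=1 locally.
shard_a = [("50.0", 1), ("150.0", 2), ("750.0", 1)]
shard_b = [("150.0", 1), ("250.0", 1)]

print(merge_by_position(shard_a, shard_b))  # [('50.0', 2), ('150.0', 3), ('750.0', 1)]
print(merge_by_position(shard_b, shard_a))  # [('150.0', 2), ('250.0', 3)]
print(merge_by_label([shard_a, shard_b]))   # [('50.0', 1), ('150.0', 3), ('250.0', 1), ('750.0', 1)]
```

With the positional merge, the set of labels returned depends on which shard arrives first, and bucket 250.0 disappears in one ordering. Merging by label and applying mincount only after all shards are combined gives the same answer in either order.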

Regards 
Ronald Matamoros

Re: SolrCloud: facet range option f.<field>.facet.mincount=1 omits buckets on response

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/29/2014 12:06 PM, Ronald Matamoros wrote:
> Hi all,
>
> At the moment I am reviewing the code to determine whether this is a legitimate bug that should be filed as a JIRA ticket.
> Any insight or recommendation is appreciated.

<snip>

>           Note: the value in <int name="between"> changes with every other refresh of the query. 

Whenever distributed search results change from one query to the next,
it's almost always caused by having documents with the same uniqueKey in
more than one shard.  Solr is able to remove these duplicates from the
results, but there are other aspects of distributed searching that
cannot be dealt with when there are duplicate documents.  This leads to
problems like numFound changing from one request to the next.
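One way to test this hypothesis is to query each shard's core directly with distrib=false (for example q=*:*&fl=id). Then collect the uniqueKey values each core returns and look for IDs that appear in more than one shard. The Python sketch below covers only that comparison step. The shard names and ID lists are hypothetical placeholders for the results of those per-core queries.

```python
from collections import defaultdict

def duplicate_ids(ids_by_shard):
    """Map each uniqueKey found in more than one shard to the sorted list of those shards."""
    seen = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            seen[doc_id].add(shard)
    return {doc_id: sorted(shards) for doc_id, shards in seen.items() if len(shards) > 1}

# Hypothetical per-shard ID lists, as gathered with distrib=false against each core.
ids_by_shard = {
    "shard1": ["COMPANY1", "COMPANY3"],
    "shard2": ["COMPANY2", "COMPANY3"],
}
print(duplicate_ids(ids_by_shard))  # {'COMPANY3': ['shard1', 'shard2']}
```

An empty result would argue against the duplicate-key explanation for this particular index.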

To avoid these problems with SolrCloud, you'll likely want to create a
new collection and set its router to compositeId.  This ensures that
indexed documents are distributed to shards according to the hash of
their uniqueKey, not imported directly into the node where you made the
update request.
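The idea can be illustrated as follows. Every node computes the same hash of the uniqueKey and maps it to one of numShards equal slices of a 32-bit hash space. A document therefore lands on the same shard no matter which node received the update. This is a conceptual Python sketch only. Solr's compositeId router uses MurmurHash3 and its own range assignment. The sketch substitutes zlib.crc32 to stay dependency-free, so its shard numbers will not match Solr's.

```python
import zlib

HASH_SPACE = 2 ** 32  # zlib.crc32 returns an unsigned 32-bit value in Python 3

def shard_for(doc_id, num_shards):
    """Pick a shard by splitting the 32-bit hash space into equal slices (crc32 stand-in)."""
    h = zlib.crc32(doc_id.encode("utf-8"))
    slice_size = HASH_SPACE // num_shards
    # min() guards the last slice when HASH_SPACE is not divisible by num_shards.
    return min(h // slice_size, num_shards - 1)

for doc_id in ["COMPANY1", "COMPANY2", "COMPANY3"]:
    print(doc_id, "-> shard", shard_for(doc_id, 2))
```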

It's possible that my guess here is completely wrong, but this is
usually the problem.

Thanks,
Shawn


RE: COMMERCIAL: RE: SolrCloud: facet range option f.<field>.facet.mincount=1 omits buckets on response

Posted by Ronald Matamoros <RM...@searchtechnologies.com>.
Hi Chris,

I created ticket https://issues.apache.org/jira/browse/SOLR-6154
and attached to it the data.xml and a PDF with instructions on how to reproduce the issue.

Sending different updates to different ports simply followed the steps in the Confluence tutorial; it does not affect the result of the test.
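The receiving node does not affect the result, so the example files can all be posted to a single node. Below is a Python sketch equivalent to the curl commands in step 1.f, using only the standard library. update_request and post_files are hypothetical helper names. The endpoint is the one from those steps, and nothing is sent unless send=True.

```python
import urllib.request

def update_request(xml_bytes, base="http://localhost:8983/solr", commit=True):
    """Build a POST to the update handler, mirroring the curl calls in step 1.f."""
    url = base + "/update" + ("?commit=true" if commit else "")
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )

def post_files(paths, send=False):
    """Read each XML file and, if send=True, POST it to the same node."""
    requests = []
    for path in paths:
        with open(path, "rb") as fh:
            req = update_request(fh.read())
        if send:
            # Requires the running two-node cluster from step 1.e.
            with urllib.request.urlopen(req) as resp:
                print(path, resp.status)
        requests.append(req)
    return requests

# Usage: post_files(["mem.xml", "monitor2.xml", "mem2.xml"], send=True)
```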

As soon as I have more information, I will post it to the ticket.
I appreciate the interest; please let me know if you have any suggestions or feedback.

Thank you
Ronald Matamoros


-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: 06 June 2014 22:00
To: solr-user@lucene.apache.org
Subject: COMMERCIAL: RE: SolrCloud: facet range option f.<field>.facet.mincount=1 omits buckets on response



Ronald: I'm having a little trouble understanding the steps to reproduce that you are describing -- in particular Step "1 f ii" because i'm not really sure i understand what exactly you are putting in "mem2.xml"

Also: Since you don't appear to be using implicit routing, i'm not clear on why you are explicitly sending different updates to different ports in Step "1 f i" -- does that affect the results of your test?


If you can reliably reproduce using modified data from the example, could you please open a Jira outlining these steps and attach the modified data to index directly to that issue?  (FWIW: If it doesn't matter what port you use to send which documents, then you should be able to create a single unified "data.xml" file containing all the docs to index in a single command)



-Hoss
http://www.lucidworks.com/

RE: SolrCloud: facet range option f.<field>.facet.mincount=1 omits buckets on response

Posted by Chris Hostetter <ho...@fucit.org>.

Ronald: I'm having a little trouble understanding the steps to reproduce 
that you are describing -- in particular Step "1 f ii" because i'm not 
really sure i understand what exactly you are putting in "mem2.xml"

Also: Since you don't appear to be using implicit routing, i'm not clear 
on why you are explicitly sending different updates to different ports in 
Step "1 f i" -- does that affect the results of your test?


If you can reliably reproduce using modified data from the example, could 
you please open a Jira outlining these steps and attach the modified data 
to index directly to that issue?  (FWIW: If it doesn't matter what port 
you use to send which documents, then you should be able to create a single 
unified "data.xml" file containing all the docs to index in a single 
command)
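The single data.xml suggested above can be generated rather than assembled by hand. A minimal Python sketch follows, assuming the documents are available as field/value dictionaries. build_add_xml is a hypothetical name. The sample covers only the id, name, manu and price fields shown in the mem2.xml excerpt. The elided fields, and the mem.xml and monitor2.xml documents, would still need to be added.

```python
import xml.etree.ElementTree as ET

def build_add_xml(docs):
    """Serialize a list of {field: value} dicts as a Solr <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

docs = [
    {"id": "COMPANY1", "name": "COMPANY1 Device", "manu": "COMPANY1 Device Mfg", "price": "190"},
    {"id": "COMPANY2", "name": "COMPANY2 flatscreen", "manu": "COMPANY2 Device Mfg.", "price": "200.00"},
    {"id": "COMPANY3", "name": "COMPANY3 Laptop", "manu": "COMPANY3 Device Mfg.", "price": "800.00"},
]
xml_text = build_add_xml(docs)
# To write the file: open("data.xml", "w", encoding="utf-8").write(xml_text)
print(xml_text)
```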



-Hoss
http://www.lucidworks.com/

RE: SolrCloud: facet range option f.<field>.facet.mincount=1 omits buckets on response

Posted by Ronald Matamoros <RM...@searchtechnologies.com>.
Hi all,

At the moment I am reviewing the code to determine whether this is a legitimate bug that should be filed as a JIRA ticket.
Any insight or recommendation is appreciated.

Here are the steps to reproduce, as plain text:

-----------------------------------------------------------------
Solr versions where the issue was reproduced:
  * 4.5.1 (Linux)
  * 4.8.1 (Windows + Cygwin)

Steps to reproduce:

  1. Create a two-shard environment with no replication, following:
     https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud

     a. Download the Solr distribution from http://lucene.apache.org/solr/downloads.html
     b. Unzip solr-4.8.1.zip to a temporary location: <SOLR_DIST_HOME>
     c. Run it once so the SolrCloud jars get unpacked: java -jar start.jar
     d. Create the nodes:
          i. cd <SOLR_DIST_HOME>
          ii. Copy the example directory to node1 (via Windows Explorer)
          iii. Copy the example directory to node2 (via Windows Explorer)

     e. Start the nodes:
          i. Start node 1

               cd node1
               java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar

          ii. Start node 2

               cd node2
               java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

     f. Feed the sample documents:
          i. Out-of-the-box example files:

               curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d "@mem.xml"
               curl http://localhost:7574/solr/update?commit=true -H "Content-Type: text/xml" -d "@monitor2.xml"

          ii. Copy mem.xml to mem2.xml, modify the identifiers, names and prices, and feed it:

               curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d "@mem2.xml"

               <add>
                 <doc>
                   <field name="id">COMPANY1</field>
                   <field name="name">COMPANY1 Device</field>
                   <field name="manu">COMPANY1 Device Mfg</field>
                   <!-- ... -->
                   <field name="price">190</field>
                   <!-- ... -->
                 </doc>
                 <doc>
                   <field name="id">COMPANY2</field>
                   <field name="name">COMPANY2 flatscreen</field>
                   <field name="manu">COMPANY2 Device Mfg.</field>
                   <!-- ... -->
                   <field name="price">200.00</field>
                   <!-- ... -->
                 </doc>
                 <doc>
                   <field name="id">COMPANY3</field>
                   <field name="name">COMPANY3 Laptop</field>
                   <field name="manu">COMPANY3 Device Mfg.</field>
                   <!-- ... -->
                   <field name="price">800.00</field>
                   <!-- ... -->
                 </doc>
               </add>

  2. Query **without** f.price.facet.mincount=1; the counts and buckets are correct:

     http://localhost:8983/solr/collection1/select?q=*:*&fl=id,price&sort=id+asc&facet=true&facet.range=price&f.price.facet.range.start=0&f.price.facet.range.end=1000&f.price.facet.range.gap=50&f.price.facet.range.other=all&f.price.facet.range.include=upper&spellcheck=false&hl=false
 
     Only six documents have prices:
 
          <lst name="facet_ranges">
            <lst name="price">
              <lst name="counts">
                <int name="0.0">0</int>
                <int name="50.0">1</int>
                <int name="100.0">0</int>
                <int name="150.0">3</int>
                <int name="200.0">0</int>
                <int name="250.0">1</int>
                <int name="300.0">0</int>
                <int name="350.0">0</int>
                <int name="400.0">0</int>
                <int name="450.0">0</int>
                <int name="500.0">0</int>
                <int name="550.0">0</int>
                <int name="600.0">0</int>
                <int name="650.0">0</int>
                <int name="700.0">0</int>
                <int name="750.0">1</int>
                <int name="800.0">0</int>
                <int name="850.0">0</int>
                <int name="900.0">0</int>
                <int name="950.0">0</int>
              </lst>
              <float name="gap">50.0</float>
              <float name="start">0.0</float>
              <float name="end">1000.0</float>
              <int name="before">0</int>
              <int name="after">0</int>
              <int name="between">2</int>
            </lst>
          </lst>

          Note: the value of <int name="between"> changes when the query is refreshed, and here it does not match the sum of the bucket counts above (1 + 3 + 1 + 1 = 6).
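Here is a quick check of the expected numbers as a minimal Python sketch. bucket_start is a hypothetical helper, not a Solr API. With f.price.facet.range.include=upper, each gap range is treated as (lower, lower + gap]. That puts 190 and 200.00 from mem2.xml in the 150.0 bucket and 800.00 in the 750.0 bucket, which agrees with the response above. The 20 ranges tile 0 to 1000 exactly, so "between" should equal the sum of the bucket counts.

```python
import math

def bucket_start(value, start=0.0, gap=50.0):
    # With include=upper a range is (lower, lower + gap]: upper bound inclusive,
    # lower bound exclusive, so 200.00 belongs to the range that ends at 200.
    k = math.ceil((value - start) / gap) - 1
    return start + k * gap

# Prices taken from mem2.xml above.
for price in (190, 200.00, 800.00):
    print(price, "->", bucket_start(price))

# Non-zero counts from the step 2 response; the ranges cover start..end
# exactly, so "between" should be their sum.
step2_nonzero = {50.0: 1, 150.0: 3, 250.0: 1, 750.0: 1}
expected_between = sum(step2_nonzero.values())
print("expected between:", expected_between)
```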

  3. Query **with** &f.price.facet.mincount=1: bucket <int name="250.0">1</int> is missing

     http://localhost:8983/solr/collection1/select?q=*:*&fl=id,price&sort=id+asc&facet=true&facet.range=price&f.price.facet.range.start=0&f.price.facet.range.end=1000&f.price.facet.range.gap=50&f.price.facet.range.other=all&f.price.facet.range.include=upper&spellcheck=false&hl=false&f.price.facet.mincount=1

          <lst name="facet_ranges">
            <lst name="price">
              <lst name="counts">
                <int name="50.0">1</int>
                <int name="150.0">3</int>
                <int name="750.0">1</int>
              </lst>
              <float name="gap">50.0</float>
              <float name="start">0.0</float>
              <float name="end">1000.0</float>
              <int name="before">0</int>
              <int name="after">0</int>
              <int name="between">4</int>
            </lst>
          </lst>
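For comparison, the mincount=1 query should return the step 2 bucket list with only the zero-count buckets removed. The minimal Python sketch below performs that filtering. apply_mincount is a hypothetical helper name. The sketch shows that 250.0 is the only bucket missing from the response above.

```python
# Complete bucket list from the step 2 response (no mincount), keyed by range start.
step2_counts = {
    0.0: 0, 50.0: 1, 100.0: 0, 150.0: 3, 200.0: 0, 250.0: 1,
    300.0: 0, 350.0: 0, 400.0: 0, 450.0: 0, 500.0: 0, 550.0: 0,
    600.0: 0, 650.0: 0, 700.0: 0, 750.0: 1, 800.0: 0, 850.0: 0,
    900.0: 0, 950.0: 0,
}

def apply_mincount(counts, mincount):
    # Keep buckets whose count is at least `mincount`, in their original order.
    return {start: n for start, n in counts.items() if n >= mincount}

expected = apply_mincount(step2_counts, 1)
observed = {50.0: 1, 150.0: 3, 750.0: 1}  # first mincount=1 response above
missing = sorted(set(expected) - set(observed))

print("expected:", expected)
print("missing :", missing)
```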

     Refreshing the query (press F5 in the browser, possibly several times) returns a different bucket list, now without buckets 50.0 and 750.0:

          <lst name="facet_ranges">
            <lst name="price">
              <lst name="counts">
                <int name="150.0">3</int>
                <int name="250.0">1</int>
              </lst>
              <float name="gap">50.0</float>
              <float name="start">0.0</float>
              <float name="end">1000.0</float>
              <int name="before">0</int>
              <int name="after">0</int>
              <int name="between">2</int>
            </lst>
          </lst>
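Both mincount=1 responses could be explained by a particular merge rule on the node that combines the shard responses. Under that rule, the bucket list of whichever shard answers first is kept as the basis, and later shards only add to buckets already in it. The two alternating results above would then follow from which shard responds first. The minimal Python sketch below simulates that rule. The merge rule and the two-shard split of the six priced documents are both hypothetical assumptions chosen for illustration. Neither is taken from the Solr source or from this collection. A key-based merge is shown alongside for contrast.

```python
def merge_first_shard_basis(shard_responses):
    # HYPOTHETICAL merge rule: the first response defines the bucket keys;
    # later responses only add to keys that already exist.
    merged = dict(shard_responses[0])
    for resp in shard_responses[1:]:
        for key in merged:
            if key in resp:
                merged[key] += resp[key]
    return merged

def merge_by_key(shard_responses, mincount):
    # Union of all bucket keys, summed, then mincount applied once after merging.
    # (For mincount > 1 the shards would have to report every bucket, i.e. mincount 0.)
    totals = {}
    for resp in shard_responses:
        for key, count in resp.items():
            totals[key] = totals.get(key, 0) + count
    return {k: totals[k] for k in sorted(totals) if totals[k] >= mincount}

# HYPOTHETICAL per-shard counts as each shard would report them with mincount=1.
shard_a = {50.0: 1, 150.0: 2, 750.0: 1}
shard_b = {150.0: 1, 250.0: 1}

print(merge_first_shard_basis([shard_a, shard_b]))  # shard A answers first
print(merge_first_shard_basis([shard_b, shard_a]))  # shard B answers first
print(merge_by_key([shard_a, shard_b], 1))
```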

Thank you,
Ronald Matamoros

-----Original Message-----
From: Ronald Matamoros [mailto:RMatamoros@searchtechnologies.com] 
Sent: 27 May 2014 16:25
To: solr-user@lucene.apache.org
Subject: COMMERCIAL: SolrCloud: facet range option f.<field>.facet.mincount=1 omits buckets on response

Good afternoon,

Is the f.<field>.facet.mincount option supported in distributed search?
Under SolrCloud, some buckets are omitted from the response when the option "f.<field>.facet.mincount=1" is used.

The Solr logs do not indicate any error or warning during execution.
Neither the debug=true option nor raising the FacetComponent log level gives any hint about the cause.

I have reproduced the issue on both Solr 4.5.1 and 4.8.1.
I have attached a PDF with additional details and steps to reproduce the behaviour using the out-of-the-box Solr distribution.

Any insight or recommendation for tackling this would be much appreciated.

Example:

      Without the f.<field>.facet.mincount=1 option, the response contains the expected list of buckets for the 6 documents that have a price.

        <lst name="facet_ranges">
          <lst name="price">
            <lst name="counts">
              <int name="0.0">0</int>
              <int name="50.0">1</int>
              <int name="100.0">0</int>
              <int name="150.0">3</int>
              <int name="200.0">0</int>
              <int name="250.0">1</int>
              <int name="300.0">0</int>
              <int name="350.0">0</int>
              <int name="400.0">0</int>
              <int name="450.0">0</int>
              <int name="500.0">0</int>
              <int name="550.0">0</int>
              <int name="600.0">0</int>
              <int name="650.0">0</int>
              <int name="700.0">0</int>
              <int name="750.0">1</int>
              <int name="800.0">0</int>
              <int name="850.0">0</int>
              <int name="900.0">0</int>
              <int name="950.0">0</int>
            </lst>
            <float name="gap">50.0</float>
            <float name="start">0.0</float>
            <float name="end">1000.0</float>
            <int name="before">0</int>
            <int name="after">0</int>
            <int name="between">2</int>
          </lst>
        </lst>

      With the f.<field>.facet.mincount=1 option, the zero-count buckets are removed, but bucket <int name="250.0"> is omitted as well:

        <lst name="facet_ranges">
          <lst name="price">
            <lst name="counts">
              <int name="50.0">1</int>
              <int name="150.0">3</int>
              <int name="750.0">1</int>
            </lst>
            <float name="gap">50.0</float>
            <float name="start">0.0</float>
            <float name="end">1000.0</float>
            <int name="before">0</int>
            <int name="after">0</int>
            <int name="between">4</int>
          </lst>
        </lst>

      Refreshing the query in the browser (F5) returns a different bucket list
      (you may need to refresh several times):

        <lst name="facet_ranges">
          <lst name="price">
            <lst name="counts">
              <int name="150.0">3</int>
              <int name="250.0">1</int>
            </lst>
            <float name="gap">50.0</float>
            <float name="start">0.0</float>
            <float name="end">1000.0</float>
            <int name="before">0</int>
            <int name="after">0</int>
            <int name="between">2</int>
          </lst>
        </lst>

Regards 
Ronald Matamoros