Posted to dev@lucene.apache.org by "Chris Russell (JIRA)" <ji...@apache.org> on 2012/06/28 17:55:44 UTC

[jira] [Created] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Chris Russell created SOLR-3583:
-----------------------------------

             Summary: Percentiles for facets, pivot facets, and distributed pivot facets
                 Key: SOLR-3583
                 URL: https://issues.apache.org/jira/browse/SOLR-3583
             Project: Solr
          Issue Type: Improvement
            Reporter: Chris Russell
            Priority: Minor


Built on top of SOLR-2894 (includes the Apr 25th version), this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Chris Russell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Russell updated SOLR-3583:
--------------------------------

    Attachment: SOLR-3583.patch

Now works with facet.missing=true
                
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.1
>
>         Attachments: SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch
>
>
> Built on top of SOLR-2894, this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Comment Edited] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Chris Russell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507718#comment-13507718 ] 

Chris Russell edited comment on SOLR-3583 at 12/3/12 6:52 PM:
--------------------------------------------------------------

Think I may have introduced a performance issue re:faceting, unit tests taking a lot longer to run.  Am investigating.
Never mind, I was accidentally also running the Lucene tests.
                
      was (Author: selah):
    Think I may have introduced a performance issue re:faceting, unit tests taking a lot longer to run.  Am investigating.
                  
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.1
>
>         Attachments: SOLR-3583.patch, SOLR-3583.patch
>
>
> Built on top of SOLR-2894, this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Commented] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Chris Russell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482725#comment-13482725 ] 

Chris Russell commented on SOLR-3583:
-------------------------------------

I have gotten some time recently to work on this.
I have disentangled my additions from the SOLR-2894 patch, and will be making a few enhancements before attempting to make it trunk-compatible.
                
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.1
>
>         Attachments: SOLR-3583.patch
>
>
> Built on top of SOLR-2894 (includes Apr 25th version) this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Commented] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Monica Skidmore (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483218#comment-13483218 ] 

Monica Skidmore commented on SOLR-3583:
---------------------------------------

I have internal customers at my company eager to use this feature; I'm excited that you're updating it for 4.0 and hoping it can be committed soon!
                
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.1
>
>         Attachments: SOLR-3583.patch
>
>
> Built on top of SOLR-2894 (includes Apr 25th version) this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Comment Edited] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Chris Russell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507673#comment-13507673 ] 

Chris Russell edited comment on SOLR-3583 at 11/30/12 9:47 PM:
---------------------------------------------------------------

Updated to trunk 1404975
Disentangled from SOLR-2894.  This patch no longer includes that patch.
Before applying this patch, you must first apply the 12th Nov 2012 version of SOLR-2894, which I updated to apply to the same version of trunk.

Based on some changes that I had to work around while updating to trunk, I suspect that this will not work properly with facet.missing=true.  I am working on correcting this. (Pivot facets changed fairly significantly in the interim.)
                
      was (Author: selah):
    Updated to trunk 1404975
You must first apply the 12th Nov 2012 version of SOLR-2894 which I updated to apply to the same version of trunk.
Based on some changes that I had to work around while updating to trunk, I feel that this will not work properly with facet.missing=true.  I am working on correcting this.
                  
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.1
>
>         Attachments: SOLR-3583.patch, SOLR-3583.patch
>
>
> Built on top of SOLR-2894, this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Updated] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Chris Russell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Russell updated SOLR-3583:
--------------------------------

    Comment: was deleted

(was: Think I may have introduced a performance issue re:faceting, unit tests taking a lot longer to run.  Am investigating.
Nevermind, I was accidentally also running the Lucene tests.)
    
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.1
>
>         Attachments: SOLR-3583.patch, SOLR-3583.patch
>
>
> Built on top of SOLR-2894, this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Comment Edited] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Chris Russell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403331#comment-13403331 ] 

Chris Russell edited comment on SOLR-3583 at 6/28/12 6:29 PM:
--------------------------------------------------------------

This patch builds upon the distributed pivot facets introduced in SOLR-2894 and adds the ability to request rudimentary percentiles when faceting.  The percentiles are calculated by using range facets to create "buckets" that divide up the range of the field in question.  A range facet count is computed for each bucket to determine the number of documents whose value falls within it.  A representative value for each bucket is obtained by averaging its lower and upper bounds (the bucket midpoint).  The per-bucket document counts and midpoints are then used to determine the percentiles, with the midpoint of the matching bucket returned as the percentile value.  The accuracy of each value is therefore limited by the bucket size: smaller buckets yield more accurate values but are more computationally intensive.
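
As a rough illustration of the bucket approach described above, here is a minimal Python sketch. It is not the patch's code: the function names are made up for this example, and the rule for picking the matching bucket (the first non-empty bucket whose cumulative count reaches p percent of the total) is an assumption, since the exact rule is not spelled out here. The average is assumed to weight each bucket midpoint by its document count.

```python
import math
from typing import Dict, List, Optional


def bucket_counts(values: List[float], lower: float, upper: float, gap: float) -> List[int]:
    """Count values per half-open bucket [start, start + gap), starting at `lower`."""
    n_buckets = math.ceil((upper - lower) / gap)
    counts = [0] * n_buckets
    for v in values:
        i = math.floor((v - lower) / gap)
        if 0 <= i < n_buckets:  # values that fall outside the buckets are ignored
            counts[i] += 1
    return counts


def midpoint(i: int, lower: float, gap: float) -> float:
    """Average of the lower and upper bound of bucket i."""
    return lower + (i + 0.5) * gap


def estimate_percentiles(counts: List[int], lower: float, gap: float,
                         requested: List[float]) -> Dict[float, Optional[float]]:
    """Assumed rule: report the midpoint of the first non-empty bucket whose
    cumulative count reaches p percent of all counted documents."""
    total = sum(counts)
    estimates: Dict[float, Optional[float]] = {}
    for p in requested:
        estimates[p] = None  # stays None when no documents were counted
        target = p / 100 * total
        running = 0
        for i, c in enumerate(counts):
            running += c
            if c > 0 and running >= target:
                estimates[p] = midpoint(i, lower, gap)
                break
    return estimates


def weighted_average(counts: List[int], lower: float, gap: float) -> Optional[float]:
    """Average of bucket midpoints, weighted by document count (assumed weighting)."""
    total = sum(counts)
    if total == 0:
        return None
    return sum(c * midpoint(i, lower, gap) for i, c in enumerate(counts)) / total


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
counts = bucket_counts(values, lower=0, upper=11, gap=1)
print(estimate_percentiles(counts, lower=0, gap=1, requested=[25, 50, 75]))
# {25: 3.5, 50: 5.5, 75: 8.5}
print(weighted_average(counts, lower=0, gap=1))
# 6.0
```

Every value in a bucket is reported as that bucket's midpoint, so a larger gap trades accuracy for fewer buckets to count.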

The choice to use buckets and accept "fuzzy" values was made because 1) we were already using query facets to do this and wanted a solution that involved less querying, and 2) our use case involves document counts on the order of tens of millions, and coalescing distinct values across shards during distributed search seemed problematic from a performance standpoint.

Usage:
  Querying:
  Faceting must be enabled (facet=true).  You may then use the following parameters to define your percentiles request:
  percentiles=true : enables facet statistics
  percentiles.field=fieldname : field to calculate percentiles for; can be specified more than once
  percentiles.requested.percentiles=25,50,75 : requested percentiles, e.g. 25th, 50th, 75th
  percentiles.lower.fence=0 : lower bound for the percentiles calculation, i.e. lower edge of the first bucket
  percentiles.upper.fence=5000 : upper bound for the percentiles calculation, i.e. upper edge of the last bucket
  percentiles.gap=10 : bucket size, e.g. bucket1 0-10, bucket2 10-20, etc. (double counting on edges is avoided, as with range facets)
  percentiles.averages=true : set this if you would like the average and doc count reported for each field (the average is the weighted average of bucket midpoints)
  facet.pivot=field1,field2 : if you ask for pivots, percentiles will show up on a per-pivot basis!

Here is an example URL using the example documents included with Solr:
http://localhost:8983/solr/select?q=*%3A*&start=0&rows=3&wt=xml&facet=true&percentiles=true&percentiles.field=popularity&percentiles.requested.percentiles=25,50,75&percentiles.averages=true&facet.field=price&facet.field=popularity&facet.pivot=manufacturedate_dt&f.popularity.percentiles.lower.fence=0&f.popularity.percentiles.upper.fence=11&f.popularity.percentiles.gap=1&facet.sort=index&percentiles.field=price&percentiles.lower.fence=0&percentiles.upper.fence=5000&percentiles.gap=10  
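
For anyone assembling such a request programmatically, the following Python sketch builds a similar URL from the parameters listed above. The helper name build_percentiles_url is hypothetical. The sketch also assumes that the f.<fieldname>. prefix can be used for the fence and gap parameters of every field, in the way the example URL uses it for popularity; that has only been shown for that one field here.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def build_percentiles_url(base_url: str,
                          fields: Dict[str, Tuple[float, float, float]],
                          requested: List[int],
                          pivot: Optional[str] = None,
                          averages: bool = True) -> str:
    """Build a percentiles request; `fields` maps field name -> (lower fence, upper fence, gap)."""
    # A list of pairs (not a dict) so that percentiles.field can repeat.
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("percentiles", "true"),
        ("percentiles.requested.percentiles", ",".join(str(p) for p in requested)),
    ]
    if averages:
        params.append(("percentiles.averages", "true"))
    for name, (lower, upper, gap) in fields.items():
        params.append(("percentiles.field", name))
        # Per-field overrides, assumed to work for every field (see lead-in).
        params.append((f"f.{name}.percentiles.lower.fence", str(lower)))
        params.append((f"f.{name}.percentiles.upper.fence", str(upper)))
        params.append((f"f.{name}.percentiles.gap", str(gap)))
    if pivot:
        params.append(("facet.pivot", pivot))
    return base_url + "?" + urlencode(params)


url = build_percentiles_url(
    "http://localhost:8983/solr/select",
    {"popularity": (0, 11, 1), "price": (0, 5000, 10)},
    requested=[25, 50, 75],
    pivot="manufacturedate_dt",
)
# Decode the query string again to check what would be sent.
query = parse_qs(urlsplit(url).query)
print(query["percentiles.field"])
```

urlencode percent-encodes characters such as `*`, `:` and `,`, which is why the sketch decodes the query with parse_qs to inspect it rather than comparing raw strings.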

Results format:
  If percentiles are requested, the "facet_statistics" node will appear under "facet_counts". Each requested field will have its own subsection, which will contain the percentiles and, optionally, the average and count.
  If pivot facets are also requested, each level of pivot will have a "statistics" section that will contain per-field info similar to that found in "facet_statistics" above.

Notes:
  All field types that range facets support are supported; however, the average on a date field will always be returned as 0. Apologies.
  Works in distributed mode!
  Includes a unit test.
  If you're curious about which settings are used internally for the range faceting, they are:
    rangeHardEnd = false;
    includeLower = true;
    includeUpper = false;
    includeEdge = false;
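
To make the effect of these settings concrete, here is a small Python sketch of how such buckets behave. It illustrates the semantics only and is not the range facet code; the function names are made up, and the reading of rangeHardEnd (when false, the last bucket keeps its full width even if it extends past the upper fence) follows the documented behaviour of Solr's facet.range.hardend parameter.

```python
from typing import List, Tuple


def bucket_edges(lower: float, upper: float, gap: float,
                 hard_end: bool = False) -> List[Tuple[float, float]]:
    """Return (start, end) pairs covering [lower, upper) in steps of `gap`."""
    edges = []
    start = lower
    while start < upper:
        end = start + gap
        if hard_end and end > upper:
            end = upper  # only clip the last bucket when a hard end is requested
        edges.append((start, end))
        start = end
    return edges


def in_bucket(value: float, start: float, end: float,
              include_lower: bool = True, include_upper: bool = False) -> bool:
    """Bucket membership test; the defaults mirror includeLower=true, includeUpper=false."""
    above = value >= start if include_lower else value > start
    below = value <= end if include_upper else value < end
    return above and below


edges = bucket_edges(0, 25, 10)  # rangeHardEnd = false
print(edges)
# [(0, 10), (10, 20), (20, 30)]

# A value sitting exactly on an edge is counted once, in the bucket it starts.
print([in_bucket(10, s, e) for s, e in edges])
# [False, True, False]
```

Because each bucket includes its lower bound and excludes its upper bound, neighbouring buckets never both count the same boundary value.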
	
                
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.0
>
>         Attachments: SOLR-3583.patch
>
>
> Built on top of SOLR-2894 (includes Apr 25th version) this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Updated] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Chris Russell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Russell updated SOLR-3583:
--------------------------------

    Fix Version/s: 4.0
    
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.0
>
>
> Built on top of SOLR-2894 (includes Apr 25th version) this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Commented] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Terrance A. Snyder (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482727#comment-13482727 ] 

Terrance A. Snyder commented on SOLR-3583:
------------------------------------------

[~selah] Please do! Your contribution is amazing and pushes SOLR into a brave new world.
                
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.1
>
>         Attachments: SOLR-3583.patch
>
>
> Built on top of SOLR-2894 (includes Apr 25th version) this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Updated] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Chris Russell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Russell updated SOLR-3583:
--------------------------------

    Description: Built on top of SOLR-2894, this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.    (was: Built on top of SOLR-2894 (includes Apr 25th version) this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  )
    
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.1
>
>         Attachments: SOLR-3583.patch, SOLR-3583.patch
>
>
> Built on top of SOLR-2894, this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Commented] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Chris Russell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507718#comment-13507718 ] 

Chris Russell commented on SOLR-3583:
-------------------------------------

Think I may have introduced a performance issue re:faceting, unit tests taking a lot longer to run.  Am investigating.
                
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.1
>
>         Attachments: SOLR-3583.patch, SOLR-3583.patch
>
>
> Built on top of SOLR-2894, this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Updated] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Chris Russell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Russell updated SOLR-3583:
--------------------------------

    Attachment: SOLR-3583.patch

Since it's based on SOLR-2894, this patch was developed against trunk 1297102.
                
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.0
>
>         Attachments: SOLR-3583.patch
>
>
> Built on top of SOLR-2894 (includes Apr 25th version) this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Commented] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Monica Skidmore (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403356#comment-13403356 ] 

Monica Skidmore commented on SOLR-3583:
---------------------------------------

Thanks for sharing this - I have several use cases for it...
                
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.0
>
>         Attachments: SOLR-3583.patch
>
>
> Built on top of SOLR-2894 (includes Apr 25th version) this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Updated] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Chris Russell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Russell updated SOLR-3583:
--------------------------------

    Attachment: SOLR-3583.patch

Updated to trunk 1404975
                
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.1
>
>         Attachments: SOLR-3583.patch, SOLR-3583.patch
>
>
> Built on top of SOLR-2894 (includes Apr 25th version) this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Comment Edited] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Chris Russell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507673#comment-13507673 ] 

Chris Russell edited comment on SOLR-3583 at 11/30/12 9:46 PM:
---------------------------------------------------------------

Updated to trunk 1404975
You must first apply the 12th Nov 2012 version of SOLR-2894, which I updated to apply to the same version of trunk.
Because of some changes that I had to work around while updating to trunk, I suspect that this will not work properly with facet.missing=true.  I am working on correcting this.
                
      was (Author: selah):
    Updated to trunk 1404975
You must first apply the 12th Nov 2012 version of SOLR-2894 which I updated to apply to the same version of trunk.
                  
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.1
>
>         Attachments: SOLR-3583.patch, SOLR-3583.patch
>
>
> Built on top of SOLR-2894, this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Comment Edited] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Chris Russell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507673#comment-13507673 ] 

Chris Russell edited comment on SOLR-3583 at 11/30/12 9:44 PM:
---------------------------------------------------------------

Updated to trunk 1404975
You must first apply the 12th Nov 2012 version of SOLR-2894 which I updated to apply to the same version of trunk.
                
      was (Author: selah):
    Updated to trunk 1404975
                  
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.1
>
>         Attachments: SOLR-3583.patch, SOLR-3583.patch
>
>
> Built on top of SOLR-2894, this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  



[jira] [Commented] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

Posted by "Chris Russell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403331#comment-13403331 ] 

Chris Russell commented on SOLR-3583:
-------------------------------------

This patch builds on the distributed pivot facets introduced in SOLR-2894 and adds the ability to request rudimentary percentiles when faceting.  The percentiles are calculated by using range facets to create "buckets" that divide up the field in question.  A range facet is run for each bucket to count the documents whose value falls within it.  Each bucket is assigned an average value, the midpoint of its upper and lower bounds.  The per-bucket document counts and bucket averages are then used to determine percentiles, with the bucket average returned as the percentile value.  The accuracy of the result is therefore determined by bucket size: smaller buckets yield more accurate values but are more computationally intensive.
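
As a rough illustration of the bucket approach described above, here is a minimal Python sketch (not code from the patch). The function name bucket_percentiles and the rule for picking a bucket (the first non-empty bucket whose cumulative count reaches p% of the total) are assumptions made for the example; the patch may select the bucket differently.

```python
from typing import List, Sequence, Tuple

# (lower bound, upper bound, document count) for one bucket
Bucket = Tuple[float, float, int]


def bucket_percentiles(buckets: Sequence[Bucket],
                       requested: Sequence[float]) -> List[float]:
    """Estimate each requested percentile as the midpoint of the bucket it falls in."""
    total = sum(count for _, _, count in buckets)
    if total == 0:
        return []
    results = []
    for p in requested:
        # Assumed rule: find the bucket holding the document at rank p% of the total.
        target = p / 100.0 * total
        cumulative = 0
        for lower, upper, count in buckets:
            cumulative += count
            if count > 0 and cumulative >= target:
                # The bucket midpoint stands in for the exact value.
                results.append((lower + upper) / 2.0)
                break
    return results


# Three buckets of width 10 holding 2, 4 and 2 documents.
example = [(0, 10, 2), (10, 20, 4), (20, 30, 2)]
print(bucket_percentiles(example, [25, 50, 75]))  # [5.0, 15.0, 15.0]
```

The 50th and 75th percentiles coincide here because both fall in the same bucket, which is the loss of precision the bucket size controls.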

The choice to use buckets and return "fuzzy" values was made because 1) we were already using query facets to do this and wanted a solution that involved less querying, and 2) our use case involves document counts on the order of tens of millions, and coalescing distinct values across shards during distributed search seemed problematic from a performance standpoint.

Usage:
  Querying:
  Faceting must be enabled (facet=true).  You may then use the following parameters to define your percentiles request:
  percentiles=true : enables facet statistics
  percentiles.field=fieldname : field to calculate percentiles for; can be specified more than once
  percentiles.requested.percentiles=25,50,75 : requested percentiles, i.e. 25th, 50th, 75th
  percentiles.lower.fence=0 : lower bound for the percentiles calculation, i.e. lower edge of the first bucket
  percentiles.upper.fence=5000 : upper bound for the percentiles calculation, i.e. upper edge of the last bucket
  percentiles.gap=10 : bucket size, e.g. bucket1 0-10, bucket2 10-20, etc. (double counting on edges is avoided, as with range facets)
  percentiles.averages=true : set this if you would like the average and doc count reported for each field (the average is a weighted average of bucket midpoints)
  facet.pivot=field1,field2 : if you ask for pivots, percentiles will show up on a per-pivot basis!
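
To make the fence, gap and averages parameters above concrete, here is a small Python sketch (not code from the patch) of how the buckets and the weighted average of bucket midpoints could be derived. The helper names bucket_edges and weighted_average are hypothetical, and clamping the last bucket to the upper fence is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple


def bucket_edges(lower_fence: float, upper_fence: float,
                 gap: float) -> List[Tuple[float, float]]:
    """Split the range between the fences into consecutive buckets of width gap."""
    if gap <= 0:
        raise ValueError("gap must be positive")
    edges = []
    index = 0
    while lower_fence + index * gap < upper_fence:
        start = lower_fence + index * gap
        # Assumption: a final partial bucket is clamped to the upper fence.
        edges.append((start, min(start + gap, upper_fence)))
        index += 1
    return edges


def weighted_average(edges: Sequence[Tuple[float, float]],
                     counts: Sequence[int]) -> float:
    """Average of bucket midpoints, each weighted by its document count."""
    total = sum(counts)
    if total == 0:
        return 0.0
    weighted_sum = sum((lo + hi) / 2.0 * c for (lo, hi), c in zip(edges, counts))
    return weighted_sum / total


edges = bucket_edges(0, 30, 10)             # [(0, 10), (10, 20), (20, 30)]
print(weighted_average(edges, [2, 4, 2]))   # 15.0
```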

Here is an example URL using the example documents included with Solr:
http://localhost:8983/solr/select?q=*%3A*&start=0&rows=3&wt=xml&facet=true&percentiles=true&percentiles.field=popularity&percentiles.requested.percentiles=25,50,75&percentiles.averages=true&facet.field=price&facet.field=popularity&facet.pivot=manufacturedate_dt&f.popularity.percentiles.lower.fence=0&f.popularity.percentiles.upper.fence=11&f.popularity.percentiles.gap=1&facet.sort=index&percentiles.field=price&percentiles.lower.fence=0&percentiles.upper.fence=5000&percentiles.gap=10  
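
For readability, the same request can be assembled from a parameter list. The following Python sketch uses only the standard library; the function name build_percentiles_url is hypothetical, the percentiles.* parameters exist only with this patch applied, and reading the f.popularity.* entries as per-field overrides is an assumption based on Solr's usual f.<field>.<param> convention.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/select"


def build_percentiles_url(base_url: str = BASE_URL) -> str:
    """Build the example percentiles request; repeated keys are sent once per entry."""
    params = [
        ("q", "*:*"),
        ("start", "0"),
        ("rows", "3"),
        ("wt", "xml"),
        ("facet", "true"),
        ("facet.field", "price"),
        ("facet.field", "popularity"),
        ("facet.pivot", "manufacturedate_dt"),
        ("facet.sort", "index"),
        ("percentiles", "true"),
        ("percentiles.field", "popularity"),
        ("percentiles.field", "price"),
        ("percentiles.requested.percentiles", "25,50,75"),
        ("percentiles.averages", "true"),
        # Assumed per-field overrides for popularity: buckets of width 1 over 0..11.
        ("f.popularity.percentiles.lower.fence", "0"),
        ("f.popularity.percentiles.upper.fence", "11"),
        ("f.popularity.percentiles.gap", "1"),
        # Settings without a field prefix: buckets of width 10 over 0..5000.
        ("percentiles.lower.fence", "0"),
        ("percentiles.upper.fence", "5000"),
        ("percentiles.gap", "10"),
    ]
    # urlencode percent-encodes reserved characters such as ':' and ','.
    return base_url + "?" + urlencode(params)


print(build_percentiles_url())
```

Passing a list of pairs rather than a dict keeps both percentiles.field and both facet.field entries in the query string.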

Results format:
  If percentiles are requested, the "facet_statistics" node will appear under "facet_counts". Each requested field will have its own subsection.  Each subsection will contain percentiles and, optionally, average and count.
  If pivot facets are also requested, each level of pivot will have a "statistics" section that contains per-field info similar to that found in "facet_statistics" above.

Notes:
  All field types supported by range facets are supported; however, the average on a date field will always be returned as 0. Apologies.
  Works in distributed mode!
  Includes a unit test.

                
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.0
>
>         Attachments: SOLR-3583.patch
>
>
> Built on top of SOLR-2894 (includes Apr 25th version) this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  
