Posted to dev@lucene.apache.org by "Chris Russell (JIRA)" <ji...@apache.org> on 2012/06/28 20:24:43 UTC

[jira] [Commented] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

    [ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403331#comment-13403331 ] 

Chris Russell commented on SOLR-3583:
-------------------------------------

This patch builds upon the distributed pivot facets introduced in SOLR-2894 and adds the ability to request rudimentary percentiles when faceting. The percentiles are calculated by using range facets to divide the field in question into "buckets", and the range facet counts the number of documents whose value falls within each bucket. Each bucket's average value is the average of its upper and lower bounds, i.e. its midpoint. The per-bucket document counts and bucket averages are then used to determine the percentiles, with a bucket average returned as the percentile value. The accuracy of a percentile is therefore limited by the bucket size: smaller buckets yield more accurate values but are more computationally intensive.
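
The bucket-to-percentile step can be sketched in Java as follows. This is a minimal illustration, not code from the patch: the class and method names are hypothetical, and it assumes a nearest-rank rule (the p-th percentile is the midpoint of the first bucket whose cumulative count reaches p% of the total), which the description above does not spell out.

```java
import java.util.List;

// Hypothetical sketch, not the SOLR-3583 implementation.
// Assumes a nearest-rank rule: the p-th percentile is the midpoint of the first
// bucket whose cumulative document count reaches p% of the total.
class BucketPercentiles {

    /** One bucket [lower, upper) and the number of documents counted in it. */
    record Bucket(double lower, double upper, long count) {
        /** The bucket "average": the mean of its lower and upper bound. */
        double midpoint() {
            return (lower + upper) / 2.0;
        }
    }

    /** One value per requested percentile; NaN when no bucket reaches the target (e.g. no documents). */
    static double[] percentiles(List<Bucket> buckets, double[] requested) {
        long total = 0;
        for (Bucket b : buckets) {
            total += b.count();
        }
        double[] result = new double[requested.length];
        for (int i = 0; i < requested.length; i++) {
            double target = requested[i] / 100.0 * total;
            result[i] = Double.NaN;
            long cumulative = 0;
            for (Bucket b : buckets) {
                cumulative += b.count();
                // Skip leading empty buckets so p=0 maps to the first non-empty bucket.
                if (cumulative > 0 && cumulative >= target) {
                    result[i] = b.midpoint();
                    break;
                }
            }
        }
        return result;
    }
}
```

For buckets [0,10), [10,20) and [20,30) holding 2, 5 and 3 documents, this sketch reports the 25th, 50th and 75th percentiles as 15, 15 and 25: every value inside a bucket is represented by that bucket's midpoint, which is the "fuzziness" traded for speed.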

The choice to use buckets and accept "fuzzy" values was made because 1) we were already using query facets to do this and wanted a solution that required fewer queries, and 2) our use case involves document counts on the order of tens of millions, and coalescing distinct values during distributed search seemed problematic from a performance standpoint.
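
To illustrate why fixed buckets sidestep coalescing distinct values: if every shard uses the same fences and gap, bucket i covers the same range on every shard, so per-shard counts can be summed index by index and the percentiles computed once on the merged counts. The sketch below assumes that layout; the class name is hypothetical and this is not necessarily how the patch merges shard responses.

```java
import java.util.List;

// Hypothetical sketch: merge per-shard bucket counts by element-wise summation.
// Assumes every shard used the same lower fence, upper fence and gap.
class ShardBucketMerge {

    static long[] merge(List<long[]> shardCounts) {
        if (shardCounts.isEmpty()) {
            return new long[0];
        }
        int n = shardCounts.get(0).length;
        long[] merged = new long[n];
        for (long[] counts : shardCounts) {
            // A length mismatch means the shards did not share one bucket layout.
            if (counts.length != n) {
                throw new IllegalArgumentException("shards used different bucket layouts");
            }
            for (int i = 0; i < n; i++) {
                merged[i] += counts[i];
            }
        }
        return merged;
    }
}
```

The amount of data exchanged per field is one count per bucket, independent of how many distinct values or documents each shard holds.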

Usage:
  Querying:
  Faceting must be enabled (facet=true).  Then you may use the following parameters to define your percentiles request:
  percentiles=true : enables facet statistics
  percentiles.field=fieldname : field to calculate percentiles for; can be specified more than once
  percentiles.requested.percentiles=25,50,75 : percentiles to compute, i.e. the 25th, 50th and 75th
  percentiles.lower.fence=0 : lower bound for the percentiles calculation, i.e. the lower edge of the first bucket
  percentiles.upper.fence=5000 : upper bound for the percentiles calculation, i.e. the upper edge of the last bucket
  percentiles.gap=10 : bucket size, e.g. bucket1 0-10, bucket2 10-20, etc. (values on bucket edges are not double counted, as with range facets)
  percentiles.averages=true : set this if you would like the average and document count reported for each field (the average is a weighted average of bucket midpoints)
  facet.pivot=field1,field2 : if you ask for pivots, percentiles will show up on a per-pivot basis!
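
A sketch of how the fence and gap parameters could translate into buckets, and of the weighted average reported by percentiles.averages. The names are hypothetical, and several details are assumptions not confirmed by the description above: buckets are half-open [lower, upper), so a value on a shared edge is counted once, in the higher bucket; the last bucket is clipped at the upper fence when the gap does not divide the range evenly; values outside the fences are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of bucket layout and weighted average; not the patch's code.
class BucketLayout {

    /** Half-open buckets [lo, hi) from lowerFence in steps of gap, last one clipped at upperFence. */
    static List<double[]> buckets(double lowerFence, double upperFence, double gap) {
        if (gap <= 0 || upperFence <= lowerFence) {
            throw new IllegalArgumentException("invalid fences or gap");
        }
        List<double[]> out = new ArrayList<>();
        for (int i = 0; ; i++) {
            double lo = lowerFence + i * gap; // multiply rather than accumulate to limit rounding drift
            if (lo >= upperFence) {
                break;
            }
            out.add(new double[] {lo, Math.min(lo + gap, upperFence)});
        }
        return out;
    }

    /** Index of the bucket holding value, or -1 if it lies outside [lowerFence, upperFence). */
    static int bucketIndex(double value, double lowerFence, double upperFence, double gap) {
        if (value < lowerFence || value >= upperFence) {
            return -1;
        }
        // A value on a shared edge (e.g. 10 with gap 10) goes to the higher bucket only.
        return (int) Math.floor((value - lowerFence) / gap);
    }

    /** Count-weighted mean of bucket midpoints; NaN when there are no documents. */
    static double weightedAverage(List<double[]> buckets, long[] counts) {
        double sum = 0;
        long n = 0;
        for (int i = 0; i < buckets.size(); i++) {
            double mid = (buckets.get(i)[0] + buckets.get(i)[1]) / 2.0;
            sum += mid * counts[i];
            n += counts[i];
        }
        return n == 0 ? Double.NaN : sum / n;
    }
}
```

With lower.fence=0, upper.fence=5000 and gap=10 this yields 500 buckets, and a value of exactly 10 falls into [10,20) only.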

Here is an example URL using the example documents included with solr:
http://localhost:8983/solr/select?q=*%3A*&start=0&rows=3&wt=xml&facet=true&percentiles=true&percentiles.field=popularity&percentiles.requested.percentiles=25,50,75&percentiles.averages=true&facet.field=price&facet.field=popularity&facet.pivot=manufacturedate_dt&f.popularity.percentiles.lower.fence=0&f.popularity.percentiles.upper.fence=11&f.popularity.percentiles.gap=1&facet.sort=index&percentiles.field=price&percentiles.lower.fence=0&percentiles.upper.fence=5000&percentiles.gap=10  
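
The same request can be assembled programmatically. The sketch below uses only java.net.URLEncoder; the class name is hypothetical. Parameters are kept in an ordered list rather than a map because percentiles.field and facet.field are repeated. The f.popularity.* parameters appear to follow Solr's usual f.<field>.<param> per-field override convention, overriding the unprefixed fences and gap for popularity only; that reading is inferred from the example, not stated above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds a select URL from an ordered list of parameters.
class PercentilesUrl {

    static String build(String base, List<Map.Entry<String, String>> params) {
        StringBuilder sb = new StringBuilder(base).append('?');
        for (int i = 0; i < params.size(); i++) {
            if (i > 0) {
                sb.append('&');
            }
            Map.Entry<String, String> p = params.get(i);
            sb.append(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    /** The parameters of the example URL, in the same order. */
    static List<Map.Entry<String, String>> exampleParams() {
        return List.of(
                Map.entry("q", "*:*"),
                Map.entry("start", "0"),
                Map.entry("rows", "3"),
                Map.entry("wt", "xml"),
                Map.entry("facet", "true"),
                Map.entry("percentiles", "true"),
                Map.entry("percentiles.field", "popularity"),
                Map.entry("percentiles.requested.percentiles", "25,50,75"),
                Map.entry("percentiles.averages", "true"),
                Map.entry("facet.field", "price"),
                Map.entry("facet.field", "popularity"),
                Map.entry("facet.pivot", "manufacturedate_dt"),
                // Presumed per-field overrides for popularity.
                Map.entry("f.popularity.percentiles.lower.fence", "0"),
                Map.entry("f.popularity.percentiles.upper.fence", "11"),
                Map.entry("f.popularity.percentiles.gap", "1"),
                Map.entry("facet.sort", "index"),
                Map.entry("percentiles.field", "price"),
                // Unprefixed settings, presumably applied to price.
                Map.entry("percentiles.lower.fence", "0"),
                Map.entry("percentiles.upper.fence", "5000"),
                Map.entry("percentiles.gap", "10"));
    }
}
```

URLEncoder percent-encodes the commas in 25,50,75 as %2C, which decodes to the same parameter value as the unencoded commas in the URL above.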

Results format:
  If percentiles are requested, a "facet_statistics" node will appear under "facet_counts". Each requested field has its own subsection containing the percentiles and, optionally, the average and count.
  If pivot facets are also requested, each pivot level will have a "statistics" section containing per-field information similar to that found in "facet_statistics" above.

Notes:
  All field types supported by range facets are supported; however, the average for a date field will always be returned as 0. Apologies.
  Works in distributed mode!
  Includes a unit test.

                
> Percentiles for facets, pivot facets, and distributed pivot facets
> ------------------------------------------------------------------
>
>                 Key: SOLR-3583
>                 URL: https://issues.apache.org/jira/browse/SOLR-3583
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: newbie, patch
>             Fix For: 4.0
>
>         Attachments: SOLR-3583.patch
>
>
> Built on top of SOLR-2894 (includes Apr 25th version) this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals.  
