Posted to dev@lucene.apache.org by "Chakra Yeleswarapu (JIRA)" <ji...@apache.org> on 2016/11/17 20:07:58 UTC

[jira] [Commented] (SOLR-7452) json facet api returning inconsistent counts in cloud set up

    [ https://issues.apache.org/jira/browse/SOLR-7452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674675#comment-15674675 ] 

Chakra Yeleswarapu commented on SOLR-7452:
------------------------------------------

Hello,

Our use case relies on accurate bucket counts and facet stats.
json.facet works fine on fields with few unique values. However, for fields with many unique values, the bucket counts and stats are inconsistent.
As in the OP's example, if the query is filtered further (yielding fewer documents), the bucket counts and stats returned are accurate.
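
One possible explanation for this pattern, offered as an assumption rather than a confirmed diagnosis of this issue: in a distributed request each shard returns only its own top buckets, so a term that misses the cut on one shard loses that shard's contribution when the results are merged. The Python sketch below simulates this with made-up data. The helpers top_buckets and merge_shards are hypothetical and are not Solr code.

```python
from collections import defaultdict

def top_buckets(docs, limit):
    """Aggregate (term, amount) pairs and keep the `limit` terms with the largest sum."""
    counts = defaultdict(int)
    sums = defaultdict(int)
    for term, amount in docs:
        counts[term] += 1
        sums[term] += amount
    top = sorted(sums, key=lambda t: sums[t], reverse=True)[:limit]
    return {t: (counts[t], sums[t]) for t in top}

def merge_shards(shard_results, limit):
    """Merge per-shard top buckets without asking shards for the terms they left out."""
    counts = defaultdict(int)
    sums = defaultdict(int)
    for result in shard_results:
        for term, (count, total) in result.items():
            counts[term] += count
            sums[term] += total
    top = sorted(sums, key=lambda t: sums[t], reverse=True)[:limit]
    return [(t, counts[t], sums[t]) for t in top]

# Made-up data: "raghu" is a top term on shard 1 but falls below the cut on shard 2.
shard1 = [("raghu", 20), ("raghu", 20), ("raghu", 20), ("a", 5)]
shard2 = [("raghu", 1), ("b", 50), ("c", 40)]

distributed = merge_shards([top_buckets(shard1, 2), top_buckets(shard2, 2)], 2)
exact = top_buckets(shard1 + shard2, 2)

print(distributed[0])   # merged bucket for "raghu": shard 2's document is missing
print(exact["raghu"])   # bucket computed over all documents
```

With few unique values every term fits inside each shard's limit, so nothing is dropped; the gap only appears once the number of distinct terms per shard exceeds what each shard returns.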

Any updates on this would be much appreciated.

Thanks


> json facet api returning inconsistent counts in cloud set up
> ------------------------------------------------------------
>
>                 Key: SOLR-7452
>                 URL: https://issues.apache.org/jira/browse/SOLR-7452
>             Project: Solr
>          Issue Type: Bug
>          Components: Facet Module
>    Affects Versions: 5.1
>            Reporter: Vamsi Krishna D
>              Labels: count, facet, sort
>         Attachments: SOLR-7452.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> While using the newly added terms facet of the JSON Facet API (http://yonik.com/json-facet-api/#TermsFacet), I am seeing inconsistent counts for faceted values (note that I am running Solr in cloud mode). For example, consider documents with the fields txns_id (the unique key), consumer_number, and amount. For 10 million such records, say I query for
> q=*:*&rows=0&
>  json.facet={
>    biskatoo:{
>      type : terms,
>      field : consumer_number,
>      limit : 20,
>      sort : {y:desc},
>      numBuckets : true,
>      facet:{
>        y : "sum(amount)"
>      }
>    }
>  }
> The results are as follows (some are omitted):
>   "facets":{
>     "count":6641277,
>     "biskatoo":{
>       "numBuckets":3112708,
>       "buckets":[{
>           "val":"surya",
>           "count":4,
>           "y":2.264506},
>         {
>           "val":"raghu",
>           "COUNT":3,   // capitalised for recognition
>           "y":1.8},
>         {
>           "val":"malli",
>           "count":4,
>           "y":1.78}]}}}
> But if I restrict the query to
> q=consumer_number:raghu&rows=0&
>  json.facet={
>    biskatoo:{
>      type : terms,
>      field : consumer_number,
>      limit : 20,
>      sort : {y:desc},
>      numBuckets : true,
>      facet:{
>        y : "sum(amount)"
>      }
>    }
>  }
> I get:
>   "facets":{
>     "count":4,
>     "biskatoo":{
>       "numBuckets":1,
>       "buckets":[{
>           "val":"raghu",
>           "COUNT":4,
>           "y":2429708.24}]}}}
> One can see that the counts are inconsistent (and I found many such inconsistencies).
> I have tried the patch from https://issues.apache.org/jira/browse/SOLR-7412, but the issue still does not seem to be resolved.
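
The comparison described in the report can be automated. The sketch below, in Python, takes the "facets" section of the broad response and of the restricted response and lists every bucket whose count or stat differs. The bucket values are copied from the report (with the key written as "count", since "COUNT" was capitalised only for emphasis); the function name find_mismatches and its parameters are hypothetical, and fetching the responses from Solr is left out.

```python
def find_mismatches(broad, narrow, facet_name, keys=("count", "y")):
    """List (val, key, broad_value, narrow_value) for buckets that disagree."""
    broad_buckets = {b["val"]: b for b in broad["facets"][facet_name]["buckets"]}
    mismatches = []
    for bucket in narrow["facets"][facet_name]["buckets"]:
        other = broad_buckets.get(bucket["val"])
        if other is None:
            continue  # bucket not returned by the broad query at all
        for key in keys:
            if other[key] != bucket[key]:
                mismatches.append((bucket["val"], key, other[key], bucket[key]))
    return mismatches

# Values taken from the two responses quoted in the report.
broad = {"facets": {"count": 6641277, "biskatoo": {"numBuckets": 3112708, "buckets": [
    {"val": "surya", "count": 4, "y": 2.264506},
    {"val": "raghu", "count": 3, "y": 1.8},
    {"val": "malli", "count": 4, "y": 1.78}]}}}
narrow = {"facets": {"count": 4, "biskatoo": {"numBuckets": 1, "buckets": [
    {"val": "raghu", "count": 4, "y": 2429708.24}]}}}

mismatches = find_mismatches(broad, narrow, "biskatoo")
for val, key, broad_value, narrow_value in mismatches:
    print(f"{val}: {key} is {broad_value} in the broad query, {narrow_value} when restricted")
```

Running the restricted query once per suspicious bucket is only practical for spot checks, but it gives a concrete list of affected values to attach to a report.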


