
Posted to issues@calcite.apache.org by "Nishant Bangarwa (JIRA)" <ji...@apache.org> on 2017/03/03 14:32:45 UTC

[jira] [Updated] (CALCITE-1670) Count distinct on druid is translated to Cardinality aggregator which is approximate

     [ https://issues.apache.org/jira/browse/CALCITE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishant Bangarwa updated CALCITE-1670:
--------------------------------------
    Description: 
Right now, COUNT(DISTINCT ...) on Druid is pushed down as a 'cardinality' aggregator, which uses HyperLogLog and returns approximate results. See the cardinality aggregator in http://druid.io/docs/latest/querying/aggregations.html for details.

https://github.com/apache/calcite/blob/master/druid/src/main/java/org/apache/calcite/adapter/druid/DruidQuery.java#L721
{code} 
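case COUNT:
  if (aggCall.isDistinct()) {
    // COUNT(DISTINCT ...) becomes Druid's HyperLogLog-based "cardinality" aggregator (approximate)
    return new JsonCardinalityAggregation("cardinality", name, list);
  }
  // plain COUNT becomes Druid's "count" aggregator
  return new JsonAggregation("count", name, only);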
{code} 


The currently recommended way to get exact distinct counts in Druid is a nested groupBy query.
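The nested approach can be sketched as a Druid native query with two levels:

* The inner groupBy groups on the counted dimension, so it returns one row per distinct value.
* The outer groupBy reads the inner query as its data source (a "query" data source) and counts those rows with a plain "count" aggregator.

This is a minimal sketch that assumes Druid's native JSON query format. Several names are hypothetical placeholders and are not part of Calcite or Druid: the class and method names, the "wikipedia" data source, the "page" dimension and the interval. The JSON escaping is also deliberately minimal.

{code:java}
/**
 * Hypothetical helper (not part of Calcite or Druid): builds the native JSON
 * for an exact COUNT(DISTINCT dimension) as a nested Druid groupBy query.
 */
final class NestedCountDistinct {

  private NestedCountDistinct() {}

  /** Minimal JSON string literal: escapes only backslashes and double quotes. */
  static String q(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  static String exactCountDistinctQuery(
      String dataSource, String dimension, String interval, String outputName) {
    // Inner query: group on the dimension, so each result row is one distinct value.
    String inner = "{"
        + "\"queryType\":\"groupBy\","
        + "\"dataSource\":" + q(dataSource) + ","
        + "\"granularity\":\"all\","
        + "\"dimensions\":[" + q(dimension) + "],"
        + "\"aggregations\":[],"
        + "\"intervals\":[" + q(interval) + "]"
        + "}";
    // Outer query: no dimensions, a "query" data source wrapping the inner query,
    // and a "count" aggregator that counts the inner rows, i.e. the distinct values.
    return "{"
        + "\"queryType\":\"groupBy\","
        + "\"dataSource\":{\"type\":\"query\",\"query\":" + inner + "},"
        + "\"granularity\":\"all\","
        + "\"dimensions\":[],"
        + "\"aggregations\":[{\"type\":\"count\",\"name\":" + q(outputName) + "}],"
        + "\"intervals\":[" + q(interval) + "]"
        + "}";
  }

  public static void main(String[] args) {
    // Placeholder data source, dimension, interval and output name.
    System.out.println(exactCountDistinctQuery(
        "wikipedia", "page", "2017-01-01/2017-02-01", "distinct_pages"));
  }
}
{code}

The exact answer has a cost. The cardinality aggregator's HyperLogLog sketch has a bounded size, but the inner groupBy materializes every distinct value as a row. High-cardinality dimensions therefore pay for exactness in memory and latency.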

  was:
Right now, COUNT(DISTINCT ...) on Druid is pushed down as a 'cardinality' aggregator, which uses HyperLogLog and returns approximate results. See the cardinality aggregator in http://druid.io/docs/latest/querying/aggregations.html for details.

https://github.com/apache/calcite/blob/master/druid/src/main/java/org/apache/calcite/adapter/druid/DruidQuery.java#L721
{code} 
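case COUNT:
  if (aggCall.isDistinct()) {
    // COUNT(DISTINCT ...) becomes Druid's HyperLogLog-based "cardinality" aggregator (approximate)
    return new JsonCardinalityAggregation("cardinality", name, list);
  }
  // plain COUNT becomes Druid's "count" aggregator
  return new JsonAggregation("count", name, only);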
{code} 


> Count distinct on druid is translated to Cardinality aggregator which is approximate
> ------------------------------------------------------------------------------------
>
>                 Key: CALCITE-1670
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1670
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Nishant Bangarwa
>            Assignee: Julian Hyde
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)