Posted to issues@calcite.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2017/03/04 01:10:45 UTC

[jira] [Commented] (CALCITE-1670) Count distinct on druid is translated to Cardinality aggregator which is approximate

    [ https://issues.apache.org/jira/browse/CALCITE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895355#comment-15895355 ] 

Julian Hyde commented on CALCITE-1670:
--------------------------------------

In CALCITE-1587 we added a property, {{approximateDistinctCount}}. The idea was to push a distinct count down to Druid's cardinality aggregator only if {{approximateDistinctCount}} is true.

I would also like to be able to declare that a particular aggregate call is approximate; in CALCITE-1588 [~gian] remarked that Druid SQL has an operator called {{APPROX_COUNT_DISTINCT}}.
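To make the two ideas above concrete, here is a minimal sketch of how a Druid adapter might decide what to do with COUNT when approximate results must be opted into, either through a connection-level flag like {{approximateDistinctCount}} or through a per-call flag such as one set by an {{APPROX_COUNT_DISTINCT}} operator. The names {{CountCall}} and {{toDruidAggregation}} are hypothetical and are not Calcite's API; the JSON shapes follow Druid's "count" and "cardinality" aggregators. Assumes Java 16+ (records).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: CountCall and toDruidAggregation are illustrative
// names, not Calcite's actual API.
class CountPushdownSketch {

    /** Minimal stand-in for COUNT([DISTINCT] fields) with an optional "approximate" flag. */
    record CountCall(String name, List<String> fields, boolean distinct, boolean approximate) {}

    /**
     * Returns a Druid aggregator spec as JSON, or null when the call must not
     * be pushed down because Druid would return a different (approximate) answer.
     */
    static String toDruidAggregation(CountCall call, boolean approximateDistinctCount) {
        if (!call.distinct()) {
            // Non-distinct COUNT keeps mapping to Druid's "count" aggregator.
            return "{\"type\": \"count\", \"name\": \"" + call.name() + "\"}";
        }
        if (approximateDistinctCount || call.approximate()) {
            // The user opted in to an estimate (connection property or an
            // approximate aggregate call): use the HyperLogLog-based aggregator.
            String fields = call.fields().stream()
                .map(f -> "\"" + f + "\"")
                .collect(Collectors.joining(", "));
            return "{\"type\": \"cardinality\", \"name\": \"" + call.name()
                + "\", \"fields\": [" + fields + "]}";
        }
        // Exact COUNT(DISTINCT) requested: do not push it down as cardinality;
        // leave it for Calcite to evaluate or rewrite.
        return null;
    }

    public static void main(String[] args) {
        CountCall exact = new CountCall("c", List.of("page"), true, false);
        CountCall approx = new CountCall("c", List.of("page"), true, true);
        System.out.println(toDruidAggregation(exact, false));  // not pushed down
        System.out.println(toDruidAggregation(approx, false)); // cardinality spec
        System.out.println(toDruidAggregation(exact, true));   // cardinality spec
    }
}
```

Returning null rather than throwing lets the planner fall back to evaluating the exact count itself, which keeps the default behavior correct.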

I wasn't aware that there was a way to compute an exact distinct count in Druid. We have a rewrite rule in Calcite that can do it: it generates two levels of Aggregate, an inner one that groups by the counted column (plus any GROUP BY keys) and an outer one that counts the resulting rows. I believe (please correct me if I'm wrong) that Druid can only do one Aggregate pass. If so, we could enable that rule and push one of the levels of Aggregate down to Druid.
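The two-level rewrite described above can be illustrated in memory. This is a sketch of the technique only, not Calcite's planner rule: rows are modeled as hypothetical (g, x) pairs, the inner Aggregate deduplicates (g, x), and the outer Aggregate counts non-null x per g so that NULL is ignored as in SQL's COUNT(DISTINCT x). Assumes Java 16+ (records).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory illustration of the two-level Aggregate rewrite:
//   SELECT g, COUNT(DISTINCT x) FROM t GROUP BY g
// =>
//   SELECT g, COUNT(x) FROM (SELECT g, x FROM t GROUP BY g, x) GROUP BY g
class DistinctCountRewriteSketch {

    /** A row of table t with grouping key g and counted column x (x may be null). */
    record Row(String g, String x) {}

    /** Inner Aggregate: GROUP BY g, x. Duplicate (g, x) pairs collapse into one. */
    static Set<Row> innerAggregate(List<Row> rows) {
        return new LinkedHashSet<>(rows);
    }

    /** Outer Aggregate: GROUP BY g, COUNT(x). Counting only non-null x keeps SQL semantics. */
    static Map<String, Long> outerAggregate(Set<Row> groups) {
        return groups.stream().collect(Collectors.groupingBy(
            Row::g,
            TreeMap::new,
            Collectors.filtering((Row r) -> r.x() != null, Collectors.counting())));
    }

    static Map<String, Long> countDistinct(List<Row> rows) {
        return outerAggregate(innerAggregate(rows));
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("en", "Main_Page"), new Row("en", "Main_Page"),
            new Row("en", "Druid"), new Row("fr", "Accueil"), new Row("fr", null));
        System.out.println(countDistinct(rows)); // {en=2, fr=1}
    }
}
```

Either level is an ordinary, non-distinct aggregation, which is what makes it a candidate for pushing one of them down to Druid.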

> Count distinct on druid is translated to Cardinality aggregator which is approximate
> ------------------------------------------------------------------------------------
>
>                 Key: CALCITE-1670
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1670
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Nishant Bangarwa
>            Assignee: Julian Hyde
>
> Right now COUNT DISTINCT on Druid is translated to a 'cardinality' aggregator, which uses HyperLogLog and returns approximate results. See the cardinality aggregator at http://druid.io/docs/latest/querying/aggregations.html for details.
> https://github.com/apache/calcite/blob/master/druid/src/main/java/org/apache/calcite/adapter/druid/DruidQuery.java#L721
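> {code}
> case COUNT:
>   if (aggCall.isDistinct()) {
>     // Approximate: Druid's cardinality aggregator is HyperLogLog-based
>     return new JsonCardinalityAggregation("cardinality", name, list);
>   }
>   // Non-distinct COUNT maps to Druid's plain count aggregator
>   return new JsonAggregation("count", name, only);
> {code}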
> The currently recommended way in Druid to get exact distinct counts is a nested groupBy query.
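For reference, a nested groupBy of the kind mentioned above might be built as follows. This is a hedged sketch: the data source "wikipedia", dimension "page" and the interval are made-up placeholders, no JSON escaping is done (plain identifier names are assumed), and it assumes the outer query's count aggregator counts the rows produced by the inner query. Note that a null dimension value would likely form its own inner group and be counted, unlike SQL COUNT(DISTINCT), so a production version may need to filter it out.

```java
// Hypothetical sketch: builds the JSON for a nested Druid groupBy query that
// computes an exact distinct count. Data source, dimension and interval are
// placeholders supplied by the caller; values are not JSON-escaped.
class NestedGroupBySketch {

    static String exactDistinctCountQuery(String dataSource, String dimension, String interval) {
        // Inner groupBy: one result row per distinct value of the dimension.
        String inner = "{\"queryType\": \"groupBy\", "
            + "\"dataSource\": \"" + dataSource + "\", "
            + "\"granularity\": \"all\", "
            + "\"dimensions\": [\"" + dimension + "\"], "
            + "\"aggregations\": [{\"type\": \"count\", \"name\": \"rows\"}], "
            + "\"intervals\": [\"" + interval + "\"]}";
        // Outer groupBy reads the inner query as its data source; with no
        // dimensions, its count aggregator counts the inner rows, i.e. the
        // number of distinct values.
        return "{\"queryType\": \"groupBy\", "
            + "\"dataSource\": {\"type\": \"query\", \"query\": " + inner + "}, "
            + "\"granularity\": \"all\", "
            + "\"dimensions\": [], "
            + "\"aggregations\": [{\"type\": \"count\", \"name\": \"distinct_count\"}], "
            + "\"intervals\": [\"" + interval + "\"]}";
    }

    public static void main(String[] args) {
        System.out.println(exactDistinctCountQuery(
            "wikipedia", "page", "2017-01-01/2017-02-01"));
    }
}
```

This corresponds to the two-level Aggregate rewrite, with both levels executed inside Druid.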



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)