Posted to issues@calcite.apache.org by "Gian Merlino (JIRA)" <ji...@apache.org> on 2017/01/18 18:52:26 UTC

[jira] [Commented] (CALCITE-1587) Druid adapter: topN returns approximate results

    [ https://issues.apache.org/jira/browse/CALCITE-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828547#comment-15828547 ] 

Gian Merlino commented on CALCITE-1587:
---------------------------------------

In Druid's built-in SQL we make this an option, druid.sql.planner.useApproximateTopN. FWIW, we also have a similar option for whether COUNT(DISTINCT col) should be approximate or not.

Also, topNs are exact if you are sorting on the dimension, and will be faster than groupBy in that case since groupBy doesn't yet push down limits all the way to the data nodes (although we are working on this). So it's still useful, and exact, to use them for queries like "SELECT DISTINCT foo FROM bar ORDER BY foo LIMIT 50". In Druid we do this even if druid.sql.planner.useApproximateTopN is false.
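
As a rough illustration of the rule described above, the choice between topN and groupBy could be sketched as follows. This is not Druid's or Calcite's actual planner code: the function name and its arguments are hypothetical, and it assumes the query already has the single-dimension, ORDER BY ... LIMIT shape that a topN can serve.

```python
def choose_druid_query_type(orders_by_dimension: bool,
                            use_approximate_topn: bool) -> str:
    """Pick a Druid query type for a single-dimension ORDER BY ... LIMIT query.

    Hypothetical helper for illustration only.
    use_approximate_topn stands in for druid.sql.planner.useApproximateTopN.
    """
    if orders_by_dimension:
        # Sorting on the dimension itself: topN is exact, so it is always safe.
        return "topN"
    if use_approximate_topn:
        # Sorting on a metric: topN may be approximate, allowed only by opt-in.
        return "topN"
    # Exact results required while sorting on a metric: fall back to groupBy.
    return "groupBy"
```

With this sketch, "SELECT DISTINCT foo FROM bar ORDER BY foo LIMIT 50" maps to topN regardless of the setting, while a query ordered by an aggregate maps to topN only when approximation is allowed.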

The topN approximation is described in detail at http://druid.io/docs/latest/querying/topnquery.html#aliasing
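
To make the source of the approximation concrete, here is a toy simulation. It is a simplified model, not Druid code: it assumes each data node returns only its local top k rows and that a broker then merges them. The node contents, function names, and the choice of k are all made up for illustration, and real Druid typically asks each node for more rows than the requested limit, which reduces the error but does not remove it.

```python
from collections import Counter


def _top_k(totals, k, by_dimension):
    """Order rows by dimension value, or by metric descending, and keep k."""
    if by_dimension:
        return sorted(totals.items())[:k]
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def merged_top_k(nodes, k, by_dimension=False):
    """Toy topN: every node truncates to its local top k before the merge."""
    merged = Counter()
    for rows in nodes:
        for dim, metric in _top_k(rows, k, by_dimension):
            merged[dim] += metric
    return _top_k(merged, k, by_dimension)


def exact_top_k(nodes, k, by_dimension=False):
    """Toy groupBy: sum every row from every node, then apply the limit."""
    totals = Counter()
    for rows in nodes:
        totals.update(rows)  # Counter.update adds the metric values
    return _top_k(totals, k, by_dimension)


# Two hypothetical data nodes holding partial sums per dimension value.
nodes = [{"x": 10, "y": 9}, {"z": 10, "y": 9}]

# Ordered by metric: "y" is second on both nodes, so it is dropped locally
# even though its global total (18) is the largest.
approx_by_metric = merged_top_k(nodes, k=1)
exact_by_metric = exact_top_k(nodes, k=1)

# Ordered by dimension: the truncated result matches the exact one.
approx_by_dim = merged_top_k(nodes, k=1, by_dimension=True)
exact_by_dim = exact_top_k(nodes, k=1, by_dimension=True)
```

In this model the dimension-ordered case stays exact because any value in the global first k is also within the first k on every node that contains it, so none of its partial sums are lost.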

> Druid adapter: topN returns approximate results
> -----------------------------------------------
>
>                 Key: CALCITE-1587
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1587
>             Project: Calcite
>          Issue Type: Bug
>          Components: druid
>    Affects Versions: 1.11.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Julian Hyde
>             Fix For: 1.12.0
>
>
> Currently, we convert to _topN_ queries. However, metrics returned by Druid will be approximate values. Thus, probably we should not convert to Druid topN queries and rather always use Druid groupBy.


