Posted to commits@beam.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2017/06/25 22:46:00 UTC

[jira] [Comment Edited] (BEAM-2478) Distinct Aggregates

    [ https://issues.apache.org/jira/browse/BEAM-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062450#comment-16062450 ] 

Julian Hyde edited comment on BEAM-2478 at 6/25/17 10:45 PM:
-------------------------------------------------------------

Your rewrite for hierarchical calculation is slightly wrong.

{code}
select a, count(distinct b) from t group by a
{code}

becomes

{code}
select a, count(distinct_b) from (
  select a, b as distinct_b
  from t
  group by a, b)
group by a
{code}

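This correctly ignores rows where b is null: the inner query still produces an (a, null) group, but count(distinct_b) counts only non-null values, so that group adds nothing to the result.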
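
As a quick sanity check of the null handling, here is a minimal sketch that runs both forms of the query against an in-memory SQLite database. The table contents are made up for illustration, and SQLite is only a stand-in engine here (an assumption; it is not what Beam or Calcite execute). The last query shows what would go wrong if the outer aggregate were count(*) instead of count(distinct_b).

```python
import sqlite3

# Hypothetical sample data: table t(a, b) with duplicate and null values of b.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("x", 1), ("x", 1), ("x", 2), ("x", None), ("y", None), ("y", 3)],
)

# Original query with a distinct aggregate.
direct = conn.execute(
    "SELECT a, COUNT(DISTINCT b) FROM t GROUP BY a ORDER BY a"
).fetchall()

# Two-level rewrite: de-duplicate with GROUP BY a, b, then count per a.
rewritten = conn.execute(
    """
    SELECT a, COUNT(distinct_b) FROM (
      SELECT a, b AS distinct_b
      FROM t
      GROUP BY a, b)
    GROUP BY a
    ORDER BY a
    """
).fetchall()

# Counting rows of the inner result instead would also count the (a, NULL) group.
count_star = conn.execute(
    """
    SELECT a, COUNT(*) FROM (
      SELECT a, b AS distinct_b
      FROM t
      GROUP BY a, b)
    GROUP BY a
    ORDER BY a
    """
).fetchall()

print(direct)      # same as rewritten
print(rewritten)
print(count_star)  # one higher per group that contains a null b
```

With this data the first two results agree, while the count(*) variant is one higher for every value of a that has a null b.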

Calcite's [AggregateExpandDistinctAggregatesRule|https://insight.io/github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.java] does this rewrite; it can also do a more complex rewrite using GROUPING SETS if there are multiple distinct-counts in the same query. See also CALCITE-1588 for approximate distinct-count.
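
To give a feel for the multiple distinct-count case, the sketch below emulates the idea on SQLite. This is an illustration under stated assumptions, not the plan Calcite actually emits: SQLite has no GROUPING SETS, so a UNION ALL of two GROUP BY queries stands in for grouping sets (a, b) and (a, c), and a made-up marker column named which plays the role of a grouping indicator. The table t and its data are hypothetical.

```python
import sqlite3

# Hypothetical sample data: table t(a, b, c) with duplicates and nulls.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("x", 1, 10), ("x", 1, 20), ("x", 2, 10), ("x", None, 10),
        ("y", 3, None), ("y", 3, 30),
    ],
)

# Original query with two distinct aggregates.
direct = conn.execute(
    "SELECT a, COUNT(DISTINCT b), COUNT(DISTINCT c) FROM t GROUP BY a ORDER BY a"
).fetchall()

# Emulated rewrite: one de-duplicating branch per distinct column, tagged with
# a marker, then a single outer aggregate that counts each column only from
# the rows of its own branch.
emulated = conn.execute(
    """
    SELECT a,
           COUNT(CASE WHEN which = 'b' THEN b END),
           COUNT(CASE WHEN which = 'c' THEN c END)
    FROM (
      SELECT a, b, NULL AS c, 'b' AS which FROM t GROUP BY a, b
      UNION ALL
      SELECT a, NULL AS b, c, 'c' AS which FROM t GROUP BY a, c)
    GROUP BY a
    ORDER BY a
    """
).fetchall()

print(direct)
print(emulated)
```

As in the single-column case, null values of b or c form their own inner group but are not counted by the outer COUNT.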


was (Author: julianhyde):
Your rewrite for hierarchical calculation is slightly wrong.

{code}
select a, count(distinct b) from t group by a

becomes

select a, count(distinct_b) from (
  select a, b as distinct_b
  from t
  group by a, b)
group by a)
{code}

This correctly ignores rows where b is null.

Calcite's [AggregateExpandDistinctAggregatesRule|https://insight.io/github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.java] does this rewrite; it can also do a more complex rewrite using GROUPING SETS if there are multiple distinct-counts in the same query. See also CALCITE-1588 for approximate distinct-count.

> Distinct Aggregates
> -------------------
>
>                 Key: BEAM-2478
>                 URL: https://issues.apache.org/jira/browse/BEAM-2478
>             Project: Beam
>          Issue Type: New Feature
>          Components: dsl-sql
>            Reporter: Jingsong Lee
>            Assignee: Tarush Grover
>
> eg: COUNT(DISTINCT empno)


