Posted to commits@beam.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2017/06/25 22:46:00 UTC
[jira] [Comment Edited] (BEAM-2478) Distinct Aggregates
[ https://issues.apache.org/jira/browse/BEAM-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062450#comment-16062450 ]
Julian Hyde edited comment on BEAM-2478 at 6/25/17 10:45 PM:
-------------------------------------------------------------
Your rewrite for hierarchical calculation is slightly wrong.
{code}
select a, count(distinct b) from t group by a
becomes
select a, count(distinct_b) from (
  select a, b as distinct_b
  from t
  group by a, b)
group by a
{code}
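This correctly ignores rows where b is null: the inner query emits one row per distinct (a, b) pair, including the pair where b is null, and the outer count(distinct_b) skips nulls as count(distinct b) does.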
Calcite's [AggregateExpandDistinctAggregatesRule|https://insight.io/github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.java] does this rewrite; it can also do a more complex rewrite using GROUPING SETS if there are multiple distinct-counts in the same query. See also CALCITE-1588 for approximate distinct-count.
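A quick way to sanity-check the rewrite is to run both forms against the same data and compare the results. The sketch below does that with Python's built-in sqlite3 module, not with Beam or Calcite, so it only shows that the two queries agree under SQLite's semantics. The contents of table t are made up for illustration, and an "order by a" is added to both queries so the outputs can be compared directly.

```python
import sqlite3

# In-memory database; the table t and its rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (a text, b integer)")
conn.executemany(
    "insert into t values (?, ?)",
    [
        ("x", 1), ("x", 1), ("x", 2), ("x", None),  # duplicates and a null
        ("y", 3), ("y", None), ("y", None),         # repeated nulls
        ("z", None),                                # only nulls
    ],
)

# Original form: distinct aggregate evaluated directly.
original = conn.execute(
    "select a, count(distinct b) from t group by a order by a"
).fetchall()

# Rewritten form: the inner group-by removes duplicates per (a, b),
# the outer count skips the row where distinct_b is null.
rewritten = conn.execute(
    """
    select a, count(distinct_b) from (
      select a, b as distinct_b
      from t
      group by a, b)
    group by a
    order by a
    """
).fetchall()

print(original)
print(rewritten)
```

Both queries should return the same rows here, with group z reported as 0 because its only b value is null.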
was (Author: julianhyde):
Your rewrite for hierarchical calculation is slightly wrong.
{code}
select a, count(distinct b) from t group by a
becomes
select a, count(distinct_b) from (
select a, b as distinct_b
from t
group by a, b)
group by a)
{code}
This correctly ignores rows where b is null.
Calcite's [AggregateExpandDistinctAggregatesRule|https://insight.io/github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.java] does this rewrite; it can also do a more complex rewrite using GROUPING SETS if there are multiple distinct-counts in the same query. See also CALCITE-1588 for approximate distinct-count.
> Distinct Aggregates
> -------------------
>
> Key: BEAM-2478
> URL: https://issues.apache.org/jira/browse/BEAM-2478
> Project: Beam
> Issue Type: New Feature
> Components: dsl-sql
> Reporter: Jingsong Lee
> Assignee: Tarush Grover
>
> eg: COUNT(DISTINCT empno)