You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2021/02/01 19:31:00 UTC
[jira] [Created] (CALCITE-4483) Add WITHIN DISTINCT clause for
aggregate functions
Julian Hyde created CALCITE-4483:
------------------------------------
Summary: Add WITHIN DISTINCT clause for aggregate functions
Key: CALCITE-4483
URL: https://issues.apache.org/jira/browse/CALCITE-4483
Project: Calcite
Issue Type: Bug
Reporter: Julian Hyde
Add a {{WITHIN DISTINCT}} clause to aggregate functions, allowing duplicate rows to be eliminated before entering the function.
This feature is non-standard, and in fact does not exist in any database I am aware of.
It is related to {{DISTINCT}}, and is in fact a generalization of it. {{DISTINCT}} can always be rewritten in terms of {{WITHIN DISTINCT}}. For example, {{SUM(DISTINCT sal)}} is equivalent to {{SUM(sal) WITHIN DISTINCT (sal)}}.
Consider the query
{noformat}
SELECT SUM(age),
SUM(DISTINCT age),
SUM(age) WITHIN DISTINCT (name)
FROM Friends{noformat}
where {{Friends}} has the rows
{noformat}
name age job
====== === ==========
Julian 16 Programmer
Dick 15
Anne 13 Car wash
George 15 Lifeguard
George 15 Dog walker
Timmy 4
{noformat}
Note that there are two rows for George, because she has two jobs.
The values of the columns are as follows:
* {{SUM(age)}} has the value (16 + 15 + 13 + 15 + 15 + 4) = 78;
* {{SUM(DISTINCT age)}} has the value (16 + 15 + 13 + 4) = 48;
* {{SUM(age) WITHIN DISTINCT (name)}} has the value (16 + 15 + 13 + 15 + 4) = 63.
{{WITHIN DISTINCT}} has treated the two 15 values for George as one value, but has still counted the 15 for Dick separately.
The {{WITHIN DISTINCT}} clause can be useful to prevent double-counting when duplicates have been added via a many-to-one join.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)