Posted to issues@spark.apache.org by "Jason Altekruse (Jira)" <ji...@apache.org> on 2020/01/15 22:40:00 UTC
[jira] [Created] (SPARK-30523) Collapse back to back aggregations into a single aggregate to reduce the number of shuffles
Jason Altekruse created SPARK-30523:
---------------------------------------
Summary: Collapse back to back aggregations into a single aggregate to reduce the number of shuffles
Key: SPARK-30523
URL: https://issues.apache.org/jira/browse/SPARK-30523
Project: Spark
Issue Type: Improvement
Components: Optimizer
Affects Versions: 3.0.0
Reporter: Jason Altekruse
Queries containing nested aggregate operations can in some cases be computed with a single phase of aggregation. This Jira proposes a new optimizer rule that identifies some of those cases and rewrites the plan, so that the data is not needlessly shuffled a second time and the aggregation hash table is not built twice.
Some examples of collapsible aggregates:
{code:sql}
SELECT sum(sumAgg) AS a, year FROM (
  SELECT sum(1) AS sumAgg, course, year FROM courseSales GROUP BY course, year
) GROUP BY year
-- can be collapsed to
SELECT sum(1) AS `a`, year FROM courseSales GROUP BY year
{code}
{code:sql}
SELECT sum(agg), min(a), b FROM (
  SELECT sum(1) AS agg, a, b FROM testData2 GROUP BY a, b
) GROUP BY b
-- can be collapsed to
SELECT sum(1) AS `sum(agg)`, min(a) AS `min(a)`, b FROM testData2 GROUP BY b
{code}
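As a rough sanity check of the two rewrites above, the sketch below runs the nested and the collapsed form of each query against a small in-memory table and compares the results. It makes several assumptions that are not part of this issue: SQLite (via Python's standard sqlite3 module) stands in for Spark SQL, the rows and the extra earnings column are made-up sample data, and run is a hypothetical helper. Matching results on sample data illustrate the equivalence but do not prove it, and the sketch says nothing about how the optimizer rule itself would be implemented.

```python
import sqlite3

# SQLite is only a stand-in for Spark SQL here; all rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courseSales (course TEXT, year INTEGER, earnings INTEGER)")
conn.executemany(
    "INSERT INTO courseSales VALUES (?, ?, ?)",
    [("dotNET", 2012, 10000), ("Java", 2012, 20000), ("dotNET", 2012, 5000),
     ("dotNET", 2013, 48000), ("Java", 2013, 30000)],
)
conn.execute("CREATE TABLE testData2 (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO testData2 VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)],
)


def run(sql):
    """Hypothetical helper: execute a query and return its rows in sorted order."""
    return sorted(conn.execute(sql).fetchall())


# First example: the outer grouping key (year) is a subset of the inner keys.
nested_sales = """
SELECT sum(sumAgg) AS a, year FROM (
  SELECT sum(1) AS sumAgg, course, year FROM courseSales GROUP BY course, year
) GROUP BY year
"""
collapsed_sales = "SELECT sum(1) AS a, year FROM courseSales GROUP BY year"

# Second example: min over an inner grouping key equals min over the raw column.
nested_td = """
SELECT sum(agg), min(a), b FROM (
  SELECT sum(1) AS agg, a, b FROM testData2 GROUP BY a, b
) GROUP BY b
"""
collapsed_td = "SELECT sum(1), min(a), b FROM testData2 GROUP BY b"

sales_match = run(nested_sales) == run(collapsed_sales)
td_match = run(nested_td) == run(collapsed_td)
print("courseSales queries agree:", sales_match)
print("testData2 queries agree:", td_match)
```

Both rewrites rely on the outer GROUP BY keys being a subset of the inner ones, so each outer group is a union of whole inner groups; a sum of partial sums and a min over the inner keys then give the same answer as aggregating the base rows directly.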