Posted to issues@flink.apache.org by "Jingsong Lee (Jira)" <ji...@apache.org> on 2019/09/27 04:24:00 UTC
[jira] [Closed] (FLINK-12263) Remove SINGLE_VALUE aggregate function from physical plan
[ https://issues.apache.org/jira/browse/FLINK-12263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jingsong Lee closed FLINK-12263.
--------------------------------
Resolution: Invalid
> Remove SINGLE_VALUE aggregate function from physical plan
> ---------------------------------------------------------
>
> Key: FLINK-12263
> URL: https://issues.apache.org/jira/browse/FLINK-12263
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / Planner
> Reporter: Jark Wu
> Priority: Major
>
> SINGLE_VALUE is an aggregate function that accepts only one row and throws an exception when it receives more than one row.
>
> For example:
> {code:sql}
> SELECT a2, SUM(a1) FROM A GROUP BY a2 HAVING SUM(a1) > (SELECT SUM(a1) * 0.1 FROM A)
> {code}
> will produce a physical plan that contains SINGLE_VALUE:
> {code:sql}
> +- NestedLoopJoin(joinType=[InnerJoin], where=[>(EXPR$1, $f0)], select=[a2, EXPR$1, $f0], build=[right], singleRowJoin=[true])
>    :- HashAggregate(isMerge=[true], groupBy=[a2], select=[a2, Final_SUM(sum$0) AS EXPR$1])
>    :  +- Exchange(distribution=[hash[a2]])
>    :     +- LocalHashAggregate(groupBy=[a2], select=[a2, Partial_SUM(a1) AS sum$0])
>    :        +- TableSourceScan(table=[[A, source: [TestTableSource(a1, a2)]]], fields=[a1, a2])
>    +- Exchange(distribution=[broadcast])
>       +- HashAggregate(isMerge=[true], select=[Final_SINGLE_VALUE(value$0, count$1) AS $f0])
>          +- Exchange(distribution=[single])
>             +- LocalHashAggregate(select=[Partial_SINGLE_VALUE(EXPR$0) AS (value$0, count$1)])
>                +- Calc(select=[*($f0, 0.1) AS EXPR$0])
>                   +- HashAggregate(isMerge=[true], select=[Final_SUM(sum$0) AS $f0])
>                      +- Exchange(distribution=[single])
>                         +- LocalHashAggregate(select=[Partial_SUM(a1) AS sum$0])
>                            +- Calc(select=[a1])
>                               +- TableSourceScan(table=[[A, source: [TestTableSource(a1, a2)]]], fields=[a1, a2])
> {code}
> But SINGLE_VALUE is a bit weird in the physical plan, because the logical plan already guarantees that there is only one input row. Moreover, it also introduces additional overhead.
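
For readers unfamiliar with the function, the semantics described in the issue can be sketched roughly as follows. This is an illustrative sketch only, not Flink's actual implementation: the class name SingleValueSketch, the Acc type, and the method names are hypothetical. The accumulator fields mirror the (value$0, count$1) pair visible in the plan, and returning null for empty input is an assumption.

```java
// Hypothetical sketch of SINGLE_VALUE semantics; not Flink's actual code.
final class SingleValueSketch {

    /** Accumulator mirroring the (value$0, count$1) pair shown in the plan. */
    static final class Acc {
        Object value;
        int count;
    }

    /** Partial phase: fold one input row into the accumulator. */
    static void accumulate(Acc acc, Object row) {
        acc.count++;
        if (acc.count > 1) {
            throw new IllegalStateException("SINGLE_VALUE received more than one row");
        }
        acc.value = row;
    }

    /** Final phase: combine a partial accumulator into the target accumulator. */
    static void merge(Acc target, Acc partial) {
        if (partial.count == 0) {
            return; // nothing to merge
        }
        target.count += partial.count;
        if (target.count > 1) {
            throw new IllegalStateException("SINGLE_VALUE received more than one row");
        }
        target.value = partial.value;
    }

    /** Result: the single value, or null for empty input (an assumption). */
    static Object getValue(Acc acc) {
        return acc.count == 0 ? null : acc.value;
    }

    public static void main(String[] args) {
        Acc local = new Acc();
        accumulate(local, 42.5);

        Acc global = new Acc();
        merge(global, local);
        System.out.println(getValue(global));

        try {
            accumulate(local, 7.0); // a second row is rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The count check is the only work the function does beyond passing the value through, which is why it looks redundant when the input is a global aggregate that always yields exactly one row.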
--
This message was sent by Atlassian Jira
(v8.3.4#803005)