You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Allison Wang (Jira)" <ji...@apache.org> on 2021/09/13 20:26:00 UTC

[jira] [Updated] (SPARK-36747) Do not collapse Project with Aggregate when correlated subqueries are present in the project list

     [ https://issues.apache.org/jira/browse/SPARK-36747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allison Wang updated SPARK-36747:
---------------------------------
    Description: 
Currently CollapseProject combines Project with Aggregate when the shared attributes are deterministic. But if there are correlated scalar subqueries in the project list that uses the output of the aggregate, they cannot be combined. Otherwise, the plan after rewrite will not be valid:

{code}
select (select sum(c2) from t where c1 = cast(s as int)) from (select sum(c2) s from t)

== Optimized Logical Plan ==
Aggregate [sum(b)#28L AS scalarsubquery(s)#29L]
+- Project [sum(b)#28L]
   +- Join LeftOuter, (a#20 = cast(sum(b#21) as int))
      :- LocalRelation [b#21]
      +- Aggregate [a#20], [sum(b#21) AS sum(b)#28L, a#20]
         +- LocalRelation [a#20, b#21]

java.lang.UnsupportedOperationException: Cannot generate code for expression: sum(input[0, int, false])
{code}

  was:
Currently CollapseProject combines Project with Aggregate when the shared attributes are deterministic. But if there are correlated scalar subqueries in the project list that uses the output of the aggregate, they cannot be combined. Otherwise, the plan after rewrite will not be valid:
```
select (select sum(c2) from t where c1 = cast(s as int)) from (select sum(c2) s from t)

== Optimized Logical Plan ==
Aggregate [sum(b)#28L AS scalarsubquery(s)#29L]
+- Project [sum(b)#28L]
   +- Join LeftOuter, (a#20 = cast(sum(b#21) as int))
      :- LocalRelation [b#21]
      +- Aggregate [a#20], [sum(b#21) AS sum(b)#28L, a#20]
         +- LocalRelation [a#20, b#21]

java.lang.UnsupportedOperationException: Cannot generate code for expression: sum(input[0, int, false])
```


> Do not collapse Project with Aggregate when correlated subqueries are present in the project list
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-36747
>                 URL: https://issues.apache.org/jira/browse/SPARK-36747
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Currently CollapseProject combines Project with Aggregate when the shared attributes are deterministic. But if there are correlated scalar subqueries in the project list that uses the output of the aggregate, they cannot be combined. Otherwise, the plan after rewrite will not be valid:
> {code}
> select (select sum(c2) from t where c1 = cast(s as int)) from (select sum(c2) s from t)
> == Optimized Logical Plan ==
> Aggregate [sum(b)#28L AS scalarsubquery(s)#29L]
> +- Project [sum(b)#28L]
>    +- Join LeftOuter, (a#20 = cast(sum(b#21) as int))
>       :- LocalRelation [b#21]
>       +- Aggregate [a#20], [sum(b#21) AS sum(b)#28L, a#20]
>          +- LocalRelation [a#20, b#21]
> java.lang.UnsupportedOperationException: Cannot generate code for expression: sum(input[0, int, false])
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org