You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2021/04/08 13:14:00 UTC

[jira] [Resolved] (SPARK-34946) Block unsupported correlated scalar subquery in Aggregate

     [ https://issues.apache.org/jira/browse/SPARK-34946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-34946.
---------------------------------
    Fix Version/s: 3.2.0
         Assignee: Allison Wang
       Resolution: Fixed

> Block unsupported correlated scalar subquery in Aggregate
> ---------------------------------------------------------
>
>                 Key: SPARK-34946
>                 URL: https://issues.apache.org/jira/browse/SPARK-34946
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Allison Wang
>            Assignee: Allison Wang
>            Priority: Major
>             Fix For: 3.2.0
>
>
> Currently, Spark supports Aggregate to host correlated scalar subqueries, but in some cases, those subqueries cannot be rewritten properly in the `RewriteCorrelatedScalarSubquery` rule. The error messages are also confusing. Hence we should block these cases in CheckAnalysis.
>   
> Case 1: correlated scalar subquery in the grouping expressions but not in aggregate expressions 
> {code:java}
> SELECT SUM(c2) FROM t t1 GROUP BY (SELECT SUM(c2) FROM t t2 WHERE t1.c1 = t2.c1)
> {code}
> We get this error:
> {code:java}
> java.lang.AssertionError: assertion failed: Expects 1 field, but got 2; something went wrong in analysis  
> {code}
> because the correlated scalar subquery is not rewritten properly: 
> {code:java}
> == Optimized Logical Plan ==
> Aggregate [scalar-subquery#5 [(c1#6 = c1#6#93)]], [sum(c2#7) AS sum(c2)#11L]
> :  +- Aggregate [c1#6], [sum(c2#7) AS sum(c2)#15L, c1#6 AS c1#6#93]
> :     +- LocalRelation [c1#6, c2#7]
> +- LocalRelation [c1#6, c2#7]
> {code}
>  
> Case 2: correlated scalar subquery in the aggregate expressions but not in the grouping expressions 
> {code:java}
> SELECT (SELECT SUM(c2) FROM t t2 WHERE t1.c1 = t2.c1), SUM(c2) FROM t t1 GROUP BY c1
> {code}
> We get this error:
> {code:java}
> java.lang.IllegalStateException: Couldn't find sum(c2)#69L in [c1#60,sum(c2#61)#64L]
> {code}
> because the transformed correlated scalar subquery output is not present in the grouping expression of the Aggregate:
> {code:java}
> == Optimized Logical Plan ==
> Aggregate [c1#60], [sum(c2)#69L AS scalarsubquery(c1)#70L, sum(c2#61) AS sum(c2)#65L]
> +- Project [c1#60, c2#61, sum(c2)#69L]
>    +- Join LeftOuter, (c1#60 = c1#60#95)
>       :- LocalRelation [c1#60, c2#61]
>       +- Aggregate [c1#60], [sum(c2#61) AS sum(c2)#69L, c1#60 AS c1#60#95]
>          +- LocalRelation [c1#60, c2#61]
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org