Posted to issues@spark.apache.org by "Guojian Li (Jira)" <ji...@apache.org> on 2020/08/18 10:01:00 UTC

[jira] [Commented] (SPARK-18622) Missing Reference in Multi Union Clauses Cause by TypeCoercion

    [ https://issues.apache.org/jira/browse/SPARK-18622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179511#comment-17179511 ] 

Guojian Li commented on SPARK-18622:
------------------------------------

I am seeing the same issue in Spark 2.3.4:

https://issues.apache.org/jira/browse/SPARK-32638

The issue does not seem to be resolved.

> Missing Reference in Multi Union Clauses Cause by TypeCoercion
> --------------------------------------------------------------
>
>                 Key: SPARK-18622
>                 URL: https://issues.apache.org/jira/browse/SPARK-18622
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.3, 2.0.2
>            Reporter: Yerui Sun
>            Assignee: Herman van Hövell
>            Priority: Major
>             Fix For: 2.1.0
>
>
> {code}
> spark-sql> explain extended
>          > select a
>          > from
>          > (
>          >   select 0 a, 0 b
>          > union all
>          >   select sum(1) a, cast(0 as bigint) b
>          > union all
>          >   select 0 a, 0 b
>          > )t;
>  
> == Parsed Logical Plan ==
> 'Project ['a]
> +- 'SubqueryAlias t
>    +- 'Union
>       :- 'Union
>       :  :- Project [0 AS a#0, 0 AS b#1]
>       :  :  +- OneRowRelation$
>       :  +- 'Project ['sum(1) AS a#2, cast(0 as bigint) AS b#3L]
>       :     +- OneRowRelation$
>       +- Project [0 AS a#4, 0 AS b#5]
>          +- OneRowRelation$
>  
> == Analyzed Logical Plan ==
> a: int
> Project [a#0]
> +- SubqueryAlias t
>    +- Union
>       :- !Project [a#0, b#9L]
>       :  +- Union
>       :     :- Project [cast(a#0 as bigint) AS a#11L, b#9L]
>       :     :  +- Project [a#0, cast(b#1 as bigint) AS b#9L]
>       :     :     +- Project [0 AS a#0, 0 AS b#1]
>       :     :        +- OneRowRelation$
>       :     +- Project [a#2L, b#3L]
>       :        +- Project [a#2L, b#3L]
>       :           +- Aggregate [sum(cast(1 as bigint)) AS a#2L, cast(0 as bigint) AS b#3L]
>       :              +- OneRowRelation$
>       +- Project [a#4, cast(b#5 as bigint) AS b#10L]
>          +- Project [0 AS a#4, 0 AS b#5]
>             +- OneRowRelation$
>  
> == Optimized Logical Plan ==
> org.apache.spark.sql.AnalysisException: resolved attribute(s) a#0 missing from a#11L,b#9L in operator !Project [a#0, b#9L];;
> Project [a#0]
> +- SubqueryAlias t
>    +- Union
>       :- !Project [a#0, b#9L]
>       :  +- Union
>       :     :- Project [cast(a#0 as bigint) AS a#11L, b#9L]
>       :     :  +- Project [a#0, cast(b#1 as bigint) AS b#9L]
>       :     :     +- Project [0 AS a#0, 0 AS b#1]
>       :     :        +- OneRowRelation$
>       :     +- Project [a#2L, b#3L]
>       :        +- Project [a#2L, b#3L]
>       :           +- Aggregate [sum(cast(1 as bigint)) AS a#2L, cast(0 as bigint) AS b#3L]
>       :              +- OneRowRelation$
>       +- Project [a#4, cast(b#5 as bigint) AS b#10L]
>          +- Project [0 AS a#4, 0 AS b#5]
>             +- OneRowRelation$
>  
> == Physical Plan ==
> org.apache.spark.sql.AnalysisException: resolved attribute(s) a#0 missing from a#11L,b#9L in operator !Project [a#0, b#9L];;
> Project [a#0]
> +- SubqueryAlias t
>    +- Union
>       :- !Project [a#0, b#9L]
>       :  +- Union
>       :     :- Project [cast(a#0 as bigint) AS a#11L, b#9L]
>       :     :  +- Project [a#0, cast(b#1 as bigint) AS b#9L]
>       :     :     +- Project [0 AS a#0, 0 AS b#1]
>       :     :        +- OneRowRelation$
>       :     +- Project [a#2L, b#3L]
>       :        +- Project [a#2L, b#3L]
>       :           +- Aggregate [sum(cast(1 as bigint)) AS a#2L, cast(0 as bigint) AS b#3L]
>       :              +- OneRowRelation$
>       +- Project [a#4, cast(b#5 as bigint) AS b#10L]
>          +- Project [0 AS a#4, 0 AS b#5]
>             +- OneRowRelation$
> {code}
> Key points to reproduce the issue:
> * 3 or more union clauses;
> * One column is a sum aggregate in one union clause and is of Integer type in the other union clauses;
> * Another column has different data types across the union clauses;
> The reason for the issue:
> - Step 1: TypeCoercion.WidenSetOperationTypes is applied and adds a project with a cast, since the union clauses have different data types for one column. With 3 union clauses, the inner union is also wrapped in a project;
> - Step 2: TypeCoercion.FunctionArgumentConversion is applied and the return type of sum(int) is extended to BigInt, meaning one column in the union clauses changes its data type;
> - Step 3: TypeCoercion.WidenSetOperationTypes is applied again and adds another cast project in the inner union clause, since the data type of sum(int) changed. At this point the project ON the inner union loses its reference, because the project IN the inner union is newly added; see the Analyzed Logical Plan;
> Solutions to fix:
> * Since set operation type coercion should be applied after the inner clauses have stabilized, applying WidenSetOperationTypes last will fix the issue;
> * To avoid multi-level projects on set operation clauses, carefully handling the existing cast project in WidenSetOperationTypes should also work;
> Any comments are appreciated.
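
The three steps quoted above, and the first proposed fix, can be illustrated with a small toy model. This is only a sketch under stated assumptions: Attr, Node, widen, missing, build and run are hypothetical names invented for this illustration, not Spark's Catalyst classes or rules, and the model only tracks which attribute IDs each project reads and produces. It assumes, as the report describes, that the sum column is seen as int at first and becomes bigint in step 2.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List

# Toy model only: every name here is a hypothetical stand-in, not Spark code.
RANK = {"int": 0, "bigint": 1}


@dataclass
class Attr:
    name: str
    expr_id: int
    dtype: str  # "int" or "bigint"


@dataclass
class Node:
    kind: str  # "Leaf", "Project" or "Union"
    children: List["Node"] = field(default_factory=list)
    refs: List[Attr] = field(default_factory=list)  # what a Project reads from its child
    out: List[Attr] = field(default_factory=list)   # what a Leaf or Project produces

    def output(self) -> List[Attr]:
        # A Union exposes the output of its first child, as in the quoted plan.
        return self.children[0].output() if self.kind == "Union" else self.out


def widen(union: Node, ids) -> None:
    """Wrap every child in a Project that casts each column to the widest type."""
    cols = list(zip(*(c.output() for c in union.children)))
    targets = [max((a.dtype for a in col), key=RANK.get) for col in cols]
    if all(a.dtype == t for col, t in zip(cols, targets) for a in col):
        return  # all branches already agree, nothing to do
    wrapped = []
    for child in union.children:
        refs = list(child.output())
        # A cast produces a new attribute with a fresh ID; other columns pass through.
        out = [a if a.dtype == t else Attr(a.name, next(ids), t)
               for a, t in zip(refs, targets)]
        wrapped.append(Node("Project", [child], refs=refs, out=out))
    union.children = wrapped


def missing(node: Node) -> List[Attr]:
    """Collect attributes that a Project reads but its child no longer produces."""
    found = []
    if node.kind == "Project":
        available = {a.expr_id for a in node.children[0].output()}
        found += [a for a in node.refs if a.expr_id not in available]
    for child in node.children:
        found += missing(child)
    return found


def build():
    first = Node("Leaf", out=[Attr("a", 0, "int"), Attr("b", 1, "int")])
    agg = Node("Leaf", out=[Attr("a", 2, "int"), Attr("b", 3, "bigint")])
    third = Node("Leaf", out=[Attr("a", 4, "int"), Attr("b", 5, "int")])
    inner = Node("Union", [first, agg])
    return Node("Union", [inner, third]), inner, agg


def run(widen_last: bool):
    ids = count(100)
    outer, inner, agg = build()
    if not widen_last:
        widen(inner, ids)          # step 1: widen before the sum type is final
        widen(outer, ids)
    agg.out[0].dtype = "bigint"    # step 2: sum(int) becomes bigint
    widen(inner, ids)              # step 3 (or the only widening pass)
    widen(outer, ids)
    return [(a.name, a.expr_id) for a in missing(outer)]


if __name__ == "__main__":
    print("widen early, then again:", run(widen_last=False))
    print("widen only at the end:  ", run(widen_last=True))
```

In this model, widening early and then again leaves the project on top of the inner union reading a#0 after the inner union has started exposing a freshly aliased column, which mirrors the "!Project [a#0, b#9L]" node in the analyzed plan. Widening only after the sum type has settled leaves no dangling reference. Whether this ordering is how the actual fix in Spark works is not something the toy model can confirm.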


