Posted to issues@spark.apache.org by "Herman van Hovell (JIRA)" <ji...@apache.org> on 2017/01/05 23:39:58 UTC
[jira] [Commented] (SPARK-19086) Improper scoping of name resolution of columns in HAVING clause

    [ https://issues.apache.org/jira/browse/SPARK-19086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802885#comment-15802885 ] 

Herman van Hovell commented on SPARK-19086:
-------------------------------------------

We alternate column resolution between the inner and the outer plan during subquery resolution. In this case we cannot resolve the column using the inner plan's output (because of the aggregate), but we can resolve it using the outer column.

So I am not entirely sure this is a problem, since the inner aggregate does not produce a column named `t2c` and the outer plan does.
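
As a rough illustration of the two readings, here is a toy model in plain Scala. It is not Spark's analyzer: `Scope`, `resolveWithFallback`, and `resolveStrict` are hypothetical names invented for this sketch, and the only assumption taken from the discussion is that the aggregate outputs `t2a` while `t2b` and `t2c` exist underneath it.

```scala
// Toy model of name-resolution scoping. NOT Spark's analyzer; every name here is hypothetical.
object ScopingSketch {
  // visible: columns the current operator outputs.
  // hidden:  columns of the relation underneath that the operator (e.g. an aggregate) drops.
  final case class Scope(visible: Set[String], hidden: Set[String] = Set.empty)

  sealed trait Resolution
  case object Inner extends Resolution
  case object Outer extends Resolution
  final case class Unresolved(reason: String) extends Resolution

  // Reading described in the comment: try the inner output, then fall back to the outer plan.
  def resolveWithFallback(name: String, inner: Scope, outer: Option[Scope]): Resolution =
    if (inner.visible.contains(name)) Inner
    else if (outer.exists(_.visible.contains(name))) Outer
    else Unresolved(s"cannot resolve '$name'")

  // Reading proposed in the issue: a name that exists in the inner relation but is
  // hidden by the aggregate is an error, not an outer reference.
  def resolveStrict(name: String, inner: Scope, outer: Option[Scope]): Resolution =
    if (inner.visible.contains(name)) Inner
    else if (inner.hidden.contains(name)) Unresolved(s"'$name' is not produced by the aggregate")
    else if (outer.exists(_.visible.contains(name))) Outer
    else Unresolved(s"cannot resolve '$name'")

  // Outer t2 exposes all three columns.
  val outerT2: Scope = Scope(Set("t2a", "t2b", "t2c"))
  // Inner "select t2a from t2 group by t2a": only t2a survives the aggregate.
  val innerAgg: Scope = Scope(visible = Set("t2a"), hidden = Set("t2b", "t2c"))
  // Inner "(select t2a from t2) t2": the derived table never had t2c at all.
  val innerDerived: Scope = Scope(Set("t2a"))
}
```

Under this model, `resolveWithFallback("t2c", innerAgg, Some(outerT2))` yields `Outer`, which matches the analyzed plan in the report, while `resolveStrict` yields `Unresolved` for the same input. Both functions agree on `innerDerived`, where `t2c` can only come from the outer plan.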

I look forward to being convinced otherwise :)

> Improper scoping of name resolution of columns in HAVING clause
> ---------------------------------------------------------------
>
>                 Key: SPARK-19086
>                 URL: https://issues.apache.org/jira/browse/SPARK-19086
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Nattavut Sutyanyong
>            Priority: Minor
>
> There seems to be a problem with the scoping of name resolution for columns in a HAVING clause.
> Here is a scenario of the problem:
> {code}
> // A simplified version of TC 01.13 from PR-16337
> Seq((1,1,1)).toDF("t1a", "t1b", "t1c").createOrReplaceTempView("t1")
> Seq((1,1,1)).toDF("t2a", "t2b", "t2c").createOrReplaceTempView("t2")
> // This is okay: the query fails, as expected.
> // Error: t2c is unresolved
> sql("select t2a from t2 group by t2a having t2c = 8").show
> // This is okay as t2c is resolved to the t2 on the parent side
> // because t2 in the subquery does not output column t2c.
> sql("select * from t2 where t2a in (select t2a from (select t2a from t2) t2 group by t2a having t2c = 8)").explain(true)
> // This is the problem.
> sql("select * from t2 where t2a in (select t2a from t2 group by t2a having t2c = 8)").explain(true)
> == Analyzed Logical Plan ==
> t2a: int, t2b: int, t2c: int
> Project [t2a#22, t2b#23, t2c#24]
> +- Filter predicate-subquery#38 [(t2a#22 = t2a#22#49) && (t2c#24 = 8)]
>    :  +- Project [t2a#22 AS t2a#22#49]
>    :     +- Aggregate [t2a#22], [t2a#22]
>    :        +- SubqueryAlias t2, `t2`
>    :           +- Project [_1#18 AS t2a#22, _2#19 AS t2b#23, _3#20 AS t2c#24]
>    :              +- LocalRelation [_1#18, _2#19, _3#20]
>    +- SubqueryAlias t2, `t2`
>       +- Project [_1#18 AS t2a#22, _2#19 AS t2b#23, _3#20 AS t2c#24]
>          +- LocalRelation [_1#18, _2#19, _3#20]
> {code}
> We should not resolve {{t2c}} in the subquery to the outer {{t2}} on the parent side. It should try to resolve {{t2c}} to the {{t2}} in the subquery from its current scope and raise an exception because it is invalid to pull up the column {{t2c}} from the {{Aggregate}} operator below.


