You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Nattavut Sutyanyong (JIRA)" <ji...@apache.org> on 2016/12/21 16:54:58 UTC
[jira] [Comment Edited] (SPARK-18863) Output non-aggregate
expressions without GROUP BY in a subquery does not yield an error
[ https://issues.apache.org/jira/browse/SPARK-18863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767537#comment-15767537 ]
Nattavut Sutyanyong edited comment on SPARK-18863 at 12/21/16 4:54 PM:
-----------------------------------------------------------------------
Another case to investigate [subquery/in-subquery/in-group-by.sql TC 01.12]
{code}
Seq((1,1,1)).toDF("t1a", "t1b", "t1c").createOrReplaceTempView("t1")
Seq((1,1,1)).toDF("t2a", "t2b", "t2c").createOrReplaceTempView("t2")
Seq((1,1,1)).toDF("t3a", "t3b", "t3c").createOrReplaceTempView("t3")
— TC 01.12
select * from t1 where t1a in
(select min(t2a) from t2 where t2a = t2a and t2c >= 1 group by t2c having t2c in
(select t3c from t3 group by t3c, t3b having t2b > 6 and t3b > t2b ))
== Parsed Logical Plan ==
'Project [*]
+- 'Filter 't1a IN (list#803)
: +- 'Filter 't2c IN (list#802)
: : +- 'Filter (('t2b > 6) && ('t3b > 't2b))
: : +- 'Aggregate ['t3c, 't3b], ['t3c]
: : +- 'UnresolvedRelation `t3`
: +- 'Aggregate ['t2c], [unresolvedalias('min('t2a), None)]
: +- 'Filter (('t2a = 't2a) && ('t2c >= 1))
: +- 'UnresolvedRelation `t2`
+- 'UnresolvedRelation `t1`
== Analyzed Logical Plan ==
t1a: int, t1b: int, t1c: int
Project [t1a#764, t1b#765, t1c#766]
+- Filter predicate-subquery#803 [(t1a#764 = min(t2a)#816)]
: +- Project [min(t2a)#816]
: +- !Filter predicate-subquery#802 [(t2c#781 = t3c#796) && (t2b#780 > 6) && (t3b#795 > t2b#780)]
: : +- Project [t3c#796, t3b#795]
: : +- Aggregate [t3c#796, t3b#795], [t3c#796, t3b#795]
: : +- SubqueryAlias t3, `t3`
: : +- Project [_1#790 AS t3a#794, _2#791 AS t3b#795, _3#792 AS t3c#796]
: : +- LocalRelation [_1#790, _2#791, _3#792]
: +- Aggregate [t2c#781], [min(t2a#779) AS min(t2a)#816, t2c#781]
: +- Filter ((t2a#779 = t2a#779) && (t2c#781 >= 1))
: +- SubqueryAlias t2, `t2`
: +- Project [_1#775 AS t2a#779, _2#776 AS t2b#780, _3#777 AS t2c#781]
: +- LocalRelation [_1#775, _2#776, _3#777]
+- SubqueryAlias t1, `t1`
+- Project [_1#760 AS t1a#764, _2#761 AS t1b#765, _3#762 AS t1c#766]
+- LocalRelation [_1#760, _2#761, _3#762]
== Optimized Logical Plan ==
Project [_1#760 AS t1a#764, _2#761 AS t1b#765, _3#762 AS t1c#766]
+- Join LeftSemi, (_1#760 = min(t2a)#816)
:- LocalRelation [_1#760, _2#761, _3#762]
+- Aggregate [t2c#781], [min(t2a#779) AS min(t2a)#816]
+- Project [_1#775 AS t2a#779, _3#777 AS t2c#781]
+- Join LeftSemi, (((_3#777 = t3c#796) && (_2#776 > 6)) && (t3b#795 > _2#776))
:- Filter (_3#777 >= 1)
: +- LocalRelation [_1#775, _2#776, _3#777]
+- Aggregate [t3c#796, t3b#795], [t3c#796, t3b#795]
+- LocalRelation [t3b#795, t3c#796]
{code}
I don't see the column t2b#780 being part of the output of the (lower) Aggregate operator. Somehow the LeftSemi join for t2b > 6 is just moved down below the Aggregate over t2. This does not look right to me.
was (Author: nsyca):
Another case to invetigate [subquery/in-subquery/in-group-by.sql TC 01.12]
{code}
Seq((1,1,1)).toDF("t1a", "t1b", "t1c").createOrReplaceTempView("t1")
Seq((1,1,1)).toDF("t2a", "t2b", "t2c").createOrReplaceTempView("t2")
Seq((1,1,1)).toDF("t3a", "t3b", "t3c").createOrReplaceTempView("t3")
— TC 01.12
select * from t1 where t1a in
(select min(t2a) from t2 where t2a = t2a and t2c >= 1 group by t2c having t2c in
(select t3c from t3 group by t3c, t3b having t2b > 6 and t3b > t2b ))
== Parsed Logical Plan ==
'Project [*]
+- 'Filter 't1a IN (list#803)
: +- 'Filter 't2c IN (list#802)
: : +- 'Filter (('t2b > 6) && ('t3b > 't2b))
: : +- 'Aggregate ['t3c, 't3b], ['t3c]
: : +- 'UnresolvedRelation `t3`
: +- 'Aggregate ['t2c], [unresolvedalias('min('t2a), None)]
: +- 'Filter (('t2a = 't2a) && ('t2c >= 1))
: +- 'UnresolvedRelation `t2`
+- 'UnresolvedRelation `t1`
== Analyzed Logical Plan ==
t1a: int, t1b: int, t1c: int
Project [t1a#764, t1b#765, t1c#766]
+- Filter predicate-subquery#803 [(t1a#764 = min(t2a)#816)]
: +- Project [min(t2a)#816]
: +- !Filter predicate-subquery#802 [(t2c#781 = t3c#796) && (t2b#780 > 6) && (t3b#795 > t2b#780)]
: : +- Project [t3c#796, t3b#795]
: : +- Aggregate [t3c#796, t3b#795], [t3c#796, t3b#795]
: : +- SubqueryAlias t3, `t3`
: : +- Project [_1#790 AS t3a#794, _2#791 AS t3b#795, _3#792 AS t3c#796]
: : +- LocalRelation [_1#790, _2#791, _3#792]
: +- Aggregate [t2c#781], [min(t2a#779) AS min(t2a)#816, t2c#781]
: +- Filter ((t2a#779 = t2a#779) && (t2c#781 >= 1))
: +- SubqueryAlias t2, `t2`
: +- Project [_1#775 AS t2a#779, _2#776 AS t2b#780, _3#777 AS t2c#781]
: +- LocalRelation [_1#775, _2#776, _3#777]
+- SubqueryAlias t1, `t1`
+- Project [_1#760 AS t1a#764, _2#761 AS t1b#765, _3#762 AS t1c#766]
+- LocalRelation [_1#760, _2#761, _3#762]
== Optimized Logical Plan ==
Project [_1#760 AS t1a#764, _2#761 AS t1b#765, _3#762 AS t1c#766]
+- Join LeftSemi, (_1#760 = min(t2a)#816)
:- LocalRelation [_1#760, _2#761, _3#762]
+- Aggregate [t2c#781], [min(t2a#779) AS min(t2a)#816]
+- Project [_1#775 AS t2a#779, _3#777 AS t2c#781]
+- Join LeftSemi, (((_3#777 = t3c#796) && (_2#776 > 6)) && (t3b#795 > _2#776))
:- Filter (_3#777 >= 1)
: +- LocalRelation [_1#775, _2#776, _3#777]
+- Aggregate [t3c#796, t3b#795], [t3c#796, t3b#795]
+- LocalRelation [t3b#795, t3c#796]
{code}
I don't see the column t2b#780 being part of the output of the (lower) Aggregate operator. Somehow the LeftSemi join for t2b > 6 is just moved down below the Aggregate over t2. This does not look right to me.
> Output non-aggregate expressions without GROUP BY in a subquery does not yield an error
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-18863
> URL: https://issues.apache.org/jira/browse/SPARK-18863
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Nattavut Sutyanyong
>
> [~smilegator] has found that the following query does not raise a syntax error (note the GROUP BY clause is commented out):
> {code:sql}
> SELECT pk, cv
> FROM p, c
> WHERE p.pk = c.ck
> AND c.cv = (SELECT max(avg)
> FROM (SELECT c1.cv, avg(c1.cv) avg
> FROM c c1
> WHERE c1.ck = p.pk
> -- GROUP BY c1.cv
> ))
> {code}
> There could be multiple values of {{c1.cv}} for each value of {{avg(c1.cv)}}.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org