You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Nattavut Sutyanyong (JIRA)" <ji...@apache.org> on 2016/12/21 16:54:58 UTC
[jira] [Comment Edited] (SPARK-18863) Output non-aggregate expressions without GROUP BY in a subquery does not yield an error

    [ https://issues.apache.org/jira/browse/SPARK-18863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767537#comment-15767537 ] 

Nattavut Sutyanyong edited comment on SPARK-18863 at 12/21/16 4:54 PM:
-----------------------------------------------------------------------

Another case to investigate [subquery/in-subquery/in-group-by.sql TC 01.12]

{code}
Seq((1,1,1)).toDF("t1a", "t1b", "t1c").createOrReplaceTempView("t1")
Seq((1,1,1)).toDF("t2a", "t2b", "t2c").createOrReplaceTempView("t2")
Seq((1,1,1)).toDF("t3a", "t3b", "t3c").createOrReplaceTempView("t3")

— TC 01.12
select * from t1 where t1a in
(select min(t2a) from t2 where t2a = t2a and t2c >= 1 group by t2c having t2c in
(select t3c from t3 group by t3c, t3b having t2b > 6 and t3b > t2b ))

== Parsed Logical Plan ==
'Project [*]
+- 'Filter 't1a IN (list#803)
   :  +- 'Filter 't2c IN (list#802)
   :     :  +- 'Filter (('t2b > 6) && ('t3b > 't2b))
   :     :     +- 'Aggregate ['t3c, 't3b], ['t3c]
   :     :        +- 'UnresolvedRelation `t3`
   :     +- 'Aggregate ['t2c], [unresolvedalias('min('t2a), None)]
   :        +- 'Filter (('t2a = 't2a) && ('t2c >= 1))
   :           +- 'UnresolvedRelation `t2`
   +- 'UnresolvedRelation `t1`

== Analyzed Logical Plan ==
t1a: int, t1b: int, t1c: int
Project [t1a#764, t1b#765, t1c#766]
+- Filter predicate-subquery#803 [(t1a#764 = min(t2a)#816)]
   :  +- Project [min(t2a)#816]
   :     +- !Filter predicate-subquery#802 [(t2c#781 = t3c#796) && (t2b#780 > 6) && (t3b#795 > t2b#780)]
   :        :  +- Project [t3c#796, t3b#795]
   :        :     +- Aggregate [t3c#796, t3b#795], [t3c#796, t3b#795]
   :        :        +- SubqueryAlias t3, `t3`
   :        :           +- Project [_1#790 AS t3a#794, _2#791 AS t3b#795, _3#792 AS t3c#796]
   :        :              +- LocalRelation [_1#790, _2#791, _3#792]
   :        +- Aggregate [t2c#781], [min(t2a#779) AS min(t2a)#816, t2c#781]
   :           +- Filter ((t2a#779 = t2a#779) && (t2c#781 >= 1))
   :              +- SubqueryAlias t2, `t2`
   :                 +- Project [_1#775 AS t2a#779, _2#776 AS t2b#780, _3#777 AS t2c#781]
   :                    +- LocalRelation [_1#775, _2#776, _3#777]
   +- SubqueryAlias t1, `t1`
      +- Project [_1#760 AS t1a#764, _2#761 AS t1b#765, _3#762 AS t1c#766]
         +- LocalRelation [_1#760, _2#761, _3#762]

== Optimized Logical Plan ==
Project [_1#760 AS t1a#764, _2#761 AS t1b#765, _3#762 AS t1c#766]
+- Join LeftSemi, (_1#760 = min(t2a)#816)
   :- LocalRelation [_1#760, _2#761, _3#762]
   +- Aggregate [t2c#781], [min(t2a#779) AS min(t2a)#816]
      +- Project [_1#775 AS t2a#779, _3#777 AS t2c#781]
         +- Join LeftSemi, (((_3#777 = t3c#796) && (_2#776 > 6)) && (t3b#795 > _2#776))
            :- Filter (_3#777 >= 1)
            :  +- LocalRelation [_1#775, _2#776, _3#777]
            +- Aggregate [t3c#796, t3b#795], [t3c#796, t3b#795]
               +- LocalRelation [t3b#795, t3c#796]
{code}

I don't see the column t2b#780 being part of the output of the (lower) Aggregate operator. Somehow the LeftSemi join for t2b > 6 is just moved down below the Aggregate over t2. This does not look right to me. 


was (Author: nsyca):
Another case to invetigate [subquery/in-subquery/in-group-by.sql TC 01.12]

{code}
Seq((1,1,1)).toDF("t1a", "t1b", "t1c").createOrReplaceTempView("t1")
Seq((1,1,1)).toDF("t2a", "t2b", "t2c").createOrReplaceTempView("t2")
Seq((1,1,1)).toDF("t3a", "t3b", "t3c").createOrReplaceTempView("t3")

— TC 01.12
select * from t1 where t1a in
(select min(t2a) from t2 where t2a = t2a and t2c >= 1 group by t2c having t2c in
(select t3c from t3 group by t3c, t3b having t2b > 6 and t3b > t2b ))

== Parsed Logical Plan ==
'Project [*]
+- 'Filter 't1a IN (list#803)
   :  +- 'Filter 't2c IN (list#802)
   :     :  +- 'Filter (('t2b > 6) && ('t3b > 't2b))
   :     :     +- 'Aggregate ['t3c, 't3b], ['t3c]
   :     :        +- 'UnresolvedRelation `t3`
   :     +- 'Aggregate ['t2c], [unresolvedalias('min('t2a), None)]
   :        +- 'Filter (('t2a = 't2a) && ('t2c >= 1))
   :           +- 'UnresolvedRelation `t2`
   +- 'UnresolvedRelation `t1`

== Analyzed Logical Plan ==
t1a: int, t1b: int, t1c: int
Project [t1a#764, t1b#765, t1c#766]
+- Filter predicate-subquery#803 [(t1a#764 = min(t2a)#816)]
   :  +- Project [min(t2a)#816]
   :     +- !Filter predicate-subquery#802 [(t2c#781 = t3c#796) && (t2b#780 > 6) && (t3b#795 > t2b#780)]
   :        :  +- Project [t3c#796, t3b#795]
   :        :     +- Aggregate [t3c#796, t3b#795], [t3c#796, t3b#795]
   :        :        +- SubqueryAlias t3, `t3`
   :        :           +- Project [_1#790 AS t3a#794, _2#791 AS t3b#795, _3#792 AS t3c#796]
   :        :              +- LocalRelation [_1#790, _2#791, _3#792]
   :        +- Aggregate [t2c#781], [min(t2a#779) AS min(t2a)#816, t2c#781]
   :           +- Filter ((t2a#779 = t2a#779) && (t2c#781 >= 1))
   :              +- SubqueryAlias t2, `t2`
   :                 +- Project [_1#775 AS t2a#779, _2#776 AS t2b#780, _3#777 AS t2c#781]
   :                    +- LocalRelation [_1#775, _2#776, _3#777]
   +- SubqueryAlias t1, `t1`
      +- Project [_1#760 AS t1a#764, _2#761 AS t1b#765, _3#762 AS t1c#766]
         +- LocalRelation [_1#760, _2#761, _3#762]

== Optimized Logical Plan ==
Project [_1#760 AS t1a#764, _2#761 AS t1b#765, _3#762 AS t1c#766]
+- Join LeftSemi, (_1#760 = min(t2a)#816)
   :- LocalRelation [_1#760, _2#761, _3#762]
   +- Aggregate [t2c#781], [min(t2a#779) AS min(t2a)#816]
      +- Project [_1#775 AS t2a#779, _3#777 AS t2c#781]
         +- Join LeftSemi, (((_3#777 = t3c#796) && (_2#776 > 6)) && (t3b#795 > _2#776))
            :- Filter (_3#777 >= 1)
            :  +- LocalRelation [_1#775, _2#776, _3#777]
            +- Aggregate [t3c#796, t3b#795], [t3c#796, t3b#795]
               +- LocalRelation [t3b#795, t3c#796]
{code}

I don't see the column t2b#780 being part of the output of the (lower) Aggregate operator. Somehow the LeftSemi join for t2b > 6 is just moved down below the Aggregate over t2. This does not look right to me. 

> Output non-aggregate expressions without GROUP BY in a subquery does not yield an error 
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-18863
>                 URL: https://issues.apache.org/jira/browse/SPARK-18863
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Nattavut Sutyanyong
>
> [~smilegator] has found that the following query does not raise a syntax error (note the GROUP BY clause is commented out):
> {code:sql}
> SELECT pk, cv
> FROM   p, c
> WHERE  p.pk = c.ck
> AND    c.cv = (SELECT max(avg)
>                FROM   (SELECT   c1.cv, avg(c1.cv) avg
>                        FROM     c c1
>                        WHERE    c1.ck = p.pk
> --                       GROUP BY c1.cv
>                       ))
> {code}
> There could be multiple values of {{c1.cv}} for each value of {{avg(c1.cv)}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org