Posted to issues@spark.apache.org by "Nattavut Sutyanyong (JIRA)" <ji...@apache.org> on 2016/08/06 16:15:20 UTC

[jira] [Comment Edited] (SPARK-16804) Correlated subqueries containing non-deterministic operators return incorrect results

    [ https://issues.apache.org/jira/browse/SPARK-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400150#comment-15400150 ] 

Nattavut Sutyanyong edited comment on SPARK-16804 at 8/6/16 4:14 PM:
---------------------------------------------------------------------

To demonstrate that this fix does not unnecessarily block the "good" cases (where LIMIT is present but NOT on the correlated path), here is an example that produces the same result set both with and without the proposed fix.
<code>
scala> sql("select c1 from t1 where exists (select 1 from (select 1 from t2 limit 1) where t1.c1=t2.c2)").show 
+---+                                                                           
| c1|
+---+
|  1|
+---+
</code>



> Correlated subqueries containing non-deterministic operators return incorrect results
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-16804
>                 URL: https://issues.apache.org/jira/browse/SPARK-16804
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Nattavut Sutyanyong
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Correlated subqueries with LIMIT can return incorrect results. The ResolveSubquery rule in the analysis phase moves correlated predicates into join predicates and neglects the semantics of the LIMIT.
> Example:
> {noformat}
> Seq(1, 2).toDF("c1").createOrReplaceTempView("t1")
> Seq(1, 2).toDF("c2").createOrReplaceTempView("t2")
> sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT 1)").show
> +---+                                                                           
> | c1|
> +---+
> |  1|
> +---+
> {noformat}
> The correct result contains both rows from T1.
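
The difference between the two evaluation orders described above can be sketched in plain Scala, without Spark. This is only an illustrative model, not the analyzer's actual code: the object name CorrelatedLimitSketch and the helpers filterThenLimit and limitThenFilter are hypothetical, and the sketch assumes LIMIT 1 keeps the first row of t2 in insertion order.

```scala
// Plain-Scala model of the two evaluation orders; no Spark dependency.
// All names here are hypothetical and exist only for illustration.
object CorrelatedLimitSketch {
  val t1: Seq[Int] = Seq(1, 2) // stands in for column c1 of t1
  val t2: Seq[Int] = Seq(1, 2) // stands in for column c2 of t2

  // Intended semantics: for each outer row, apply the correlated
  // predicate first, then LIMIT 1, then test EXISTS.
  def filterThenLimit(outer: Seq[Int], inner: Seq[Int]): Seq[Int] =
    outer.filter(c1 => inner.filter(c2 => c1 == c2).take(1).nonEmpty)

  // Model of the reported behavior (an assumption about the effect of the
  // rewrite): LIMIT 1 is applied to t2 once, and the correlated predicate
  // is only evaluated afterwards, as a join condition.
  def limitThenFilter(outer: Seq[Int], inner: Seq[Int]): Seq[Int] = {
    val limited = inner.take(1) // assumes the first row is the one kept
    outer.filter(c1 => limited.exists(c2 => c1 == c2))
  }

  def main(args: Array[String]): Unit = {
    println(filterThenLimit(t1, t2)) // List(1, 2)
    println(limitThenFilter(t1, t2)) // List(1)
  }
}
```

Under these assumptions, the first function keeps both rows of t1, while the second keeps only the row that matches whichever single row of t2 survived the LIMIT, which mirrors the one-row output shown in the example.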


