Posted to issues@spark.apache.org by "Allison Wang (Jira)" <ji...@apache.org> on 2022/02/09 05:37:00 UTC

[jira] [Updated] (SPARK-38155) Disallow distinct aggregate in lateral subqueries with unsupported correlated predicates

     [ https://issues.apache.org/jira/browse/SPARK-38155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allison Wang updated SPARK-38155:
---------------------------------
    Description: 
Block lateral subqueries in CheckAnalysis that contain a DISTINCT aggregate and correlated non-equality predicates. Such queries can return incorrect results, because DISTINCT is rewritten as an Aggregate during the optimization phase.

For example:

{code:sql}
CREATE VIEW t1(c1, c2) AS VALUES (0, 1);
CREATE VIEW t2(c1, c2) AS VALUES (1, 2), (2, 2);
SELECT * FROM t1 JOIN LATERAL (SELECT DISTINCT c2 FROM t2 WHERE c1 > t1.c1);
{code}

The correct result is a single row, (0, 1, 2), but the query currently returns two rows: [(0, 1, 2), (0, 1, 2)].

  was:
Block lateral subqueries in CheckAnalysis that contain a DISTINCT aggregate and correlated non-equality predicates. Such queries can return incorrect results, because DISTINCT is rewritten as an Aggregate during the optimization phase.

For example

CREATE VIEW t1(c1, c2) AS VALUES (0, 1)

CREATE VIEW t2(c1, c2) AS VALUES (1, 2), (2, 2)

SELECT * FROM t1 JOIN LATERAL (SELECT DISTINCT c2 FROM t2 WHERE c1 > t1.c1)

The correct result is a single row, (0, 1, 2), but the query currently returns two rows: [(0, 1, 2), (0, 1, 2)].


> Disallow distinct aggregate in lateral subqueries with unsupported correlated predicates
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-38155
>                 URL: https://issues.apache.org/jira/browse/SPARK-38155
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Block lateral subqueries in CheckAnalysis that contain a DISTINCT aggregate and correlated non-equality predicates. Such queries can return incorrect results, because DISTINCT is rewritten as an Aggregate during the optimization phase.
> For example:
>
> {code:sql}
> CREATE VIEW t1(c1, c2) AS VALUES (0, 1);
> CREATE VIEW t2(c1, c2) AS VALUES (1, 2), (2, 2);
> SELECT * FROM t1 JOIN LATERAL (SELECT DISTINCT c2 FROM t2 WHERE c1 > t1.c1);
> {code}
>
> The correct result is a single row, (0, 1, 2), but the query currently returns two rows: [(0, 1, 2), (0, 1, 2)].
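
The duplicate row in the example can be reproduced with a small plain-Python model of the two evaluation orders. This is only a sketch of the semantics, not Spark code: the function names are hypothetical, and the second function rests on the assumption that decorrelation evaluates the correlated predicate c1 > t1.c1 above the de-duplication, so the inner side ends up de-duplicated on (c1, c2) instead of on c2 alone.

```python
# Plain-Python model of the example; this is not Spark code.
t1 = [(0, 1)]            # rows of t1(c1, c2)
t2 = [(1, 2), (2, 2)]    # rows of t2(c1, c2)


def lateral_expected(outer, inner):
    """Evaluate the subquery once per outer row, then apply DISTINCT to c2."""
    result = []
    for o in outer:
        # Filter first (c1 > t1.c1), then de-duplicate the projected c2 values.
        matching_c2 = {i[1] for i in inner if i[0] > o[0]}
        for c2 in sorted(matching_c2):
            result.append(o + (c2,))
    return result


def lateral_predicate_pulled_up(outer, inner):
    """Assumed faulty order: de-duplicate on (c1, c2) first, filter afterwards."""
    deduped = sorted(set(inner))  # c1 must be kept for the later comparison
    return [o + (i[1],) for o in outer for i in deduped if i[0] > o[0]]


expected = lateral_expected(t1, t2)
observed = lateral_predicate_pulled_up(t1, t2)
print(expected)
print(observed)
```

Under that assumption, the first function yields the single row the issue calls correct, and the second yields the duplicated row, because (1, 2) and (2, 2) are distinct once c1 is part of the de-duplication key.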


