You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Allison Wang (Jira)" <ji...@apache.org> on 2021/09/03 00:03:00 UTC

[jira] [Updated] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries

     [ https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allison Wang updated SPARK-36656:
---------------------------------
    Description: 
Currently, the optimizer rule `CollapseProject` inlines expressions generated from correlated scalar subqueries, which can create unnecessary left outer joins.

{code:sql}
select c1, s, s * 10 from (
select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1)
{code}

{code:scala}
// Before
Project [c1, s, (s * 10)]
+- Project [c1, scalar-subquery [c1] AS s]
   :  +- Aggregate [c1], [first(c2), c1] 
   :      +- LocalRelation [c1, c2]
   +- LocalRelation [c1, c2]

// After (scalar subqueries are inlined)
Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
:  +- Aggregate [c1], [first(c2), c1] 
:      +- LocalRelation [c1, c2]
:  +- Aggregate [c1], [first(c2), c1] 
:      +- LocalRelation [c1, c2]
+- LocalRelation [c1, c2]
{code}

Then this query will have two LeftOuter joins created. We should only collapse projects after correlated subqueries are rewritten as joins.

  was:
Currently, the optimizer rule `CollapseProject` inlines expressions generated from correlated scalar subqueries, which can create unnecessary left outer joins.

{code:scala}
// Before
Project [c1, s, (s * 10)]
+- Project [c1, scalar-subquery [c1] AS s]
   :  +- Aggregate [c1], [first(c2), c1] 
   :      +- LocalRelation [c1, c2]
   +- LocalRelation [c1, c2]

// After (scalar subqueries are inlined)
Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
:  +- Aggregate [c1], [first(c2), c1] 
:      +- LocalRelation [c1, c2]
:  +- Aggregate [c1], [first(c2), c1] 
:      +- LocalRelation [c1, c2]
+- LocalRelation [c1, c2]
{code}

Then this query will have two LeftOuter joins created. We should only collapse projects after correlated subqueries are rewritten as joins.


> CollapseProject should not collapse correlated scalar subqueries
> ----------------------------------------------------------------
>
>                 Key: SPARK-36656
>                 URL: https://issues.apache.org/jira/browse/SPARK-36656
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Currently, the optimizer rule `CollapseProject` inlines expressions generated from correlated scalar subqueries, which can create unnecessary left outer joins.
> {code:sql}
> select c1, s, s * 10 from (
> select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1)
> {code}
> {code:scala}
> // Before
> Project [c1, s, (s * 10)]
> +- Project [c1, scalar-subquery [c1] AS s]
>    :  +- Aggregate [c1], [first(c2), c1] 
>    :      +- LocalRelation [c1, c2]
>    +- LocalRelation [c1, c2]
> // After (scalar subqueries are inlined)
> Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)]
> :  +- Aggregate [c1], [first(c2), c1] 
> :      +- LocalRelation [c1, c2]
> :  +- Aggregate [c1], [first(c2), c1] 
> :      +- LocalRelation [c1, c2]
> +- LocalRelation [c1, c2]
> {code}
> Then this query will have two LeftOuter joins created. We should only collapse projects after correlated subqueries are rewritten as joins.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org