Posted to issues@spark.apache.org by "Takeshi Yamamuro (Jira)" <ji...@apache.org> on 2020/01/23 23:26:00 UTC

[jira] [Resolved] (SPARK-30298) bucket join cannot work for self-join with views

     [ https://issues.apache.org/jira/browse/SPARK-30298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro resolved SPARK-30298.
--------------------------------------
    Fix Version/s: 3.0.0
         Assignee: Terry Kim
       Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/26943

> bucket join cannot work for self-join with views
> ------------------------------------------------
>
>                 Key: SPARK-30298
>                 URL: https://issues.apache.org/jira/browse/SPARK-30298
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xiaoju Wu
>            Assignee: Terry Kim
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> This unit test may fail at the last assertion:
> {code:scala}
> test("bucket join cannot work for self-join with views") {
>   // Set the broadcast threshold to 1 byte so the join is not planned as a broadcast join.
>   withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "1") {
>     withTable("t1") {
>       // Drop the view when the test finishes so it does not leak into other tests.
>       withView("v1") {
>         val df = (0 until 20).map(i => (i, i)).toDF("i", "j").as("df")
>         df.write
>           .format("parquet")
>           .bucketBy(8, "i")
>           .saveAsTable("t1")
>         sql("CREATE VIEW v1 AS SELECT * FROM t1")
>         // Self-join on the bucketed table: no shuffle is expected.
>         val plan1 = sql("SELECT * FROM t1 a JOIN t1 b ON a.i = b.i").queryExecution.executedPlan
>         assert(plan1.collect { case exchange: ShuffleExchangeExec => exchange }.isEmpty)
>         // Same join through the view: this is the assertion that may fail.
>         val plan2 = sql("SELECT * FROM t1 a JOIN v1 b ON a.i = b.i").queryExecution.executedPlan
>         assert(plan2.collect { case exchange: ShuffleExchangeExec => exchange }.isEmpty)
>       }
>     }
>   }
> }
> {code}
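> The cause is that a View adds a Project with Alias expressions, so the Join's required distribution is expressed in terms of the aliased attributes, while ProjectExec passes its child's outputPartitioning up unchanged, still in terms of the un-aliased attributes. As a result, the satisfies check in EnsureRequirements fails and a shuffle is inserted.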
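
The mismatch described above can be illustrated with a small, self-contained Scala sketch. It is a deliberately simplified model and does not use Spark's real classes: `Attr`, `HashPartitioning`, `Project`, and both `outputPartitioning...` methods are hypothetical names introduced here for illustration only. The alias-aware variant shows one plausible way to make the check pass; it is not necessarily what the linked pull request implements.

```scala
// Simplified model of the partitioning check; NOT Spark's actual implementation.
// All names below are hypothetical and exist only for this illustration.
object AliasAwarePartitioningSketch {

  // An attribute is identified by name plus a unique expression id,
  // so an alias of column "i" is a different attribute from "i" itself.
  final case class Attr(name: String, exprId: Long)

  // Hash partitioning on a list of keys with a fixed number of partitions.
  final case class HashPartitioning(keys: Seq[Attr], numPartitions: Int) {
    // Satisfied when every partitioning key appears in the required clustering keys.
    def satisfies(required: Seq[Attr]): Boolean =
      keys.nonEmpty && keys.forall(required.contains)
  }

  // A projection that renames child attributes: alias -> child attribute.
  final case class Project(aliases: Map[Attr, Attr], childPartitioning: HashPartitioning) {

    // Behaviour described in the report: the child's partitioning is passed up as-is.
    def outputPartitioningWithoutAliases: HashPartitioning = childPartitioning

    // Alias-aware variant: rewrite each partitioning key to its alias, if it has one.
    def outputPartitioningWithAliases: HashPartitioning = {
      val childToAlias = aliases.map(_.swap)
      childPartitioning.copy(keys = childPartitioning.keys.map(k => childToAlias.getOrElse(k, k)))
    }
  }

  def main(args: Array[String]): Unit = {
    val tableI = Attr("i", 1L) // column i of the bucketed table t1
    val viewI  = Attr("i", 2L) // alias for i introduced by the view v1

    val scanPartitioning = HashPartitioning(Seq(tableI), numPartitions = 8)
    val viewProject      = Project(Map(viewI -> tableI), scanPartitioning)

    // The join on the view side requires clustering on the aliased attribute.
    val required = Seq(viewI)

    println(viewProject.outputPartitioningWithoutAliases.satisfies(required)) // false
    println(viewProject.outputPartitioningWithAliases.satisfies(required))    // true
  }
}
```

In this model the first check fails because the partitioning key and the required key carry different expression ids, which corresponds to the planner concluding that a shuffle is needed even though the data is already bucketed on `i`.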


