You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2019/08/27 13:56:00 UTC

[jira] [Assigned] (SPARK-28621) CheckCartesianProducts may throw some error which mismatch generated physical plan

     [ https://issues.apache.org/jira/browse/SPARK-28621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-28621:
-----------------------------------

    Assignee: Weichen Xu

> CheckCartesianProducts may throw some error which mismatch generated physical plan
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-28621
>                 URL: https://issues.apache.org/jira/browse/SPARK-28621
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: Weichen Xu
>            Assignee: Weichen Xu
>            Priority: Major
>
> CheckCartesianProducts check logical plan which mismatch the physical plan. So when option "spark.sql.crossJoin.enabled" turn off (by default), it may throw some mismatching error which make user confusing.
>  
> There're some cases:
> 1) The sql optimizer will possibly do join reordering. So that if one join marked by "CROSS JOIN" syntax, it may be reordered in the physical plan. So here CheckCartesianProducts only check logical plan which will possibly mismatch the physical plan.
> 2) Other places which may be inconsistent with physical plan:
> providing:
> {code:java}
> spark.range(2).createOrReplaceTempView("sm1") // can be broadcast
> spark.range(50000000).createOrReplaceTempView("bg1") // cannot be broadcast
> spark.range(60000000).createOrReplaceTempView("bg2") // cannot be broadcast
> {code}
> and suppose `spark.sql.crossJoin.enabled=false` (by default)
> 1) Some join could be convert to broadcast nested loop join, but CheckCartesianProducts raise error. e.g.
> {code:java}
> select sm1.id, bg1.id from bg1 join sm1 where sm1.id < bg1.id
> {code}
> 2) Some join will run by CartesianJoin but CheckCartesianProducts DO NOT raise error. e.g.
> {code:java}
> select bg1.id, bg2.id from bg1 join bg2 where bg1.id < bg2.id
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org