You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Weichen Xu (Jira)" <ji...@apache.org> on 2019/08/20 12:51:00 UTC

[jira] [Updated] (SPARK-28621) may throw some mismatching error

     [ https://issues.apache.org/jira/browse/SPARK-28621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weichen Xu updated SPARK-28621:
-------------------------------
    Summary: may throw some mismatching error   (was: CheckCartesianProducts do not work correctly)

> may throw some mismatching error 
> ---------------------------------
>
>                 Key: SPARK-28621
>                 URL: https://issues.apache.org/jira/browse/SPARK-28621
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: Weichen Xu
>            Priority: Major
>
> CheckCartesianProducts check logical plan which mismatch the physical plan. So when option "spark.sql.crossJoin.enabled" turn on, it may throw some mismatching error which make user confusing.
>  
> There're some cases:
> 1) The sql optimizer will possibly do join reordering. So that if one join marked by "CROSS JOIN" syntax, it may be reordered in the physical plan. So here CheckCartesianProducts only check logical plan which will possibly mismatch the physical plan.
> 2) Other places which may be inconsistent with physical plan:
> providing:
> {code:java}
> spark.range(2).createOrReplaceTempView("sm1") // can be broadcast
> spark.range(50000000).createOrReplaceTempView("bg1") // cannot be broadcast
> spark.range(60000000).createOrReplaceTempView("bg2") // cannot be broadcast
> {code}
> and suppose `spark.sql.crossJoin.enabled=false` (by default)
> 1) Some join could be convert to broadcast nested loop join, but CheckCartesianProducts raise error. e.g.
> {code:java}
> select sm1.id, bg1.id from bg1 join sm1 where sm1.id < bg1.id
> {code}
> 2) Some join will run by CartesianJoin but CheckCartesianProducts DO NOT raise error. e.g.
> {code:java}
> select bg1.id, bg2.id from bg1 join bg2 where bg1.id < bg2.id
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org