Posted to issues@spark.apache.org by "Chao Sun (Jira)" <ji...@apache.org> on 2022/12/09 17:22:00 UTC

[jira] [Updated] (SPARK-41413) SPJ: Spark should avoid shuffle when partition keys mismatch, but join expressions are compatible

     [ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated SPARK-41413:
-----------------------------
    Summary: SPJ: Spark should avoid shuffle when partition keys mismatch, but join expressions are compatible  (was: Storage-Partitioned Join should avoid shuffle when partition keys mismatch, but join expressions are compatible)

> SPJ: Spark should avoid shuffle when partition keys mismatch, but join expressions are compatible
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-41413
>                 URL: https://issues.apache.org/jira/browse/SPARK-41413
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.1
>            Reporter: Chao Sun
>            Priority: Major
>
> Currently, when checking whether the two sides of a Storage-Partitioned Join are compatible, we require that both the partition expressions and the partition keys be compatible. However, this condition could be relaxed so that only the former is required. When the partition keys are not compatible, we can compute a common superset of the keys from both sides, push this information down to both sides of the join, and use empty partitions for any keys a side is missing.
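The relaxation described in the issue can be illustrated with a small, self-contained sketch. It is a conceptual model only, not Spark's planner code. It assumes: each join side is a plain dict from partition key to rows; both sides already use compatible partition expressions (here, identity on the join column, so matching rows can only live in partitions with the same key); and the helper names `common_partition_keys`, `align_partitions`, and `partitioned_inner_join` are hypothetical, not Spark APIs.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]
# Hypothetical model of one join side: partition key -> rows in that partition.
Partitions = Dict[Any, List[Row]]


def common_partition_keys(left: Partitions, right: Partitions) -> List[Any]:
    """Sorted union (common superset) of the partition keys of both sides."""
    return sorted(set(left) | set(right))


def align_partitions(side: Partitions, keys: List[Any]) -> List[List[Row]]:
    """Lay out one side's partitions in the order of `keys`,
    substituting an empty partition for any key this side lacks."""
    return [side.get(k, []) for k in keys]


def partitioned_inner_join(left: Partitions, right: Partitions,
                           left_idx: int, right_idx: int) -> List[Tuple[Row, Row]]:
    """Inner join that only pairs partition i of the left side with
    partition i of the right side, so no rows are moved (no shuffle).
    Assumes the join column is the partitioning column on both sides."""
    keys = common_partition_keys(left, right)
    out: List[Tuple[Row, Row]] = []
    for lp, rp in zip(align_partitions(left, keys), align_partitions(right, keys)):
        for lrow in lp:
            for rrow in rp:
                if lrow[left_idx] == rrow[right_idx]:
                    out.append((lrow, rrow))
    return out


# Example: the two sides report different partition keys (mismatch),
# but both are partitioned by the same expression on the join column.
left = {"2022-12-01": [("2022-12-01", 1)], "2022-12-02": [("2022-12-02", 2)]}
right = {"2022-12-02": [("2022-12-02", "b")], "2022-12-03": [("2022-12-03", "c")]}

print(partitioned_inner_join(left, right, 0, 0))
# → [(('2022-12-02', 2), ('2022-12-02', 'b'))]
```

Padding each side with empty partitions for the keys it lacks gives both sides the same number of partitions in the same order, which is what lets them be zipped together without redistributing rows.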



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
