Posted to issues@spark.apache.org by "Xiao Li (Jira)" <ji...@apache.org> on 2020/10/23 00:49:00 UTC

[jira] [Updated] (SPARK-32461) Shuffled hash join improvement

     [ https://issues.apache.org/jira/browse/SPARK-32461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-32461:
----------------------------
    Labels: release-notes  (was: )

> Shuffled hash join improvement
> ------------------------------
>
>                 Key: SPARK-32461
>                 URL: https://issues.apache.org/jira/browse/SPARK-32461
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Cheng Su
>            Priority: Major
>              Labels: release-notes
>
> Shuffled hash join avoids the sort that sort merge join requires. The advantage is most visible when joining large tables, where it saves CPU and IO (the latter when an external sort would otherwise happen). In the latest master trunk, shuffled hash join is disabled by default through the config "spark.sql.join.preferSortMergeJoin"=true, in favor of reducing the risk of OOM. However, shuffled hash join could be improved to a better state (validated in our internal fork). Creating this Jira to track overall progress.
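
For readers unfamiliar with the trade-off described above, here is a minimal single-partition sketch in plain Python. It is not Spark's implementation; the function names and the (key, value) row layout are assumptions made for illustration. It shows that a hash join needs no sort but must hold one side in memory (the OOM risk mentioned in the description), while a sort merge join sorts both sides first.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows):
    """Inner equi-join: hash the build side, then stream the probe side."""
    # The whole build side is held in memory; no sorting is needed.
    table = defaultdict(list)
    for key, value in build_rows:
        table[key].append(value)

    out = []
    for key, value in probe_rows:
        # .get() avoids creating empty entries for keys with no match.
        for build_value in table.get(key, ()):
            out.append((key, build_value, value))
    return out


def sort_merge_join(left_rows, right_rows):
    """Inner equi-join: sort both sides by key, then merge them."""
    # The two sorts are the extra CPU (and possibly IO) cost.
    left = sorted(left_rows, key=lambda row: row[0])
    right = sorted(right_rows, key=lambda row: row[0])

    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Pair every left row with that key against the run.
            while i < len(left) and left[i][0] == left_key:
                for _, right_value in right[j:j_end]:
                    out.append((left_key, left[i][1], right_value))
                i += 1
            j = j_end
    return out
```

Both functions return the same set of joined rows, only in a different order; the difference is where the cost lands (memory for the hash table versus time spent sorting).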



--
This message was sent by Atlassian Jira
(v8.3.4#803005)