You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2021/04/29 04:47:00 UTC

[jira] [Assigned] (SPARK-35264) Support AQE side broadcastJoin threshold

     [ https://issues.apache.org/jira/browse/SPARK-35264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35264:
------------------------------------

    Assignee: Apache Spark

> Support AQE side broadcastJoin threshold
> ----------------------------------------
>
>                 Key: SPARK-35264
>                 URL: https://issues.apache.org/jira/browse/SPARK-35264
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: ulysses you
>            Assignee: Apache Spark
>            Priority: Major
>
> The main idea here is that make join config isolation between normal planner and aqe planner which shared the same code path.
> Actually we don not very trust using the static stat to consider if it can build broadcast hash join. In our experience it's very common that Spark throw broadcast timeout or driver side OOM exception when execute a bit large plan. And due to braodcast join is not reversed which means if we covert join to braodcast hash join at first time, we(AQE) can not optimize it again, so it should make sense to decide if we can do broadcast at aqe side using different sql config.
> In order to achieve this we use a specific join hint in advance during AQE framework and then at JoinSelection side it will take and follow the inserted hint.
> For now we only support select strategy for equi join, and follow this order
>  1. mark join as broadcast hash join if possible
>  2. mark join as shuffled hash join if possible
> Note that, we don't override join strategy if user specifies a join hint.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org