
Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2021/10/19 22:01:00 UTC

[jira] [Updated] (SPARK-35414) Completely fix the broadcast timeout issue in AQE

     [ https://issues.apache.org/jira/browse/SPARK-35414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-35414:
----------------------------------
        Parent:     (was: SPARK-33828)
    Issue Type: Bug  (was: Sub-task)

> Completely fix the broadcast timeout issue in AQE
> -------------------------------------------------
>
>                 Key: SPARK-35414
>                 URL: https://issues.apache.org/jira/browse/SPARK-35414
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.0.1
>            Reporter: Yu Zhong
>            Assignee: Yu Zhong
>            Priority: Major
>
> SPARK-33933 reported an issue: in AQE, when resources are limited, a broadcast timeout can occur.
> [#31269|https://github.com/apache/spark/pull/31269] provides a partial fix by reordering newStages by class type, so that BroadcastQueryStage instances precede the others when materialize() is called. However, this only guarantees the scheduling order under normal circumstances. The guarantee is not strict, because the broadcast job and the shuffle map job are submitted from different threads.
> A complete fix is therefore needed to avoid the edge case that triggers the broadcast timeout.
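The partial fix and its limitation can be sketched roughly as follows. This is plain Python, not Spark's Scala code. Every name here (QueryStage, BroadcastStage, ShuffleStage, order_new_stages, materialize_all) is a hypothetical stand-in. The two single-thread pools are an assumption that models broadcast jobs and shuffle map jobs being submitted from different threads. The sketch shows that sorting broadcast stages first fixes the order of the calls, but not the order in which the jobs actually get submitted.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


# Hypothetical stand-ins for AQE query stages (not Spark's real classes).
@dataclass
class QueryStage:
    stage_id: int


class BroadcastStage(QueryStage):
    pass


class ShuffleStage(QueryStage):
    pass


def order_new_stages(new_stages):
    """Stable sort that puts broadcast stages before all other stages,
    mirroring the idea of the partial fix."""
    return sorted(new_stages, key=lambda s: 0 if isinstance(s, BroadcastStage) else 1)


def materialize_all(stages):
    """Hand each stage's job to a thread pool and record the order in which
    the jobs are actually submitted."""
    submitted = []
    lock = threading.Lock()

    def submit_job(stage):
        with lock:
            submitted.append(stage.stage_id)

    # Assumption for illustration: broadcast and shuffle map jobs run on
    # separate threads, so calling them in sorted order does not guarantee
    # that the broadcast jobs reach the scheduler first.
    with ThreadPoolExecutor(max_workers=1) as broadcast_pool, \
            ThreadPoolExecutor(max_workers=1) as shuffle_pool:
        futures = []
        for stage in order_new_stages(stages):
            pool = broadcast_pool if isinstance(stage, BroadcastStage) else shuffle_pool
            futures.append(pool.submit(submit_job, stage))
        for f in futures:
            f.result()
    return submitted
```

The result of order_new_stages is deterministic. The interleaving recorded by materialize_all is not, because it depends on thread timing. That nondeterminism is the edge case the issue says the partial fix cannot rule out.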



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org