Posted to issues@spark.apache.org by "XiDuo You (Jira)" <ji...@apache.org> on 2022/04/01 06:33:00 UTC

[jira] [Updated] (SPARK-37528) Schedule Tasks By Input Size

     [ https://issues.apache.org/jira/browse/SPARK-37528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

XiDuo You updated SPARK-37528:
------------------------------
    Summary: Schedule Tasks By Input Size  (was: Support reorder tasks during scheduling by shuffle partition size in AQE)

> Schedule Tasks By Input Size
> ----------------------------
>
>                 Key: SPARK-37528
>                 URL: https://issues.apache.org/jira/browse/SPARK-37528
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, SQL
>    Affects Versions: 3.3.0
>            Reporter: XiDuo You
>            Priority: Major
>
> In general, a larger input size means a longer running time. Ideally, DAGScheduler should therefore submit the tasks with the largest input first, which can reduce the total running time of the stage. For example, take one stage with 4 tasks, a defaultParallelism of 2 (so two tasks run concurrently), and task running times of [1s, 3s, 2s, 4s]:
> - in the default submission order, the stage takes 7s
> - if the biggest tasks are submitted first, the stage takes 5s
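
The 7s and 5s figures above can be reproduced with a small simulation. This is only an illustrative sketch, not Spark's DAGScheduler: the function name `stage_makespan` is hypothetical, running times are assumed to be known in advance, and each of the two slots is assumed to run one task at a time, picking up the next task in submission order as soon as it becomes free.

```python
import heapq

def stage_makespan(task_times, slots):
    """Simulate greedy scheduling: tasks are taken in the given order,
    and each one starts on whichever slot becomes free first.
    Returns the time at which the last task finishes."""
    # Min-heap holding the time at which each slot next becomes free.
    free_at = [0] * slots
    heapq.heapify(free_at)
    for t in task_times:
        start = heapq.heappop(free_at)      # earliest available slot
        heapq.heappush(free_at, start + t)  # slot is busy until start + t
    return max(free_at)

times = [1, 3, 2, 4]

# Default submission order.
print(stage_makespan(times, 2))                        # 7

# Biggest task first.
print(stage_makespan(sorted(times, reverse=True), 2))  # 5
```

Largest-first ordering is a heuristic: it helps in this example, but it is not guaranteed to give the shortest possible stage time for every mix of task sizes.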



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
