Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:04:41 UTC

[jira] [Updated] (SPARK-20219) Schedule tasks based on size of input from ShuffledRDD

     [ https://issues.apache.org/jira/browse/SPARK-20219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-20219:
---------------------------------
    Labels: bulk-closed  (was: )

> Schedule tasks based on size of input from ShuffledRDD
> ------------------------------------------------------
>
>                 Key: SPARK-20219
>                 URL: https://issues.apache.org/jira/browse/SPARK-20219
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: jin xing
>            Priority: Major
>              Labels: bulk-closed
>         Attachments: screenshot-1.png
>
>
> When data is highly skewed in a ShuffledRDD, it makes sense to launch the tasks that process the most input as early as possible. The current scheduling mechanism in *TaskSetManager* is quite simple:
> {code}
>   for (i <- (0 until numTasks).reverse) {
>     addPendingTask(i)
>   }
> {code}
> In the scenario where the "large tasks" sit in the bottom half of the tasks array, launching the tasks with the most input early can significantly reduce the overall time cost and save resources when *"dynamic allocation"* is disabled.
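The proposed ordering can be sketched outside of Spark as a plain sorting problem. This is a minimal illustration, not Spark's actual API: `launchOrder` and the example byte counts are hypothetical, standing in for the per-task shuffle input sizes that a size-aware TaskSetManager would consult before calling addPendingTask.

```scala
// Hypothetical sketch of the size-aware launch order suggested in the issue:
// given an estimated shuffle input size per task, launch the largest tasks first
// instead of iterating task indices in fixed reverse order.
object SizeAwareOrdering {
  // taskSizes(i) = estimated shuffle input bytes for task i (illustrative values)
  def launchOrder(taskSizes: Array[Long]): Seq[Int] =
    taskSizes.indices.sortBy(i => -taskSizes(i))

  def main(args: Array[String]): Unit = {
    val sizes = Array(10L, 500L, 30L, 200L)
    // Tasks with more input come first, so skewed "large tasks" start early.
    println(launchOrder(sizes).mkString(","))
  }
}
```

With the sample sizes above, task 1 (500 bytes) is scheduled first and task 0 (10 bytes) last, whereas the existing `(0 until numTasks).reverse` loop would order tasks purely by index, regardless of skew.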



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org