Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:04:21 UTC

[jira] [Updated] (SPARK-19100) Schedule tasks in descending order of estimated input size / estimated task duration

     [ https://issues.apache.org/jira/browse/SPARK-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-19100:
---------------------------------
    Labels: bulk-closed  (was: )

> Schedule tasks in descending order of estimated input size / estimated task duration
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-19100
>                 URL: https://issues.apache.org/jira/browse/SPARK-19100
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>            Reporter: Josh Rosen
>            Priority: Major
>              Labels: bulk-closed
>
> Say that you're scheduling a reduce phase and, based on the map output sizes, you have identified that some reducers will be skewed because they must process much more input. In this case, it's preferable to schedule those larger tasks first: the large reduce tasks will dominate the completion time of the stage, so it's important to launch them as soon as possible.
> Spark's current task scheduling technique is naive and simply launches tasks in ascending order of their indices: https://github.com/apache/spark/blob/903bb8e8a2b84b9ea82acbb8ae9d58754862be3a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L158
> If we implemented an interface for a Task to expose a relative size estimate (comparable only within the same TaskSet) and used that instead of the index in the initial scheduling, then I think we could realize decent stage-completion-time improvements for highly skewed stages (see the first sketch below).
> This is also beneficial for initial map stages where you have input size estimates: if you're generating tasks / splits from HDFS blocks or S3 FileStatuses, then you already have an estimate of each task's total input size, and that may be a decent proxy for the task's duration (see the second sketch below).
> Basically, my feeling is that scheduling in descending order of size estimates, even if they're slightly inaccurate, should be a strict improvement over scheduling in ascending order of task ID.
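>
> Here's a minimal sketch of what that could look like in Scala. This is not Spark's actual API; SizeEstimation, PendingTask, and launchOrder are hypothetical names invented for illustration:
>
>     // Hypothetical: lets a Task expose a relative size estimate,
>     // comparable only to other tasks in the same TaskSet.
>     trait SizeEstimation {
>       def sizeEstimate: Long
>     }
>
>     case class PendingTask(index: Int, sizeEstimate: Long)
>
>     // Sort by descending estimate so the largest (stage-dominating) tasks
>     // launch first; ascending index serves as a deterministic tie-break.
>     def launchOrder(pending: Seq[PendingTask]): Seq[PendingTask] =
>       pending.sortBy(t => (-t.sizeEstimate, t.index))
>
>     // Example: the skewed task (index 1) launches first.
>     // launchOrder(Seq(PendingTask(0, 64L), PendingTask(1, 4096L),
>     //   PendingTask(2, 512L))).map(_.index)  ==>  Seq(1, 2, 0)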
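>
> And a second sketch for the map-stage case, assuming each task reads one or more files or blocks. FileSystem.listStatus and FileStatus.getLen are real Hadoop calls; inputSizeEstimate itself is a hypothetical helper:
>
>     import org.apache.hadoop.fs.{FileSystem, Path}
>
>     // The byte lengths the filesystem already reports are a free size
>     // estimate (and duration proxy) for a file-based task.
>     def inputSizeEstimate(fs: FileSystem, paths: Seq[Path]): Long =
>       paths.flatMap(p => fs.listStatus(p).toSeq).map(_.getLen).sum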



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org