Posted to issues@spark.apache.org by "Matei Zaharia (JIRA)" <ji...@apache.org> on 2014/06/24 20:42:26 UTC

[jira] [Resolved] (SPARK-1937) Tasks can be submitted before executors are registered

     [ https://issues.apache.org/jira/browse/SPARK-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia resolved SPARK-1937.
----------------------------------

          Resolution: Fixed
       Fix Version/s: 1.1.0
    Target Version/s: 1.1.0

> Tasks can be submitted before executors are registered
> ------------------------------------------------------
>
>                 Key: SPARK-1937
>                 URL: https://issues.apache.org/jira/browse/SPARK-1937
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Rui Li
>            Assignee: Rui Li
>             Fix For: 1.1.0
>
>         Attachments: After-patch.PNG, Before-patch.png, RSBTest.scala
>
>
> During construction, TaskSetManager assigns tasks to several pending lists according to each task's preferred locations. If the desired location is unavailable, it assigns the task to "pendingTasksWithNoPrefs", a list of tasks without preferred locations.
> The problem is that tasks may be submitted before the executors have registered with the driver, in which case TaskSetManager assigns all the tasks to pendingTasksWithNoPrefs. Later, when it looks for a task to schedule, it picks one from this list and assigns it to an arbitrary executor, since TaskSetManager assumes such tasks run equally well on any node.
> This forfeits the benefits of data locality, slows down the whole job, and can cause load imbalance between executors.
> I ran into this issue when running a Spark program that processes 100 GB of data on a 7-node cluster (node6~node12).
> Since the data was uploaded to HDFS from node6, that node holds a complete copy of the data. As a result, node6 finishes tasks much faster, which in turn makes it complete disproportionately more tasks than the other nodes.
> To solve this issue, I think we shouldn't check the availability of executors/hosts when constructing TaskSetManager. If a task prefers a node, we simply add the task to that node's pending list. When the node is added later, TaskSetManager can schedule the task at the proper locality level. If, unfortunately, the preferred node(s) never get added, TaskSetManager can still schedule the task at locality level "ANY".
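> The proposed behavior can be illustrated with a small sketch. This is not the actual TaskSetManager code; the class and method names (SimpleTaskSet, resourceOffer) and the queue-based structure are simplified assumptions for illustration. The key idea from the report is that a task with a preferred host is queued under that host even if no executor on it has registered yet, and only falls back to locality level ANY when the preferred host has nothing to offer:

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the proposed fix: queue tasks by
// preferred host unconditionally, and fall back to locality ANY only
// when a preferred host never shows up.
object LocalitySketch {
  case class Task(id: Int, preferredHost: Option[String])

  class SimpleTaskSet(tasks: Seq[Task]) {
    // Pending tasks keyed by preferred host -- populated even if the
    // host has not registered an executor yet.
    val pendingByHost = mutable.Map[String, mutable.Queue[Int]]()
    // Tasks that genuinely have no preferred location.
    val pendingNoPrefs = mutable.Queue[Int]()

    for (t <- tasks) t.preferredHost match {
      case Some(h) => pendingByHost.getOrElseUpdate(h, mutable.Queue()).enqueue(t.id)
      case None    => pendingNoPrefs.enqueue(t.id)
    }

    // Offer a resource on `host`: prefer node-local tasks, then tasks
    // with no preference, then (locality ANY) a task queued for some
    // other host that never got an executor.
    def resourceOffer(host: String): Option[Int] = {
      pendingByHost.get(host).filter(_.nonEmpty).map(_.dequeue())
        .orElse(if (pendingNoPrefs.nonEmpty) Some(pendingNoPrefs.dequeue()) else None)
        .orElse(pendingByHost.values.find(_.nonEmpty).map(_.dequeue()))
    }
  }
}
```

> In the pre-patch behavior described above, a task preferring node7 would land in pendingTasksWithNoPrefs if node7 hadn't registered by construction time, and could then be handed to node6 immediately; in this sketch it stays queued under node7 and is taken by another host only as a last resort.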



--
This message was sent by Atlassian JIRA
(v6.2#6252)