Posted to issues@spark.apache.org by "Imran Rashid (JIRA)" <ji...@apache.org> on 2016/06/21 21:53:57 UTC

[jira] [Commented] (SPARK-16106) TaskSchedulerImpl does not correctly handle new executors on existing hosts

    [ https://issues.apache.org/jira/browse/SPARK-16106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342821#comment-15342821 ] 

Imran Rashid commented on SPARK-16106:
--------------------------------------

cc [~kayousterhout]

After taking a closer look at this, I don't think it's really a very big issue.  Since you've already got some executor on the host, your locality level will already include *at least* NODE_LOCAL.  It's possible that when you add another executor, you can now go to PROCESS_LOCAL -- but I can't think of how you'd have a task which wants to be PROCESS_LOCAL in an executor which doesn't exist yet.
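
To make that concrete, here's a rough, simplified illustration of the idea -- the {{Locality}} ADT and {{bestLocality}} helper below are hypothetical, not Spark's actual {{TaskSetManager}} logic: once any executor is registered on a preferred host, NODE_LOCAL is already reachable, and an extra executor on that host only matters if some task prefers that specific executor.

{code}
// Hypothetical sketch, not TaskSetManager code: how the best reachable
// locality level falls out of the executors-by-host map.
sealed trait Locality
case object ProcessLocal extends Locality
case object NodeLocal extends Locality
case object AnyLocality extends Locality

def bestLocality(
    preferredExecs: Set[String],           // executor IDs a task would prefer
    preferredHosts: Set[String],           // hosts a task would prefer
    execsByHost: Map[String, Set[String]]  // currently registered executors
  ): Locality = {
  val liveExecs = execsByHost.values.flatten.toSet
  if (preferredExecs.exists(liveExecs.contains)) ProcessLocal
  else if (preferredHosts.exists(execsByHost.contains)) NodeLocal
  else AnyLocality
}
{code}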

There is also the issue with {{failedEpoch}}; I haven't wrapped my head around what the ramifications of that are yet.

In any case, it's confusing at the very least, so we might as well fix this.  But it might be reasonable to simply change {{executorAdded()}} to {{hostAdded()}} to clear up the naming, and leave the current behavior as-is.
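
For reference, the more invasive alternative (actually flagging new executors on existing hosts) could look roughly like the sketch below -- untested, building on the snippet quoted further down, and {{hostAdded()}} here is just the hypothetical rename discussed above:

{code}
// Untested sketch, not a patch: flag newExecAvail for every previously
// unseen executor, while the (renamed) hostAdded() hook still fires only
// when the host itself is new.
if (!executorIdToHost.contains(o.executorId)) {
  executorIdToHost(o.executorId) = o.host
  executorIdToTaskCount.getOrElseUpdate(o.executorId, 0)
  if (!executorsByHost.contains(o.host)) {
    executorsByHost(o.host) = new HashSet[String]()
    hostAdded(o.host)   // renamed from executorAdded(); behavior unchanged
  }
  executorsByHost(o.host) += o.executorId
  newExecAvail = true   // a new executor on an existing host now counts too
}
{code}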

> TaskSchedulerImpl does not correctly handle new executors on existing hosts
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-16106
>                 URL: https://issues.apache.org/jira/browse/SPARK-16106
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 2.0.0
>            Reporter: Imran Rashid
>            Priority: Trivial
>
> The TaskSchedulerImpl updates the set of executors and hosts in each call to {{resourceOffers}}.  During this call, it also tracks whether there are any new executors observed in {{newExecAvail}}:
> {code}
>       executorIdToHost(o.executorId) = o.host
>       executorIdToTaskCount.getOrElseUpdate(o.executorId, 0)
>       if (!executorsByHost.contains(o.host)) {
>         executorsByHost(o.host) = new HashSet[String]()
>         executorAdded(o.executorId, o.host)
>         newExecAvail = true
>       }
> {code}
> However, this only detects when a new *host* is added, not when an additional executor is added to an existing host (a relatively common event in dynamic allocation).
> The end result is that task locality and {{failedEpoch}} are not updated correctly for new executors.


