Posted to issues@spark.apache.org by "Kay Ousterhout (JIRA)" <ji...@apache.org> on 2015/12/15 03:22:46 UTC

[jira] [Commented] (SPARK-11460) Locality waits should be based on task set creation time, not last launch time

    [ https://issues.apache.org/jira/browse/SPARK-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057203#comment-15057203 ] 

Kay Ousterhout commented on SPARK-11460:
----------------------------------------

For the specific issue mentioned in the description, can you set spark.locality.wait.rack to 0 (is that what you're already doing)?  Does that cause other issues?
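For reference, the rack-level wait can be zeroed out from the command line; this is a config sketch only, and the jar and class names are placeholders, not from this issue:

```shell
# Disable only the rack-local wait, leaving node/process waits at their defaults.
# (Placeholder jar and class names; substitute your own streaming job.)
spark-submit \
  --conf spark.locality.wait.rack=0 \
  --class com.example.StreamingJob \
  streaming-job.jar
```

The same property can also be set in spark-defaults.conf or via SparkConf before the context is created.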

I commented on the more general issue in the pull request.

> Locality waits should be based on task set creation time, not last launch time
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-11460
>                 URL: https://issues.apache.org/jira/browse/SPARK-11460
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.2.2, 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.0, 1.5.1
>         Environment: YARN
>            Reporter: Shengyue Ji
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Spark waits for the spark.locality.wait period before going from RACK_LOCAL to ANY when selecting an executor for assignment. The timeout is essentially reset each time a new assignment is made.
> We were running Spark Streaming on Kafka with a 10 second batch window on 32 Kafka partitions with 16 executors. All executors were in the ANY group. At one point one RACK_LOCAL executor was added and all tasks were assigned to it. Each task took about 0.6 seconds to process, repeatedly resetting the spark.locality.wait timeout (3000ms). This caused the whole job to underutilize resources and created a growing backlog.
> spark.locality.wait should be based on the task set creation time, not the last launch time, so that 3000ms after initial creation, all executors can get tasks assigned to them.
> We are specifying a zero timeout for now as a workaround to disable locality optimization. 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L556
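The difference between the two policies described above can be sketched in a few lines of Scala. This is a simplified illustration with hypothetical names, not the actual TaskSetManager code:

```scala
// Simplified sketch of the two locality-wait policies (hypothetical names,
// not the real TaskSetManager implementation).
object LocalityWaitSketch {
  val localityWaitMs = 3000L // matches the default spark.locality.wait

  // Reported behavior: the wait is measured from the last task launch,
  // so frequent launches keep resetting the timer.
  def allowNonLocal(now: Long, lastLaunchTime: Long): Boolean =
    now - lastLaunchTime >= localityWaitMs

  // Proposed behavior: the wait is measured from task set creation,
  // so the scheduler relaxes locality 3000ms after the set was created
  // regardless of how often tasks launch.
  def allowNonLocalProposed(now: Long, taskSetCreationTime: Long): Boolean =
    now - taskSetCreationTime >= localityWaitMs
}
```

With ~0.6s tasks, `allowNonLocal` never returns true (each launch resets the clock), while `allowNonLocalProposed` returns true once 3000ms have elapsed since the task set was created, which is the behavior this issue requests.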



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org