Posted to issues@spark.apache.org by "Thomas Graves (JIRA)" <ji...@apache.org> on 2017/08/10 18:11:01 UTC

[jira] [Created] (SPARK-21695) Spark scheduler locality algorithm can take longer than expected

Thomas Graves created SPARK-21695:
-------------------------------------

             Summary: Spark scheduler locality algorithm can take longer than expected
                 Key: SPARK-21695
                 URL: https://issues.apache.org/jira/browse/SPARK-21695
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 2.1.0
            Reporter: Thomas Graves


Reference JIRA: https://issues.apache.org/jira/browse/SPARK-21656

I'm seeing an issue with some jobs where the scheduler takes a long time to schedule tasks on executors. The default locality wait is 3 seconds, so I was expecting an executor to get a task within at most 9 seconds (node local, rack local, any), but it's taking far longer than that. In the case of SPARK-21656 it takes 60+ seconds, and the executors hit their idle timeout.
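To make the 9-second expectation concrete, here is a minimal sketch of the arithmetic. It assumes each locality wait runs once and is never restarted, which is exactly the assumption this issue questions. The helper name expected_max_delay is hypothetical. The setting names are the per-level waits from the Spark configuration docs (process, node, rack), which give the same three-waits total as the levels listed above.

```python
# Sketch only: upper bound on locality delay if each level's wait is
# consumed once and never reset. Not Spark code.
DEFAULT_WAIT_S = 3.0  # default of spark.locality.wait, in seconds

# Per-level overrides; each falls back to spark.locality.wait when unset.
LEVEL_KEYS = [
    "spark.locality.wait.process",
    "spark.locality.wait.node",
    "spark.locality.wait.rack",
]

def expected_max_delay(conf):
    """Sum the per-level waits from a dict of settings (values in seconds)."""
    base = conf.get("spark.locality.wait", DEFAULT_WAIT_S)
    return sum(conf.get(key, base) for key in LEVEL_KEYS)

print(expected_max_delay({}))  # 9.0 with default settings
```

Anything well beyond this bound, such as the 60+ seconds seen in SPARK-21656, suggests the waits are being restarted rather than simply summed.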

We should investigate why and see if we can fix this.

From an initial look, it seems the scheduler resets the locality lastLaunchTime whenever it places any task on a node at that locality level. This appears to mean that it can take much longer than 3 seconds for any particular task to fall back to a less local level, but this needs to be verified.
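The suspected effect can be illustrated with a toy model. This is not Spark's scheduler code, only a sketch of the hypothesis above: a single locality level with a 3-second wait, and a timer that either restarts on every launch at that level or does not. The function time_until_fallback and its parameters are hypothetical names chosen for illustration.

```python
# Toy model of the suspected behavior; not taken from Spark's source.
def time_until_fallback(local_launch_times, wait=3.0, reset_on_launch=True):
    """Return the time (seconds) at which fallback to a less local level
    first becomes possible, given when tasks launch at the current level."""
    last_launch = 0.0
    for t in sorted(local_launch_times):
        if t - last_launch >= wait:
            # The wait expired before this launch, so fallback already happened.
            return last_launch + wait
        if reset_on_launch:
            # Suspected behavior: any launch at this level restarts the timer.
            last_launch = t
    return last_launch + wait

# One task launches at this locality level every second for 60 seconds.
launches = [float(i) for i in range(1, 61)]

print(time_until_fallback(launches, reset_on_launch=True))   # 63.0
print(time_until_fallback(launches, reset_on_launch=False))  # 3.0
```

Under this model, a steady trickle of launches at the current level keeps pushing the fallback time out, so an executor with no local work could sit idle for the whole 60 seconds. Whether the real scheduler behaves this way is what still needs to be verified.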


