Posted to issues@spark.apache.org by "Kay Ousterhout (JIRA)" <ji...@apache.org> on 2017/02/07 20:20:41 UTC

[jira] [Created] (SPARK-19502) Remove unnecessary code to re-submit stages in the DAGScheduler

Kay Ousterhout created SPARK-19502:
--------------------------------------

             Summary: Remove unnecessary code to re-submit stages in the DAGScheduler
                 Key: SPARK-19502
                 URL: https://issues.apache.org/jira/browse/SPARK-19502
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 1.1.1
            Reporter: Kay Ousterhout
            Assignee: Kay Ousterhout
            Priority: Minor


There are a [few lines of code in the DAGScheduler](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1215) that re-submit a shuffle map stage when some of its tasks fail.  My understanding is that there should be a 1:1 mapping between pending tasks (i.e., tasks that haven't completed successfully) and available output locations, so that code should never be reachable.  Furthermore, the approach taken by that code (re-submitting an entire stage in response to task failures) is not how we handle task failures within a stage: the lower-level scheduler re-submits the individual failed tasks, which is what the five-year-old TODO on that code seems to imply should be done.
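For reference, the branch in question looks roughly like this (paraphrased from DAGScheduler.handleTaskCompletion; logging and bookkeeping are elided, and exact names vary across Spark versions):

{code:scala}
// After a ShuffleMapTask completes and the stage has no pending partitions
// left, the stage is marked finished; but if some map outputs are still
// missing, the whole stage is re-submitted.
if (runningStages.contains(shuffleStage) && shuffleStage.pendingPartitions.isEmpty) {
  markStageAsFinished(shuffleStage)

  if (!shuffleStage.isAvailable) {
    // Some tasks had failed; let's resubmit this shuffleStage
    // TODO: Lower-level scheduler should also deal with this
    logInfo("Resubmitting " + shuffleStage + " (" + shuffleStage.name +
      ") because some of its tasks had failed: " +
      shuffleStage.findMissingPartitions().mkString(", "))
    submitStage(shuffleStage)
  } else {
    // All map outputs are registered, so child stages can be submitted.
    submitWaitingChildStages(shuffleStage)
  }
}
{code}

If the 1:1 mapping above holds, the !shuffleStage.isAvailable branch is dead code: an empty pending set would always mean that every map output has been registered.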

The big caveat is that there's a bug, being fixed in SPARK-19263, that means there is currently *not* a 1:1 relationship between pendingTasks and available outputLocations, so for now that code is serving as a (buggy) band-aid.  That code should be removed once SPARK-19263 is resolved.
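To make the claimed invariant concrete, here's a small illustrative check (invariantHolds is a hypothetical helper, not a real Spark method; it just restates the 1:1 relationship in code):

{code:scala}
// Hypothetical helper for illustration only: the claimed invariant is that a
// ShuffleMapStage's pending partitions are exactly the partitions whose map
// output has not yet been registered.
def invariantHolds(stage: ShuffleMapStage): Boolean = {
  stage.pendingPartitions == stage.findMissingPartitions().toSet
}
{code}

SPARK-19263 describes a scenario where this check would fail, which is why the re-submit branch can currently still fire.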

cc [~imranr] [~markhamstra] [~jinxing6042@126.com]


