Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2015/09/15 03:58:45 UTC

[jira] [Created] (SPARK-10607) Scheduler should include defensive measures against infinite loops due to task commit denial

Josh Rosen created SPARK-10607:
----------------------------------

             Summary: Scheduler should include defensive measures against infinite loops due to task commit denial
                 Key: SPARK-10607
                 URL: https://issues.apache.org/jira/browse/SPARK-10607
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 1.5.0, 1.4.1, 1.3.1
            Reporter: Josh Rosen
            Priority: Minor


If OutputCommitter.commitTask() repeatedly fails because the OutputCommitCoordinator denies the right to commit, then the scheduler may get stuck in an infinite task retry loop. This happens because DAGScheduler treats failures due to CommitDenied separately from other failures: they do not count towards the maximum number of task failures that triggers a job failure. The correct fix is to add an upper bound on the number of times that a commit can be denied, as a last-ditch safety net to avoid infinite loop behavior.
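
To illustrate the proposed safety net, here is a minimal standalone sketch in Python. It is not Spark's scheduler code: the names RetryTracker, FailureReason, JobAborted, and max_commit_denials are hypothetical, and the limits are arbitrary example values. The idea is only that commit denials keep their own counter, separate from ordinary task failures, but that counter is bounded as well.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum, auto


class FailureReason(Enum):
    COMMIT_DENIED = auto()  # the coordinator refused the right to commit
    OTHER = auto()          # any ordinary task failure


class JobAborted(Exception):
    """Raised when a task exhausts one of its retry budgets."""


@dataclass
class RetryTracker:
    # Hypothetical limits, chosen only for illustration.
    max_task_failures: int = 4
    max_commit_denials: int = 16
    failures: dict = field(default_factory=lambda: defaultdict(int))
    denials: dict = field(default_factory=lambda: defaultdict(int))

    def record_failure(self, task_id, reason):
        """Count one failed attempt; raise JobAborted once a budget is spent."""
        if reason is FailureReason.COMMIT_DENIED:
            # Denials do not count as ordinary failures, but they are
            # still bounded so that retries cannot continue forever.
            self.denials[task_id] += 1
            if self.denials[task_id] >= self.max_commit_denials:
                raise JobAborted(
                    f"task {task_id}: commit denied {self.denials[task_id]} times"
                )
        else:
            self.failures[task_id] += 1
            if self.failures[task_id] >= self.max_task_failures:
                raise JobAborted(
                    f"task {task_id}: failed {self.failures[task_id]} times"
                )
```

With this shape, a task whose commit is denied over and over eventually aborts the job instead of being rescheduled indefinitely, while the ordinary failure budget is left untouched by denials.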

See SPARK-10381 for additional context. This is not a high priority issue to fix right now, since the fix in SPARK-10381 should prevent this scenario from happening in the first place. However, another layer of conservative defensive limits / timeouts certainly would not hurt.


