Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2015/09/01 01:41:45 UTC

[jira] [Created] (SPARK-10381) Infinite loop when OutputCommitCoordination is enabled and OutputCommitter.commitTask throws exception

Josh Rosen created SPARK-10381:
----------------------------------

             Summary: Infinite loop when OutputCommitCoordination is enabled and OutputCommitter.commitTask throws exception
                 Key: SPARK-10381
                 URL: https://issues.apache.org/jira/browse/SPARK-10381
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 1.4.1, 1.3.1, 1.5.0
            Reporter: Josh Rosen
            Assignee: Josh Rosen
            Priority: Critical


When speculative execution is enabled, consider a scenario where the authorized committer of a particular output partition fails during the OutputCommitter.commitTask() call. In this case, the OutputCommitCoordinator is supposed to release that committer's exclusive lock on committing once that task fails. However, due to a unit mismatch, the lock is never released, and Spark goes into an infinite retry loop.
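
The failure mode can be illustrated with a small toy model. This is only a sketch, not Spark's actual code: ToyCommitCoordinator, can_commit, task_failed, and attempts_until_commit are hypothetical names, the concrete ID values are made up, and the assumption that the lock is recorded under one identifier while the failure handler compares against the other is made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyCommitCoordinator:
    """Toy stand-in for a commit coordinator (hypothetical, not Spark's class)."""

    # When True, the lock is recorded under the task attempt ID while the
    # failure handler compares against the attempt number (assumed mix-up).
    buggy: bool = False
    # partition -> identifier of the attempt currently allowed to commit
    holders: Dict[int, int] = field(default_factory=dict)

    def can_commit(self, partition: int, attempt_number: int,
                   task_attempt_id: int) -> bool:
        """Grant the exclusive commit lock if nobody holds it yet."""
        if partition in self.holders:
            return False
        self.holders[partition] = task_attempt_id if self.buggy else attempt_number
        return True

    def task_failed(self, partition: int, attempt_number: int) -> None:
        """Release the lock only if the failed attempt is the recorded holder."""
        if self.holders.get(partition) == attempt_number:
            del self.holders[partition]


def attempts_until_commit(coordinator: ToyCommitCoordinator,
                          max_attempts: int = 5) -> Optional[int]:
    """Return the attempt number that commits, or None if every retry is denied."""
    partition = 0
    for attempt_number in range(max_attempts):
        task_attempt_id = 100 + attempt_number  # made-up, globally unique ID
        if not coordinator.can_commit(partition, attempt_number, task_attempt_id):
            continue  # commit denied, so the task is retried
        if attempt_number == 0:
            # Simulate commitTask() throwing in the authorized committer.
            coordinator.task_failed(partition, attempt_number)
            continue
        return attempt_number
    return None


if __name__ == "__main__":
    print(attempts_until_commit(ToyCommitCoordinator(buggy=False)))
    print(attempts_until_commit(ToyCommitCoordinator(buggy=True)))
```

In the consistent variant the second attempt obtains the lock and commits. In the mismatched variant the comparison in task_failed never matches, so every later attempt is denied; the toy model stops after max_attempts, whereas an uncapped retry would continue indefinitely.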

This bug was masked by the OutputCommitCoordinator's lack of end-to-end tests (the current tests rely heavily on mocks). Another contributing factor is that we have many similarly named identifiers with different semantics but the same data type (e.g. attemptNumber and taskAttemptId), and inconsistent variable naming makes them difficult to distinguish.
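
One possible way to make such identifiers harder to confuse is to give each its own wrapper type. The sketch below is an illustration under assumptions, not a description of how Spark addresses the issue: AttemptNumber, TaskAttemptId, and release_if_holder are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttemptNumber:
    """Per-task retry counter (hypothetical wrapper type)."""
    value: int


@dataclass(frozen=True)
class TaskAttemptId:
    """Globally unique ID of one task attempt (hypothetical wrapper type)."""
    value: int


def release_if_holder(holder: AttemptNumber, failed: AttemptNumber) -> bool:
    """Return True if the failed attempt is the lock holder.

    Passing the wrong kind of identifier raises TypeError instead of
    silently comparing two unrelated integers.
    """
    if not isinstance(holder, AttemptNumber) or not isinstance(failed, AttemptNumber):
        raise TypeError("expected AttemptNumber for both holder and failed attempt")
    return holder == failed
```

With plain integers, comparing an attempt number to a task attempt ID is always well-typed; with separate types, the mix-up surfaces as an error at the call site (or earlier, under a static type checker).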


