Posted to issues@aurora.apache.org by "Joshua Cohen (JIRA)" <ji...@apache.org> on 2016/02/11 17:39:18 UTC

[jira] [Created] (AURORA-1614) Failed sandbox initialization can cause tasks to go LOST

Joshua Cohen created AURORA-1614:
------------------------------------

             Summary: Failed sandbox initialization can cause tasks to go LOST
                 Key: AURORA-1614
                 URL: https://issues.apache.org/jira/browse/AURORA-1614
             Project: Aurora
          Issue Type: Bug
          Components: Executor
            Reporter: Joshua Cohen
            Priority: Minor


When we initialize the sandbox, we only catch sandbox-specific error types. If an unexpected error is raised, the executor hangs until the timeout is exceeded, at which point the task goes LOST.

We should instead broadly catch exceptions raised during sandbox initialization and quickly fail tasks.
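
A minimal sketch of the broad catch, not the actual executor code. The names {{SandboxError}}, {{initialize_sandbox}}, {{create}} and {{on_failure}} are hypothetical stand-ins for illustration; the point is only that any exception, expected or not, is reported right away instead of leaving the executor waiting for a timeout.

```python
class SandboxError(Exception):
    """Hypothetical stand-in for the sandbox-specific error type."""


def initialize_sandbox(create, on_failure):
    """Call create() and report any failure through on_failure.

    Both arguments are assumed callables: create() builds the sandbox,
    on_failure(message) marks the task as failed. Returns True on success.
    """
    try:
        create()
    except SandboxError as e:
        # The expected, sandbox-specific failure path.
        on_failure('Failed to initialize sandbox: %s' % e)
        return False
    except Exception as e:
        # Anything unexpected also fails the task immediately,
        # rather than propagating and leaving the executor hung.
        on_failure('Unknown exception initializing sandbox: %s' % e)
        return False
    return True
```

With this shape, an unexpected {{OSError}} from {{create}} results in a prompt failure message rather than a task that eventually goes LOST.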

Additionally, the {{DockerDirectorySandbox}} was not properly catching errors raised when creating/symlinking, which led to the above problem in the event of a misconfiguration. In practice this issue shouldn't have occurred in normal usage, but it made development slow until I tracked down what was causing the tasks to hang.
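
As a rough illustration of the second point, the create/symlink step could translate low-level errors into a sandbox-specific one so that existing handlers see it. This is a sketch under assumptions: {{SandboxCreationError}} and {{create_and_link}} are hypothetical names, not the real {{DockerDirectorySandbox}} code.

```python
import os


class SandboxCreationError(Exception):
    """Hypothetical sandbox-specific error type."""


def create_and_link(target_dir, link_path):
    """Create target_dir and symlink link_path to it.

    Any OSError (missing parent, bad permissions, existing link, ...)
    is re-raised as the sandbox-specific error.
    """
    try:
        os.makedirs(target_dir, exist_ok=True)
        os.symlink(target_dir, link_path)
    except OSError as e:
        raise SandboxCreationError(
            'Failed to create or symlink sandbox: %s' % e)
```

A misconfigured path then surfaces as a {{SandboxCreationError}} that the caller can turn into a fast task failure.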



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)