Posted to issues@tez.apache.org by "Mike Liddell (JIRA)" <ji...@apache.org> on 2013/07/09 02:03:48 UTC

[jira] [Updated] (TEZ-284) Add 'terminationCause' tracking to DAGImpl and VertexImpl

     [ https://issues.apache.org/jira/browse/TEZ-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Liddell updated TEZ-284:
-----------------------------

    Labels: TEZ-0.2.0  (was: )
    
> Add 'terminationCause' tracking to DAGImpl and VertexImpl
> ---------------------------------------------------------
>
>                 Key: TEZ-284
>                 URL: https://issues.apache.org/jira/browse/TEZ-284
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Mike Liddell
>            Assignee: Mike Liddell
>              Labels: TEZ-0.2.0
>
> By tracking a reasonably exact cause of termination, we can be more precise with state-machine logic and facilitate post-mortems.
> For example, DAGImpl and VertexImpl use a checkXYZforCompletion() method to determine whether to transition to a final state. In non-success cases, the root cause determines whether we transition to FAILED or KILLED.
> This helps implement TEZ-141 "DAG does not kill running vertices when going into failed state" and TEZ-143 "Vertex does not kill other running tasks when it fails due to a task failure". (Simpler solutions for the state-machine issues are available, but general tracking of root causes seems valuable for its post-mortem uses.)
> The initial improvement is to get general support going; later JIRAs will add more diagnostics support and additional 'root causes' as necessary.
> Example:
> public enum VertexTerminationCause {
>   /** DAG was killed  */
>   DAG_KILL, 
>   /** Other vertex failed causing DAG to fail thus killing this vertex  */
>   OTHER_VERTEX_FAILURE,
>   /** One of the tasks for this vertex failed.  */
>   OWN_TASK_FAILURE, 
>   /** This vertex failed during commit. */
>   COMMIT_FAILURE,
>   /** This vertex failed as it had zero tasks. */
>   ZERO_TASKS, 
>   /** This vertex failed during init. */
>   GENERIC_INIT_FAILURE
> }
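
The description says the root cause decides whether a non-success completion ends in FAILED or KILLED. The following is a minimal sketch of that idea, not the actual VertexImpl code. The FinalState enum, the finalStateFor() helper, and the specific cause-to-state mapping are assumptions made for illustration; the mapping is inferred only from the Javadoc comments in the example enum above (causes that originate outside the vertex are treated as KILLED, causes inside the vertex as FAILED).

```java
import java.util.EnumSet;
import java.util.Set;

class TerminationCauseSketch {

  /** Causes copied from the example in the issue description. */
  enum VertexTerminationCause {
    DAG_KILL,
    OTHER_VERTEX_FAILURE,
    OWN_TASK_FAILURE,
    COMMIT_FAILURE,
    ZERO_TASKS,
    GENERIC_INIT_FAILURE
  }

  /** Hypothetical final states; the real vertex state enum may differ. */
  enum FinalState { SUCCEEDED, FAILED, KILLED }

  // Assumption: causes that originate outside the vertex lead to KILLED.
  private static final Set<VertexTerminationCause> EXTERNAL_CAUSES =
      EnumSet.of(VertexTerminationCause.DAG_KILL,
                 VertexTerminationCause.OTHER_VERTEX_FAILURE);

  /**
   * Hypothetical helper: picks the final state from the recorded cause.
   * A null cause is taken to mean that nothing went wrong.
   */
  static FinalState finalStateFor(VertexTerminationCause cause) {
    if (cause == null) {
      return FinalState.SUCCEEDED;
    }
    return EXTERNAL_CAUSES.contains(cause) ? FinalState.KILLED : FinalState.FAILED;
  }

  public static void main(String[] args) {
    // Print the assumed mapping for every cause.
    for (VertexTerminationCause cause : VertexTerminationCause.values()) {
      System.out.println(cause + " -> " + finalStateFor(cause));
    }
  }
}
```

Keeping the decision in one function of the recorded cause means a completion check only needs to know that the vertex is done and why, which is also the information a post-mortem would want.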
