Posted to issues@tez.apache.org by "Mike Liddell (JIRA)" <ji...@apache.org> on 2013/07/09 02:03:48 UTC
[jira] [Updated] (TEZ-284) Add 'terminationCause' tracking to DAGImpl and VertexImpl
[ https://issues.apache.org/jira/browse/TEZ-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Liddell updated TEZ-284:
-----------------------------
Labels: TEZ-0.2.0 (was: )
> Add 'terminationCause' tracking to DAGImpl and VertexImpl
> ---------------------------------------------------------
>
> Key: TEZ-284
> URL: https://issues.apache.org/jira/browse/TEZ-284
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Mike Liddell
> Assignee: Mike Liddell
> Labels: TEZ-0.2.0
>
> By tracking a reasonably exact cause of termination, we can be more precise with state-machine logic and facilitate post-mortems.
> For example, DAGImpl and VertexImpl use a checkXYZforCompletion() method to determine whether to transition to a final state. In non-success cases, the root cause determines if we transition to FAILED or KILLED.
> This helps implement TEZ-141 "DAG does not kill running vertices when going into failed state" and TEZ-143 "Vertex does not kill other running tasks when it fails due to a task failure". (Simpler solutions for the state-machine issues are available, but general tracking of root causes seems valuable for its post-mortem uses.)
> The initial improvement is to get general support in place; later JIRAs will add more diagnostics support and additional 'root causes' as necessary.
> Example:
> public enum VertexTerminationCause {
>   /** DAG was killed. */
>   DAG_KILL,
>   /** Other vertex failed, causing the DAG to fail and thus killing this vertex. */
>   OTHER_VERTEX_FAILURE,
>   /** One of the tasks for this vertex failed. */
>   OWN_TASK_FAILURE,
>   /** This vertex failed during commit. */
>   COMMIT_FAILURE,
>   /** This vertex failed as it had zero tasks. */
>   ZERO_TASKS,
>   /** This vertex failed during init. */
>   GENERIC_INIT_FAILURE
> }
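
The following is a rough sketch of how a recorded termination cause could drive the FAILED-versus-KILLED decision described above. It is not the actual DAGImpl or VertexImpl code. The class TerminationCauseSketch, the enum VertexFinalState, and the method finalStateFor() are hypothetical names introduced only for illustration. The mapping of causes to final states is an assumption read off the enum's doc comments: causes that originate outside the vertex lead to KILLED, and the vertex's own failures lead to FAILED.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch only; not the actual Tez state-machine code. */
final class TerminationCauseSketch {

  /** Mirrors the enum proposed in the issue description. */
  enum VertexTerminationCause {
    DAG_KILL,
    OTHER_VERTEX_FAILURE,
    OWN_TASK_FAILURE,
    COMMIT_FAILURE,
    ZERO_TASKS,
    GENERIC_INIT_FAILURE
  }

  /** Hypothetical final states, named after the states mentioned in the issue. */
  enum VertexFinalState { SUCCEEDED, KILLED, FAILED }

  // Assumption: causes that originate outside this vertex mean it was killed.
  private static final Set<VertexTerminationCause> KILL_CAUSES =
      EnumSet.of(VertexTerminationCause.DAG_KILL,
                 VertexTerminationCause.OTHER_VERTEX_FAILURE);

  /**
   * Picks the final state once all work for the vertex has stopped.
   * A null cause is taken to mean that no termination was ever requested.
   */
  static VertexFinalState finalStateFor(VertexTerminationCause cause) {
    if (cause == null) {
      return VertexFinalState.SUCCEEDED;
    }
    return KILL_CAUSES.contains(cause)
        ? VertexFinalState.KILLED
        : VertexFinalState.FAILED;
  }

  public static void main(String[] args) {
    // Print the final state chosen for every known cause.
    for (VertexTerminationCause cause : VertexTerminationCause.values()) {
      System.out.println(cause + " -> " + finalStateFor(cause));
    }
  }
}
```

Keeping the decision in one place means a completion check only needs to consult the stored cause, and the same value can be logged for post-mortem analysis.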