Posted to issues@tez.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2016/01/19 17:11:39 UTC

[jira] [Commented] (TEZ-3051) Vertex failed with invalid event DAG_VERTEX_RERUNNING at SUCCEEDED

    [ https://issues.apache.org/jira/browse/TEZ-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106922#comment-15106922 ] 

Jason Lowe commented on TEZ-3051:
---------------------------------

From the DAG post log:
{noformat}
2016-01-15 20:29:16,982 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: vertex_1445184332588_547863_1_00 [scope-193] back to running due to rescheduling task_1445184332588_547863_1_00_001314
2016-01-15 20:29:16,982 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: vertex_1445184332588_547863_1_00 [scope-193] transitioned from SUCCEEDED to RUNNING due to event V_TASK_RESCHEDULED
2016-01-15 20:29:16,982 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: vertex_1445184332588_547863_1_02 [scope-195] back to running due to rescheduling task_1445184332588_547863_1_02_000535
2016-01-15 20:29:16,982 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: vertex_1445184332588_547863_1_02 [scope-195] transitioned from SUCCEEDED to RUNNING due to event V_TASK_RESCHEDULED
2016-01-15 20:29:16,982 [ERROR] [Dispatcher thread {Central}] |impl.DAGImpl|: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: DAG_VERTEX_RERUNNING at SUCCEEDED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
        at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1125)
        at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:144)
        at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1998)
        at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1989)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
        at java.lang.Thread.run(Thread.java:745)
2016-01-15 20:29:16,983 [ERROR] [Dispatcher thread {Central}] |impl.DAGImpl|: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: DAG_VERTEX_RERUNNING at SUCCEEDED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
        at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1125)
        at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:144)
        at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1998)
        at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1989)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
        at java.lang.Thread.run(Thread.java:745)
2016-01-15 20:29:16,983 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: dag_1445184332588_547863_1 terminating due to internal error
{noformat}

Shortly before this, a log message noted that an AM node transitioned from ACTIVE to UNHEALTHY, which appears to have caused some completed task attempts to be marked as killed. That in turn caused the tasks and their parent vertices to try to transition from a terminal state back to running. In this case the DAG had already completed successfully, but the late node failure triggered an attempt to resurrect the completed DAG.
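
To illustrate the failure mode, here is a minimal sketch of a table-driven state machine in which a terminal state registers no transitions, so a late "rerunning" event is rejected. This is not Tez's DAGImpl or YARN's StateMachineFactory; the class, enum, and method names below are hypothetical and chosen only for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative sketch only: hypothetical names, not the Tez or YARN implementation.
class TerminalStateSketch {
    enum State { RUNNING, SUCCEEDED }
    enum Event { VERTEX_COMPLETED, VERTEX_RERUNNING, ALL_VERTICES_DONE }

    // Transition table: current state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TRANSITIONS =
            new EnumMap<State, Map<Event, State>>(State.class);

    static {
        Map<Event, State> running = new EnumMap<Event, State>(Event.class);
        running.put(Event.VERTEX_COMPLETED, State.RUNNING);
        running.put(Event.VERTEX_RERUNNING, State.RUNNING);
        running.put(Event.ALL_VERTICES_DONE, State.SUCCEEDED);
        TRANSITIONS.put(State.RUNNING, running);
        // SUCCEEDED is terminal here: no events are registered for it.
        TRANSITIONS.put(State.SUCCEEDED, new EnumMap<Event, State>(Event.class));
    }

    // Returns the next state, or throws if the event is not valid in the current state.
    static State handle(State current, Event event) {
        State next = TRANSITIONS.get(current).get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        return next;
    }

    public static void main(String[] args) {
        // The DAG finishes first...
        State state = handle(State.RUNNING, Event.ALL_VERTICES_DONE);
        try {
            // ...then a late node failure makes a vertex report that it is rerunning.
            handle(state, Event.VERTEX_RERUNNING);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the late event is simply rejected with an exception. Whether a real fix should ignore such events at a terminal state or handle them some other way is a separate design question that the sketch does not answer.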


> Vertex failed with invalid event DAG_VERTEX_RERUNNING at SUCCEEDED
> ------------------------------------------------------------------
>
>                 Key: TEZ-3051
>                 URL: https://issues.apache.org/jira/browse/TEZ-3051
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jason Lowe
>
> I saw a job fail due to an internal error on a vertex: org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: DAG_VERTEX_RERUNNING at SUCCEEDED
> Stacktrace to follow.


