Posted to issues@tez.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2016/04/13 20:03:25 UTC

[jira] [Commented] (TEZ-3213) Uncaught exception during vertex recovery leads to invalid state transition loop

    [ https://issues.apache.org/jira/browse/TEZ-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239706#comment-15239706 ] 

Jason Lowe commented on TEZ-3213:
---------------------------------

Sample log showing the initial error and the subsequent loop:
{noformat}
2016-04-12 08:46:23,002 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Uncaught Exception when handling event V_SOURCE_VERTEX_RECOVERED on vertex scope-4784 with vertexId vertex_1459233834927_3098531_1_14 at current state RECOVERING
java.lang.RuntimeException: Invalid Vertex state, found non-zero recovered events in invalid state, recoveredState=KILLED, recoveredEvents=3840
        at org.apache.tez.dag.app.dag.impl.VertexImpl$RecoverTransition.transition(VertexImpl.java:3298)
        at org.apache.tez.dag.app.dag.impl.VertexImpl$RecoverTransition.transition(VertexImpl.java:3004)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1875)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:202)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2115)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2101)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
        at java.lang.Thread.run(Thread.java:745)
[...]
2016-04-12 08:46:23,062 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Can't handle Invalid event V_INTERNAL_ERROR on vertex scope-4784 with vertexId vertex_1459233834927_3098531_1_14 at current state RECOVERING
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_INTERNAL_ERROR at RECOVERING
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1875)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:202)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2115)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2101)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
        at java.lang.Thread.run(Thread.java:745)
[...]
2016-04-12 08:46:23,086 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Can't handle Invalid event V_INTERNAL_ERROR on vertex scope-4784 with vertexId vertex_1459233834927_3098531_1_14 at current state RECOVERING
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_INTERNAL_ERROR at RECOVERING
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1875)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:202)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2115)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2101)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
        at java.lang.Thread.run(Thread.java:745)
[...]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_INTERNAL_ERROR at RECOVERING
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1875)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:202)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2115)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2101)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
        at java.lang.Thread.run(Thread.java:745)
{noformat}


> Uncaught exception during vertex recovery leads to invalid state transition loop
> --------------------------------------------------------------------------------
>
>                 Key: TEZ-3213
>                 URL: https://issues.apache.org/jira/browse/TEZ-3213
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jason Lowe
>
> If an uncaught exception occurs during a state transition while a vertex is in the RECOVERING state, a V_INTERNAL_ERROR event is delivered to the state machine, but that event is not handled in the RECOVERING state. The resulting invalid transition in turn causes another V_INTERNAL_ERROR event to be delivered to the state machine, so it loops, logging the invalid transitions.
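
The loop described above can be modeled with a toy state machine. This is only a sketch, not the Tez VertexImpl code: the class, enum, and method names are hypothetical, the event cap exists only so the demonstration terminates, and the extra RECOVERING -> ERROR transition is an assumed remedy for illustration, not necessarily the fix adopted for this issue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the reported loop. NOT Tez code; every name here is hypothetical. */
class RecoveryLoopSketch {
    enum State { RECOVERING, ERROR }
    enum Event { SOURCE_VERTEX_RECOVERED, INTERNAL_ERROR }

    /** Outcome of a bounded run of the toy dispatcher. */
    static final class Result {
        final State finalState;
        final int invalidTransitions;

        Result(State finalState, int invalidTransitions) {
            this.finalState = finalState;
            this.invalidTransitions = invalidTransitions;
        }
    }

    /**
     * Dispatches at most maxEvents events, starting in RECOVERING with one
     * SOURCE_VERTEX_RECOVERED event queued.
     *
     * @param handleErrorWhileRecovering if true, RECOVERING accepts INTERNAL_ERROR
     *        and moves to ERROR (assumed remedy); if false, the event is invalid
     *        in RECOVERING, as in the report.
     */
    static Result run(boolean handleErrorWhileRecovering, int maxEvents) {
        Deque<Event> queue = new ArrayDeque<>();
        queue.add(Event.SOURCE_VERTEX_RECOVERED);
        State state = State.RECOVERING;
        int invalid = 0;
        int dispatched = 0;

        while (!queue.isEmpty() && dispatched < maxEvents) {
            Event event = queue.poll();
            dispatched++;
            if (state == State.ERROR) {
                continue; // terminal state: ignore anything still queued
            }
            if (event == Event.SOURCE_VERTEX_RECOVERED) {
                // Stand-in for the uncaught RuntimeException in the recover
                // transition: the handler reacts by queueing INTERNAL_ERROR.
                queue.add(Event.INTERNAL_ERROR);
            } else if (handleErrorWhileRecovering) {
                state = State.ERROR; // the error event is consumed
            } else {
                // Invalid transition: count it and queue INTERNAL_ERROR again,
                // which is exactly what keeps the loop going.
                invalid++;
                queue.add(Event.INTERNAL_ERROR);
            }
        }
        return new Result(state, invalid);
    }

    public static void main(String[] args) {
        Result buggy = run(false, 10);
        System.out.println("unhandled: state=" + buggy.finalState
                + " invalidTransitions=" + buggy.invalidTransitions);
        Result handled = run(true, 10);
        System.out.println("handled:   state=" + handled.finalState
                + " invalidTransitions=" + handled.invalidTransitions);
    }
}
```

With the cap of 10 events, the unhandled variant stays in RECOVERING and records 9 invalid transitions (every event after the first), while the variant that accepts the error event reaches ERROR after two events with none. Without the cap, the unhandled variant would never drain its queue, which matches the repeated log entries.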


