Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2015/05/01 20:52:06 UTC

[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED

    [ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523666#comment-14523666 ] 

Bikas Saha commented on TEZ-2379:
---------------------------------

Please see the analysis above. It should actually be invalid for an attempt-killed event to arrive after the task state is KILLED, because the task is supposed to wait for its attempts to complete before entering a final state. That's the whole point of the KILL_WAIT state, right?
IMO, the fix is to have the attempt ignore a kill request if it's already done. Thoughts?
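
To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is not the actual Tez code: KillSketch, Task, Attempt, and every method name below are hypothetical stand-ins, and the real TaskImpl and attempt state machines are event-driven and considerably more involved. The sketch only shows that once an attempt ignores a kill request after it is done, no second attempt-killed notification can reach a task that is already KILLED.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these are NOT the real Tez task/attempt classes.
class KillSketch {
    enum TaskState { RUNNING, KILL_WAIT, KILLED }
    enum AttemptState { RUNNING, KILLED }

    static final class Task {
        TaskState state = TaskState.RUNNING;
        private final List<Attempt> attempts = new ArrayList<>();
        private int liveAttempts = 0;

        Attempt newAttempt() {
            Attempt a = new Attempt(this);
            attempts.add(a);
            liveAttempts++;
            return a;
        }

        // Kill request for the task: sit in KILL_WAIT until every attempt is done.
        void kill() {
            if (state != TaskState.RUNNING) {
                return;
            }
            state = TaskState.KILL_WAIT;
            for (Attempt a : attempts) {
                a.kill();
            }
            finishIfNoLiveAttempts();
        }

        // Stand-in for the T_ATTEMPT_KILLED event; invalid once the task is final.
        void onAttemptKilled() {
            if (state == TaskState.KILLED) {
                throw new IllegalStateException("Invalid event: T_ATTEMPT_KILLED at KILLED");
            }
            liveAttempts--;
            finishIfNoLiveAttempts();
        }

        private void finishIfNoLiveAttempts() {
            if (state == TaskState.KILL_WAIT && liveAttempts == 0) {
                state = TaskState.KILLED;
            }
        }
    }

    static final class Attempt {
        AttemptState state = AttemptState.RUNNING;
        private final Task task;

        Attempt(Task task) {
            this.task = task;
        }

        // Returns true if the kill was acted on, false if it was ignored.
        boolean kill() {
            if (state != AttemptState.RUNNING) {
                return false; // already done: ignore, so no second event reaches the task
            }
            state = AttemptState.KILLED;
            task.onAttemptKilled();
            return true;
        }
    }
}
```

In this sketch, removing the early return in Attempt.kill() would let a late kill request produce a second notification and trigger the same "Invalid event" failure as in the stack trace below.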

> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2379
>                 URL: https://issues.apache.org/jira/browse/TEZ-2379
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Hitesh Shah
>            Priority: Blocker
>         Attachments: TEZ-2379.1.patch
>
>
> {noformat}
> 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_000013
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>         at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
>         at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
>         at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
>         at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
>         at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
>         at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Additional notes:
> ============
> Hive - latest build 
> Tez - master
> tpch-200 gb scale q_17 (kill the job in the middle of execution)


