Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2015/06/25 06:28:04 UTC

[jira] [Comment Edited] (TEZ-2576) It is not necessary to send NodeFailureEvent to task attempt of completed DAG

    [ https://issues.apache.org/jira/browse/TEZ-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600667#comment-14600667 ] 

Jeff Zhang edited comment on TEZ-2576 at 6/25/15 4:27 AM:
----------------------------------------------------------

This might cause a state machine error when a node failure happens while the AM is IDLE:

{code}
2015-06-25 12:13:02,419 ERROR [Dispatcher thread: Central] impl.DAGImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: DAG_VERTEX_RERUNNING at SUCCEEDED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
	at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1090)
	at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1)
	at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1924)
	at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1)
	at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
	at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
	at java.lang.Thread.run(Thread.java:745)
{code}
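The trace comes from YARN's state machine framework. A transition is looked up by the pair (current state, event type). When no transition is registered for that pair, `StateMachineFactory.doTransition` throws `InvalidStateTransitonException` (the misspelling is the real class name), and `DAGImpl.handle` logs it as shown above. Below is a minimal, self-contained sketch of that lookup. It is not Tez or YARN code: the `DagState`/`DagEventType` enums, `InvalidTransitionException`, and `ToyDagStateMachine` are hypothetical simplifications, and the real `DAGImpl` registers many more states and transitions.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified states and events; the real DAGImpl has many more.
enum DagState { RUNNING, SUCCEEDED }
enum DagEventType { DAG_VERTEX_COMPLETED, DAG_VERTEX_RERUNNING }

// Hypothetical stand-in for org.apache.hadoop.yarn.state.InvalidStateTransitonException.
class InvalidTransitionException extends RuntimeException {
  InvalidTransitionException(DagState state, DagEventType event) {
    super("Invalid event: " + event + " at " + state);
  }
}

class ToyDagStateMachine {
  // Transition table: current state -> (event type -> next state).
  private final Map<DagState, Map<DagEventType, DagState>> table =
      new EnumMap<>(DagState.class);
  private DagState current = DagState.RUNNING;

  ToyDagStateMachine() {
    Map<DagEventType, DagState> fromRunning = new EnumMap<>(DagEventType.class);
    fromRunning.put(DagEventType.DAG_VERTEX_COMPLETED, DagState.SUCCEEDED);
    fromRunning.put(DagEventType.DAG_VERTEX_RERUNNING, DagState.RUNNING);
    table.put(DagState.RUNNING, fromRunning);
    // SUCCEEDED is terminal: no transitions are registered for it, so any
    // event that arrives after completion (such as a late rerun) is rejected.
    table.put(DagState.SUCCEEDED, new EnumMap<>(DagEventType.class));
  }

  DagState getState() {
    return current;
  }

  // Looks up (current state, event); throws if no transition is registered.
  DagState doTransition(DagEventType event) {
    Map<DagEventType, DagState> transitions = table.get(current);
    DagState next = (transitions == null) ? null : transitions.get(event);
    if (next == null) {
      throw new InvalidTransitionException(current, event);
    }
    current = next;
    return current;
  }
}
```

After the machine has reached `SUCCEEDED`, calling `doTransition(DagEventType.DAG_VERTEX_RERUNNING)` throws with the message `Invalid event: DAG_VERTEX_RERUNNING at SUCCEEDED` and leaves the state unchanged. This mirrors the error logged above.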


was (Author: zjffdu):
This might cause state machine error when node failure happens when AM Is the IDLE

> It is not necessary to send NodeFailureEvent to task attempt of completed DAG
> -----------------------------------------------------------------------------
>
>                 Key: TEZ-2576
>                 URL: https://issues.apache.org/jira/browse/TEZ-2576
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>
> When a node fails, a NodeFailureEvent is sent to every task attempt that ran on that node. Sending it to task attempts that belong to already completed DAGs is unnecessary.
> {code}
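> // Every attempt tracked by the container on the failed node gets a NODE_FAILED notification,
> for (TezTaskAttemptID taId : container.failedAssignments) {
>   container.sendNodeFailureToTA(taId, errorMessage, TaskAttemptTerminationCause.NODE_FAILED);
> }
> // including attempts that already completed, even if their DAG has finished.
> for (TezTaskAttemptID taId : container.completedAttempts) {
>   container.sendNodeFailureToTA(taId, errorMessage, TaskAttemptTerminationCause.NODE_FAILED);
> }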
> {code}
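One way to express the proposal is to skip attempts whose owning DAG has already completed before sending the node-failure notification. The following is a minimal sketch under stated assumptions, not the actual Tez patch. `AttemptRef` is a hypothetical stand-in for `TezTaskAttemptID` that carries only the owning DAG id. The set of completed DAG ids is also assumed. `NodeFailureNotifier.sendNodeFailureToTA` only records the attempt id in place of dispatching a real event.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for TezTaskAttemptID: only the owning DAG id matters here.
final class AttemptRef {
  final int dagId;
  final String attemptId;

  AttemptRef(int dagId, String attemptId) {
    this.dagId = dagId;
    this.attemptId = attemptId;
  }
}

// Hypothetical notifier illustrating the filtering proposed in this issue.
final class NodeFailureNotifier {
  private final Set<Integer> completedDagIds;          // assumed: ids of DAGs that already finished
  private final List<String> notified = new ArrayList<>(); // records notified attempts, for illustration

  NodeFailureNotifier(Set<Integer> completedDagIds) {
    this.completedDagIds = completedDagIds;
  }

  // Sends a (simulated) node-failure notification to each attempt,
  // skipping attempts that belong to a DAG that has already completed.
  void notifyNodeFailure(Collection<AttemptRef> attempts) {
    for (AttemptRef ta : attempts) {
      if (completedDagIds.contains(ta.dagId)) {
        // The DAG has finished, so re-running its attempts is pointless and
        // would be rejected by the DAG state machine.
        continue;
      }
      sendNodeFailureToTA(ta);
    }
  }

  // Placeholder for container.sendNodeFailureToTA(...): only records the id.
  private void sendNodeFailureToTA(AttemptRef ta) {
    notified.add(ta.attemptId);
  }

  List<String> getNotified() {
    return notified;
  }
}
```

In the quoted snippet, the same filter would apply to both loops (`failedAssignments` and `completedAttempts`). An alternative design is to compare each attempt's DAG id against the currently running DAG instead of keeping a set of completed ones. In session mode, where containers can be reused across DAGs, either check excludes attempts left over from earlier DAGs.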



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)