Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2015/05/07 07:37:01 UTC

[jira] [Commented] (TEZ-2427) TestFaultTolerance NPE in RecoveryService

    [ https://issues.apache.org/jira/browse/TEZ-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532042#comment-14532042 ] 

Jeff Zhang commented on TEZ-2427:
---------------------------------

The issue is that DAGFinishedEvent is logged twice when an InternalError occurs after the DAG has already finished. In RecoveryService, the DAG's outputStream is removed and closed when DAGFinishedEvent is handled, so the second DAGFinishedEvent causes the NPE.
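
To illustrate the failure mode described above, here is a minimal standalone sketch. It is not the RecoveryService code: the class RecoverySketch, its methods, and the DAG ids are all hypothetical, and the only assumption carried over is that a per-DAG stream is removed from a map and closed when the finish event is handled. The guarded variant shows one possible defensive check; avoiding the duplicate event in the first place would be the other option.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a recovery service; NOT the actual Tez class.
class RecoverySketch {
    // One open stream per DAG id, removed once that DAG's finish event is handled.
    private final Map<String, OutputStream> streams = new HashMap<>();

    void dagStarted(String dagId) {
        streams.put(dagId, new ByteArrayOutputStream());
    }

    // Unguarded: a second finish event for the same DAG gets null from the map.
    void handleFinishedUnguarded(String dagId) {
        OutputStream out = streams.remove(dagId);
        flushAndClose(out); // NullPointerException on the second call
    }

    // Guarded: a duplicate finish event is ignored. Returns true if handled.
    boolean handleFinishedGuarded(String dagId) {
        OutputStream out = streams.remove(dagId);
        if (out == null) {
            return false; // stream already removed and closed
        }
        flushAndClose(out);
        return true;
    }

    private static void flushAndClose(OutputStream out) {
        try {
            out.flush();
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RecoverySketch svc = new RecoverySketch();
        svc.dagStarted("dag_1");
        svc.handleFinishedUnguarded("dag_1");
        try {
            svc.handleFinishedUnguarded("dag_1");
        } catch (NullPointerException e) {
            System.out.println("second finish event: NPE");
        }
        svc.dagStarted("dag_2");
        System.out.println(svc.handleFinishedGuarded("dag_2")); // true
        System.out.println(svc.handleFinishedGuarded("dag_2")); // false
    }
}
```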

Besides, I believe TestFaultTolerance should not hit InternalErrorTransition at all. [~hitesh], do you have the app logs?



> TestFaultTolerance NPE in RecoveryService
> -----------------------------------------
>
>                 Key: TEZ-2427
>                 URL: https://issues.apache.org/jira/browse/TEZ-2427
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>            Priority: Critical
>
> From https://builds.apache.org/job/Tez-Build/1055/
> 2015-05-06 23:55:54,488 ERROR [Dispatcher thread: Central] common.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
> 	at org.apache.tez.dag.history.recovery.RecoveryService.doFlush(RecoveryService.java:458)
> 	at org.apache.tez.dag.history.recovery.RecoveryService.handle(RecoveryService.java:289)
> 	at org.apache.tez.dag.history.HistoryEventHandler.handleCriticalEvent(HistoryEventHandler.java:102)
> 	at org.apache.tez.dag.app.dag.impl.DAGImpl.logJobHistoryUnsuccesfulEvent(DAGImpl.java:1161)
> 	at org.apache.tez.dag.app.dag.impl.DAGImpl.finished(DAGImpl.java:1275)
> 	at org.apache.tez.dag.app.dag.impl.DAGImpl.access$2600(DAGImpl.java:144)
> 	at org.apache.tez.dag.app.dag.impl.DAGImpl$InternalErrorTransition.transition(DAGImpl.java:2151)
> 	at org.apache.tez.dag.app.dag.impl.DAGImpl$InternalErrorTransition.transition(DAGImpl.java:2140)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> 	at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
> 	at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1079)
> 	at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:143)
> 	at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1871)
> 	at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1862)
> 	at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> 	at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
> 	at java.lang.Thread.run(Thread.java:662)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)