Posted to issues@tez.apache.org by "Tassapol Athiapinya (JIRA)" <ji...@apache.org> on 2014/03/21 01:24:48 UTC

[jira] [Created] (TEZ-966) Tez AM has invalid state transition error when datanode is bad.

Tassapol Athiapinya created TEZ-966:
---------------------------------------

             Summary: Tez AM has invalid state transition error when datanode is bad.
                 Key: TEZ-966
                 URL: https://issues.apache.org/jira/browse/TEZ-966
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.4.0
            Reporter: Tassapol Athiapinya


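I found that the Tez AM hits an invalid state transition error after it reports that a datanode is bad. In the log below, the AM fails to send the VERTEX_FINISHED summary event to recovery because HDFS reports that all datanodes are bad. The vertex initialmap, still in state RUNNING, then cannot handle a V_TASK_COMPLETED event.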

{code}
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.18.145.215:35766 remote=/172.18.145.215:50010]
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
	at java.io.FilterInputStream.read(FilterInputStream.java:83)
	at java.io.FilterInputStream.read(FilterInputStream.java:83)
	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1985)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:796)
2014-03-20 08:27:09,529 WARN [AsyncDispatcher event handler] org.apache.hadoop.hdfs.DFSClient: Error while syncing
java.io.IOException: All datanodes 172.18.145.215:50010 are bad. Aborting...
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1127)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
2014-03-20 08:27:09,530 WARN [AsyncDispatcher event handler] org.apache.tez.dag.history.recovery.RecoveryService: Error handling summary event, eventType=VERTEX_FINISHED
java.io.IOException: All datanodes 172.18.145.215:50010 are bad. Aborting...
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1127)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
2014-03-20 08:27:09,530 ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Failed to send vertex finished event to recovery
java.io.IOException: All datanodes 172.18.145.215:50010 are bad. Aborting...
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1127)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
2014-03-20 08:27:09,531 ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Can't handle Invalid event V_TASK_COMPLETED on vertex initialmap with vertexId vertex_1395294589125_0141_1_00 at current state RUNNING
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_TASK_COMPLETED at RUNNING
	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1202)
	at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:155)
	at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1549)
	at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1535)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:722)
{code}
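In the last trace, `StateMachineFactory.doTransition` calls into `StateMachineFactory$MultipleInternalArc.doTransition`, and the exception is thrown there. This suggests that a transition for V_TASK_COMPLETED in RUNNING is registered, but its hook returned a post-state that was not declared for that arc. In Hadoop's `StateMachineFactory`, a multiple-arc transition throws `InvalidStateTransitonException` in that case. The following is a minimal, hypothetical sketch of that mechanism only. It is not Tez or YARN code. The names `MultipleArcSketch`, `MultipleArc`, `InvalidTransitionException`, and `onTaskCompleted` are invented for illustration. The choice of `ERROR` as the undeclared post-state is an assumption, because the log does not show which state the real transition returned.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch of a "multiple-arc" transition in the style of Hadoop
// YARN's StateMachineFactory. The hook may pick one of several post-states,
// but only states declared when the arc was registered are accepted.
// None of these names are real Tez/YARN APIs.
public class MultipleArcSketch {

    enum State { RUNNING, SUCCEEDED, FAILED, ERROR }

    // Stand-in for org.apache.hadoop.yarn.state.InvalidStateTransitonException.
    static class InvalidTransitionException extends RuntimeException {
        InvalidTransitionException(State from, String event) {
            super("Invalid event: " + event + " at " + from);
        }
    }

    static final class MultipleArc {
        private final Set<State> validPostStates;
        private final Supplier<State> hook;

        MultipleArc(Set<State> validPostStates, Supplier<State> hook) {
            this.validPostStates = validPostStates;
            this.hook = hook;
        }

        State doTransition(State from, String event) {
            State post = hook.get();
            // A transition for (from, event) exists, yet the arc still rejects
            // any post-state it was not declared with.
            if (!validPostStates.contains(post)) {
                throw new InvalidTransitionException(from, event);
            }
            return post;
        }
    }

    // Hypothetical task-completed hook: ASSUMES that a failed recovery write
    // makes the hook report ERROR instead of SUCCEEDED.
    static State onTaskCompleted(boolean recoveryWriteOk) {
        return recoveryWriteOk ? State.SUCCEEDED : State.ERROR;
    }

    public static void main(String[] args) {
        // ERROR is deliberately missing from the declared post-states.
        Set<State> declared = EnumSet.of(State.RUNNING, State.SUCCEEDED, State.FAILED);

        MultipleArc healthy = new MultipleArc(declared, () -> onTaskCompleted(true));
        System.out.println(healthy.doTransition(State.RUNNING, "V_TASK_COMPLETED"));

        MultipleArc badDatanode = new MultipleArc(declared, () -> onTaskCompleted(false));
        try {
            badDatanode.doTransition(State.RUNNING, "V_TASK_COMPLETED");
        } catch (InvalidTransitionException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, `main` prints `SUCCEEDED` for the healthy path and `Invalid event: V_TASK_COMPLETED at RUNNING` for the path where the recovery write failed. Either declaring the extra post-state on the arc or not returning it from the hook would avoid the exception in this sketch.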



--
This message was sent by Atlassian JIRA
(v6.2#6252)