Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2014/03/21 01:40:50 UTC

[jira] [Updated] (TEZ-966) Tez AM has invalid state transition error when datanode is bad.

     [ https://issues.apache.org/jira/browse/TEZ-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated TEZ-966:
----------------------------

    Attachment: TEZ-966.1.patch

> Tez AM has invalid state transition error when datanode is bad.
> ---------------------------------------------------------------
>
>                 Key: TEZ-966
>                 URL: https://issues.apache.org/jira/browse/TEZ-966
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Tassapol Athiapinya
>         Attachments: TEZ-966.1.patch
>
>
> I found that the AM hits an invalid event error when it complains that the datanode is bad.
> {code}
> java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.18.145.215:35766 remote=/172.18.145.215:50010]
> 	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
> 	at java.io.FilterInputStream.read(FilterInputStream.java:83)
> 	at java.io.FilterInputStream.read(FilterInputStream.java:83)
> 	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1985)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:796)
> 2014-03-20 08:27:09,529 WARN [AsyncDispatcher event handler] org.apache.hadoop.hdfs.DFSClient: Error while syncing
> java.io.IOException: All datanodes 172.18.145.215:50010 are bad. Aborting...
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1127)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
> 2014-03-20 08:27:09,530 WARN [AsyncDispatcher event handler] org.apache.tez.dag.history.recovery.RecoveryService: Error handling summary event, eventType=VERTEX_FINISHED
> java.io.IOException: All datanodes 172.18.145.215:50010 are bad. Aborting...
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1127)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
> 2014-03-20 08:27:09,530 ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Failed to send vertex finished event to recovery
> java.io.IOException: All datanodes 172.18.145.215:50010 are bad. Aborting...
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1127)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
> 2014-03-20 08:27:09,531 ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Can't handle Invalid event V_TASK_COMPLETED on vertex initialmap with vertexId vertex_1395294589125_0141_1_00 at current state RUNNING
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_TASK_COMPLETED at RUNNING
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> 	at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1202)
> 	at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:155)
> 	at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1549)
> 	at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1535)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> 	at java.lang.Thread.run(Thread.java:722)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)