Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2015/04/01 04:06:53 UTC

[jira] [Updated] (TEZ-2260) AM been shutdown due to NoSuchMethodError in DAGProtos

     [ https://issues.apache.org/jira/browse/TEZ-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-2260:
----------------------------
    Attachment: applog.tar

attach the full app logs when running TestTezJobs.testSortMergeJoinExamplePipeline

> AM been shutdown due to NoSuchMethodError in DAGProtos
> ------------------------------------------------------
>
>                 Key: TEZ-2260
>                 URL: https://issues.apache.org/jira/browse/TEZ-2260
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>         Attachments: applog.tar
>
>
> Not sure why this happens; it may be due to an environment issue.
> {code}
> 2015-04-01 09:08:49,757 INFO [Dispatcher thread: Central] history.HistoryEventHandler: [HISTORY][DAG:dag_1427850436467_0007_1][Event:TASK_ATTEMPT_FINISHED]: vertexName=datagen, taskAttemptId=attempt_1427850436467_0007_1_00_000000_0, startTime=1427850527981, finishTime=1427850529750, timeTaken=1769, status=SUCCEEDED, errorEnum=, diagnostics=, counters=Counters: 8, File System Counters, HDFS_BYTES_READ=0, HDFS_BYTES_WRITTEN=953030, HDFS_READ_OPS=9, HDFS_LARGE_READ_OPS=0, HDFS_WRITE_OPS=6, org.apache.tez.common.counters.TaskCounter, GC_TIME_MILLIS=46, COMMITTED_HEAP_BYTES=257425408, OUTPUT_RECORDS=44195
> 2015-04-01 09:08:49,757 FATAL [RecoveryEventHandlingThread] yarn.YarnUncaughtExceptionHandler: Thread Thread[RecoveryEventHandlingThread,5,main] threw an Error.  Shutting down now...
> java.lang.NoSuchMethodError: org.apache.tez.dag.api.records.DAGProtos$TezCountersProto$Builder.access$26000()Lorg/apache/tez/dag/api/records/DAGProtos$TezCountersProto$Builder;
> 	at org.apache.tez.dag.api.records.DAGProtos$TezCountersProto.newBuilder(DAGProtos.java:24581)
> 	at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersToProto(DagTypeConverters.java:544)
> 	at org.apache.tez.dag.history.events.TaskAttemptFinishedEvent.toProto(TaskAttemptFinishedEvent.java:97)
> 	at org.apache.tez.dag.history.events.TaskAttemptFinishedEvent.toProtoStream(TaskAttemptFinishedEvent.java:120)
> 	at org.apache.tez.dag.history.recovery.RecoveryService.handleRecoveryEvent(RecoveryService.java:403)
> 	at org.apache.tez.dag.history.recovery.RecoveryService.access$700(RecoveryService.java:50)
> 	at org.apache.tez.dag.history.recovery.RecoveryService$1.run(RecoveryService.java:158)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-04-01 09:08:49,757 INFO [Dispatcher thread: Central] impl.TaskAttemptImpl: attempt_1427850436467_0007_1_00_000000_0 TaskAttempt Transitioned from RUNNING to SUCCEEDED due to event TA_DONE
> {code}
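> A NoSuchMethodError on a generated builder accessor like this usually means the protobuf-java runtime on the AM classpath does not match the protoc version that generated DAGProtos (Tez at this point builds against protobuf 2.5.0). As a hedged diagnostic (the class names are real, but dumping them this way is just an illustrative sketch, not part of the test), something like the following run on the AM classpath prints which jars the generated protos and the protobuf runtime were actually loaded from:
> {code}
> import org.apache.tez.dag.api.records.DAGProtos;
>
> public class ProtoClasspathCheck {
>   public static void main(String[] args) {
>     // Jar that provided the generated protos (tez-api)
>     System.out.println("DAGProtos loaded from: "
>         + DAGProtos.class.getProtectionDomain().getCodeSource().getLocation());
>     // Jar that provided the protobuf runtime; if its version differs from
>     // the protoc that generated DAGProtos, accessors like access$26000()
>     // may simply not exist, yielding exactly this NoSuchMethodError
>     System.out.println("protobuf runtime loaded from: "
>         + com.google.protobuf.Message.class.getProtectionDomain()
>             .getCodeSource().getLocation());
>   }
> }
> {code}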
> This issue results in several consequent issues. Because of this error, the AM attempts recovery in the next attempt, but during that next attempt it hits the following issue; it looks like a datanode crashed.
> {code}
> 2015-04-01 09:09:00,093 WARN [Thread-82] hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[127.0.0.1:56238, 127.0.0.1:56234], original=[127.0.0.1:56238, 127.0.0.1:56234]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1040)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1106)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1253)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
> 2015-04-01 09:09:00,093 WARN [Dispatcher thread: Central] hdfs.DFSClient: Error while syncing
> java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[127.0.0.1:56238, 127.0.0.1:56234], original=[127.0.0.1:56238, 127.0.0.1:56234]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1040)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1106)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1253)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
> 2015-04-01 09:09:00,094 ERROR [Dispatcher thread: Central] recovery.RecoveryService: Error handling summary event, eventType=VERTEX_FINISHED
> java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[127.0.0.1:56238, 127.0.0.1:56234], original=[127.0.0.1:56238, 127.0.0.1:56234]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1040)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1106)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1253)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:594)
> {code}
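> For what it's worth, on a small test cluster like this one (only two datanodes, both on 127.0.0.1), the DEFAULT replace-datanode-on-failure policy has no spare node to pick, so pipeline recovery fails exactly like this. A hedged workaround sketch using the client-side keys the exception message itself names (reasonable for a minicluster test setup, not for production):
> {code}
> import org.apache.hadoop.conf.Configuration;
>
> Configuration conf = new Configuration();
> // Don't try to replace a failed datanode when there are no spares;
> // acceptable on a 2-3 node test cluster, risky on a real cluster.
> conf.setBoolean(
>     "dfs.client.block.write.replace-datanode-on-failure.enable", false);
> // Alternatively, keep the feature enabled but never demand a replacement:
> // conf.set("dfs.client.block.write.replace-datanode-on-failure.policy",
> //     "NEVER");
> {code}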
> Because of the above issue (the error writing the summary recovery log), the AM shuts down, and on the client side a SessionNotRunning exception is thrown without any diagnostic info.
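> As a sketch of what the client side sees (assuming a standard TezClient session; the catch block below is illustrative, not the actual test code), the submit call surfaces only the bare exception:
> {code}
> import org.apache.tez.client.TezClient;
> import org.apache.tez.dag.api.DAG;
> import org.apache.tez.dag.api.SessionNotRunning;
>
> try {
>   tezClient.submitDAG(dag);  // tezClient and dag assumed already built
> } catch (SessionNotRunning e) {
>   // With the AM already shut down, nothing useful is attached here;
>   // the real diagnostics are only in the AM logs (see applog.tar).
>   System.err.println("AM session died before the DAG was accepted: " + e);
> }
> {code}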


