You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-dev@hadoop.apache.org by "Jian He (JIRA)" <ji...@apache.org> on 2013/09/16 19:40:08 UTC
[jira] [Resolved] (MAPREDUCE-5471) Succeed job tries to restart
after RMrestart
[ https://issues.apache.org/jira/browse/MAPREDUCE-5471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jian He resolved MAPREDUCE-5471.
--------------------------------
Resolution: Duplicate
Closed as a duplicate of YARN-540
> Succeed job tries to restart after RMrestart
> --------------------------------------------
>
> Key: MAPREDUCE-5471
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5471
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Yesha Vora
> Assignee: Jian He
> Priority: Blocker
> Attachments: MR5471-1AM.log, MR5471-2AM.log
>
>
> Run a job , restart RM when job just finished. It should not restart the job once it Succeed.
> After RM restart, The AM of restarted job fails with below error.
> AM log after Rmrestart:
> 013-08-19 17:29:21,144 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping JobHistoryEventHandler. Size of the outstanding queue size is 0
> 2013-08-19 17:29:21,145 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
> 2013-08-19 17:29:21,146 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://host1:port1/user/ABC/.staging/job_1376933101704_0001
> 2013-08-19 17:29:21,156 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://host1:port1/ABC/.staging/job_1376933101704_0001/job.splitmetainfo
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1469)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1324)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1291)
> at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:922)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:131)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1184)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:995)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1394)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1390)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1323)
> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://host1:port1/ABC/.staging/job_1376933101704_0001/job.splitmetainfo
> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1121)
> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1113)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:78)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1113)
> at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1464)
> ... 17 more
> 2013-08-19 17:29:21,158 INFO [Thread-2] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. Signaling RMCommunicator and JobHistoryEventHandler.
> 2013-08-19 17:29:21,159 WARN [Thread-2] org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:805)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1344)
> at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira