Posted to mapreduce-dev@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2013/08/02 16:53:48 UTC

[jira] [Resolved] (MAPREDUCE-5444) MRAppMaster throws InvalidStateTransitonException: Invalid event: JOB_AM_REBOOT at SUCCEEDED

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe resolved MAPREDUCE-5444.
-----------------------------------

    Resolution: Invalid

bq. I have one point to add here: immediately after the job succeeded, the app master got a reboot command from the RM. The JobClient has already exited (see MAPREDUCE-5441). By that time, the RM has launched a 2nd attempt of the app master. The 2nd attempt also competes for resources, but there is no client waiting to get the job report. I feel this is a problem.

There will always be a race where the job has just succeeded but the RM gets out of sync with the AM before the AM can unregister.  Normally the AM will exit, another AM attempt will be launched by the RM, and the new attempt will recover the previous SUCCEEDED state and exit shortly afterwards without launching any subsequent tasks.
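
To make the race concrete, here is a minimal, self-contained sketch. It is not Hadoop's JobImpl or StateMachineFactory: the class AmRebootRaceSketch, the ToyJob type, the transition table, the recordedOutcome field, and the shouldLaunchTasks helper are all hypothetical names made up for illustration. It assumes a table-driven state machine with no entry for JOB_AM_REBOOT at SUCCEEDED, and a later attempt that checks the recorded outcome before launching anything.

```java
import java.util.EnumMap;
import java.util.Map;

final class AmRebootRaceSketch {
    // Hypothetical, simplified states and events; the real JobImpl has many more.
    enum JobState { RUNNING, COMMITTING, SUCCEEDED, REBOOT, ERROR }
    enum JobEvent { JOB_COMPLETED, JOB_COMMIT_COMPLETED, JOB_AM_REBOOT }

    /** Toy job with a table-driven state machine (an assumption, not Hadoop code). */
    static final class ToyJob {
        private final Map<JobState, Map<JobEvent, JobState>> table =
                new EnumMap<JobState, Map<JobEvent, JobState>>(JobState.class);
        JobState state = JobState.RUNNING;
        // Stand-in for the outcome persisted once the job succeeds.
        JobState recordedOutcome = null;

        ToyJob() {
            add(JobState.RUNNING, JobEvent.JOB_COMPLETED, JobState.COMMITTING);
            add(JobState.RUNNING, JobEvent.JOB_AM_REBOOT, JobState.REBOOT);
            add(JobState.COMMITTING, JobEvent.JOB_COMMIT_COMPLETED, JobState.SUCCEEDED);
            // Deliberately no entry for (SUCCEEDED, JOB_AM_REBOOT).
        }

        private void add(JobState from, JobEvent on, JobState to) {
            table.computeIfAbsent(from, k -> new EnumMap<JobEvent, JobState>(JobEvent.class))
                 .put(on, to);
        }

        void handle(JobEvent event) {
            Map<JobEvent, JobState> row = table.get(state);
            JobState next = (row == null) ? null : row.get(event);
            if (next == null) {
                // No registered transition: report it and fall into ERROR.
                System.out.println("Invalid event: " + event + " at " + state);
                state = JobState.ERROR;
                return;
            }
            state = next;
            if (state == JobState.SUCCEEDED) {
                recordedOutcome = JobState.SUCCEEDED;
            }
        }
    }

    /** A later attempt launches tasks only if no successful outcome was recorded. */
    static boolean shouldLaunchTasks(JobState recoveredOutcome) {
        return recoveredOutcome != JobState.SUCCEEDED;
    }

    public static void main(String[] args) {
        ToyJob firstAttempt = new ToyJob();
        firstAttempt.handle(JobEvent.JOB_COMPLETED);
        firstAttempt.handle(JobEvent.JOB_COMMIT_COMPLETED);
        // The reboot command arrives after success but before unregistering.
        firstAttempt.handle(JobEvent.JOB_AM_REBOOT);
        System.out.println("First attempt in-memory state: " + firstAttempt.state);

        // The next attempt sees the recorded success and has nothing to run.
        System.out.println("Second attempt launches tasks: "
                + shouldLaunchTasks(firstAttempt.recordedOutcome));
    }
}
```

In this sketch the late reboot event only disturbs the first attempt's in-memory state. The success recorded earlier is what the second attempt acts on, which is why it can exit without doing any work.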

As for the client, that's an orthogonal problem.  It's not required that a client be listening to an application as it executes, and if the client is unnecessarily exiting across an AM restart then we can tackle that issue in MAPREDUCE-5441.
                
> MRAppMaster throws InvalidStateTransitonException: Invalid event: JOB_AM_REBOOT at SUCCEEDED
> --------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5444
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5444
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster
>            Reporter: Rohith Sharma K S
>            Priority: Minor
>
> {noformat}
> 2013-08-02 14:55:11,537 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Calling handler for JobFinishedEvent 
> 2013-08-02 14:55:11,538 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1375199817609_0049Job Transitioned from COMMITTING to SUCCEEDED
> 2013-08-02 14:55:11,663 INFO [Thread-52] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://0.0.0.0:45000/home/restest/staging-dir/restest/.staging/job_1375199817609_0049/job_1375199817609_0049_2.jhist to hdfs://0.0.0.0:45000/home/restest/staging-dir/history/done_intermediate/restest/job_1375199817609_0049-1375435337429-restest-word+count-1375435511533-10-1-SUCCEEDED-a.jhist_tmp
> 2013-08-02 14:55:11,750 INFO [Thread-52] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: hdfs://0.0.0.0:45000/home/restest/staging-dir/history/done_intermediate/restest/job_1375199817609_0049-1375435337429-restest-word+count-1375435511533-10-1-SUCCEEDED-a.jhist_tmp
> 2013-08-02 14:55:11,769 INFO [Thread-52] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying hdfs://0.0.0.0:45000/home/restest/staging-dir/restest/.staging/job_1375199817609_0049/job_1375199817609_0049_2_conf.xml to hdfs://0.0.0.0:45000/home/restest/staging-dir/history/done_intermediate/restest/job_1375199817609_0049_conf.xml_tmp
> 2013-08-02 14:55:11,880 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:1 CompletedMaps:10 CompletedReds:1 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
> 2013-08-02 14:55:13,649 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Error communicating with RM: Resource Manager doesn't recognize AttemptId: application_1375199817609_0049
> org.apache.hadoop.yarn.YarnException: Resource Manager doesn't recognize AttemptId: application_1375199817609_0049
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:626)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:238)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:250)
> 	at java.lang.Thread.run(Thread.java:662)
> 2013-08-02 14:55:13,649 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_AM_REBOOT at SUCCEEDED
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:914)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:129)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1114)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1110)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
> 	at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$RecoveryDispatcher.realDispatch(RecoveryService.java:309)
> 	at org.apache.hadoop.mapreduce.v2.app.recover.RecoveryService$RecoveryDispatcher.dispatch(RecoveryService.java:305)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
> 	at java.lang.Thread.run(Thread.java:662)
> 2013-08-02 14:55:13,652 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: JobHistoryEvent is triggered from JobImpl
> 2013-08-02 14:55:13,652 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1375199817609_0049Job Transitioned from SUCCEEDED to ERROR
> {noformat}
