You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Vinod Kumar Vavilapalli (Created) (JIRA)" <ji...@apache.org> on 2012/02/27 23:39:51 UTC

[jira] [Created] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
-----------------------------------------------------------------------------------

                 Key: MAPREDUCE-3932
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mr-am, mrv2
    Affects Versions: 0.23.0
            Reporter: Vinod Kumar Vavilapalli
            Assignee: Vinod Kumar Vavilapalli
             Fix For: 0.23.2


[~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
{code}
2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
	at java.lang.Thread.run(Thread.java:619)
2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
	at java.lang.Thread.run(Thread.java:619)

{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251863#comment-13251863 ] 

Hudson commented on MAPREDUCE-3932:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #2068 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2068/])
    MAPREDUCE-3932. Fix the TaskAttempt state machine to handle CONTIANER_LAUNCHED and CONTIANER_LAUNCH_FAILED events in additional states. (Contributed by Robert Joseph Evans) (Revision 1324902)

     Result = ABORTED
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324902
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java

                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.3
>
>         Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-3932:
-------------------------------------------

    Attachment: MR-3932.txt

This patch addresses Sid's comments.
                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.2
>
>         Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251814#comment-13251814 ] 

Hudson commented on MAPREDUCE-3932:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #2129 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2129/])
    MAPREDUCE-3932. Fix the TaskAttempt state machine to handle CONTIANER_LAUNCHED and CONTIANER_LAUNCH_FAILED events in additional states. (Contributed by Robert Joseph Evans) (Revision 1324902)

     Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324902
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java

                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.3
>
>         Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Siddharth Seth (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251805#comment-13251805 ] 

Siddharth Seth commented on MAPREDUCE-3932:
-------------------------------------------

Thanks Bobby. +1. Will commit this shortly.
                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.2
>
>         Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-3932:
-------------------------------------------

    Status: Patch Available  (was: Open)
    
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.2
>
>         Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251797#comment-13251797 ] 

Hadoop QA commented on MAPREDUCE-3932:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522277/MR-3932.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 1 new or modified test files.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2202//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2202//console

This message is automatically generated.
                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.2
>
>         Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-3932:
-------------------------------------------

    Status: Open  (was: Patch Available)

Thanks for the review Sid.  I will update the comments, and try to remove the unneeded token stuff.  I was lazy and did a copy and paste. 
                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.2
>
>         Attachments: MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Siddharth Seth (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated MAPREDUCE-3932:
--------------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.23.2)
                   0.23.3
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2 and branch-0.23.
                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.3
>
>         Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250994#comment-13250994 ] 

Hadoop QA commented on MAPREDUCE-3932:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522148/MR-3932.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 1 new or modified test files.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2187//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2187//console

This message is automatically generated.
                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.2
>
>         Attachments: MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252384#comment-13252384 ] 

Hudson commented on MAPREDUCE-3932:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #225 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/225/])
    MAPREDUCE-3932. Fix the TaskAttempt state machine to handle CONTIANER_LAUNCHED and CONTIANER_LAUNCH_FAILED events in additional states. (Contributed by Robert Joseph Evans) (Revision 1324904)

     Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324904
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java

                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.3
>
>         Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-3932:
-------------------------------------------

            Assignee: Robert Joseph Evans  (was: Vinod Kumar Vavilapalli)
    Target Version/s: 0.23.3
              Status: Patch Available  (was: Open)
    
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.2
>
>         Attachments: MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249877#comment-13249877 ] 

Robert Joseph Evans commented on MAPREDUCE-3932:
------------------------------------------------

@Vinod I know you are swamped is it OK with you if I take this one?
                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Critical
>             Fix For: 0.23.2
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252444#comment-13252444 ] 

Hudson commented on MAPREDUCE-3932:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #1047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1047/])
    MAPREDUCE-3932. Fix the TaskAttempt state machine to handle CONTIANER_LAUNCHED and CONTIANER_LAUNCH_FAILED events in additional states. (Contributed by Robert Joseph Evans) (Revision 1324902)

     Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324902
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java

                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.3
>
>         Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Amol Kekre (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amol Kekre updated MAPREDUCE-3932:
----------------------------------

    Priority: Critical  (was: Major)

Making it critical
                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Critical
>             Fix For: 0.23.2
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Closed] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy closed MAPREDUCE-3932.
------------------------------------

    
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.2-alpha
>
>         Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Siddharth Seth (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251318#comment-13251318 ] 

Siddharth Seth commented on MAPREDUCE-3932:
-------------------------------------------

Bobby, the patch looks good. Couple of minor comments.
The additional comment in TestTaskAttempt about "testAttemptContainerRequest" needing to run first - nice catch, would be useful to have a little more info in the comment though (referring to ContainerLaunchContext creation ?).
Also, I believe the new test you added can be run without bothering with credentials, tokens etc.
                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.2
>
>         Attachments: MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251821#comment-13251821 ] 

Hudson commented on MAPREDUCE-3932:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #2055 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2055/])
    MAPREDUCE-3932. Fix the TaskAttempt state machine to handle CONTIANER_LAUNCHED and CONTIANER_LAUNCH_FAILED events in additional states. (Contributed by Robert Joseph Evans) (Revision 1324902)

     Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324902
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java

                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.3
>
>         Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-3932:
-------------------------------------------

    Attachment: MR-3932.txt

I traced out the State Machine for TaskAttemptImpl and I verified that all states after ASSIGNED, can handle TA_CONTAINER_LAUNCHED and/or TA_CONTAINER_LAUNCH_FAILED depending on how they are returned.  I also looked at the JobImpl state Machine ad did a similar thing for the Counter Updates.
                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Critical
>             Fix For: 0.23.2
>
>         Attachments: MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252397#comment-13252397 ] 

Hudson commented on MAPREDUCE-3932:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #1012 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1012/])
    MAPREDUCE-3932. Fix the TaskAttempt state machine to handle CONTIANER_LAUNCHED and CONTIANER_LAUNCH_FAILED events in additional states. (Contributed by Robert Joseph Evans) (Revision 1324902)

     Result = FAILURE
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324902
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java

                
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.3
>
>         Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira