You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Vinod Kumar Vavilapalli (Created) (JIRA)" <ji...@apache.org> on 2012/02/27 23:39:51 UTC
[jira] [Created] (MAPREDUCE-3932) MR tasks failing and crashing the
AM when available-resources/headRoom becomes zero
MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
-----------------------------------------------------------------------------------
Key: MAPREDUCE-3932
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mr-am, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Fix For: 0.23.2
[~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
{code}
2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
at java.lang.Thread.run(Thread.java:619)
2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
at java.lang.Thread.run(Thread.java:619)
{code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing
the AM when available-resources/headRoom becomes zero
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251863#comment-13251863 ]
Hudson commented on MAPREDUCE-3932:
-----------------------------------
Integrated in Hadoop-Mapreduce-trunk-Commit #2068 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2068/])
MAPREDUCE-3932. Fix the TaskAttempt state machine to handle CONTIANER_LAUNCHED and CONTIANER_LAUNCH_FAILED events in additional states. (Contributed by Robert Joseph Evans) (Revision 1324902)
Result = ABORTED
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324902
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.3
>
> Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3932) MR tasks failing and crashing the
AM when available-resources/headRoom becomes zero
Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Joseph Evans updated MAPREDUCE-3932:
-------------------------------------------
Attachment: MR-3932.txt
This patch addresses Sid's comments.
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.2
>
> Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing
the AM when available-resources/headRoom becomes zero
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251814#comment-13251814 ]
Hudson commented on MAPREDUCE-3932:
-----------------------------------
Integrated in Hadoop-Hdfs-trunk-Commit #2129 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2129/])
MAPREDUCE-3932. Fix the TaskAttempt state machine to handle CONTIANER_LAUNCHED and CONTIANER_LAUNCH_FAILED events in additional states. (Contributed by Robert Joseph Evans) (Revision 1324902)
Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324902
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.3
>
> Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing
the AM when available-resources/headRoom becomes zero
Posted by "Siddharth Seth (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251805#comment-13251805 ]
Siddharth Seth commented on MAPREDUCE-3932:
-------------------------------------------
Thanks Bobby. +1. Will commit this shortly.
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.2
>
> Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3932) MR tasks failing and crashing the
AM when available-resources/headRoom becomes zero
Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Joseph Evans updated MAPREDUCE-3932:
-------------------------------------------
Status: Patch Available (was: Open)
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.2
>
> Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing
the AM when available-resources/headRoom becomes zero
Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251797#comment-13251797 ]
Hadoop QA commented on MAPREDUCE-3932:
--------------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12522277/MR-3932.txt
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2202//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2202//console
This message is automatically generated.
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.2
>
> Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3932) MR tasks failing and crashing the
AM when available-resources/headRoom becomes zero
Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Joseph Evans updated MAPREDUCE-3932:
-------------------------------------------
Status: Open (was: Patch Available)
Thanks for the review Sid. I will update the comments, and try to remove the unneeded token stuff. I was lazy and did a copy and paste.
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.2
>
> Attachments: MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3932) MR tasks failing and crashing the
AM when available-resources/headRoom becomes zero
Posted by "Siddharth Seth (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddharth Seth updated MAPREDUCE-3932:
--------------------------------------
Resolution: Fixed
Fix Version/s: (was: 0.23.2)
0.23.3
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)
Committed to trunk, branch-2 and branch-0.23.
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.3
>
> Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing
the AM when available-resources/headRoom becomes zero
Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250994#comment-13250994 ]
Hadoop QA commented on MAPREDUCE-3932:
--------------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12522148/MR-3932.txt
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2187//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2187//console
This message is automatically generated.
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.2
>
> Attachments: MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing
the AM when available-resources/headRoom becomes zero
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252384#comment-13252384 ]
Hudson commented on MAPREDUCE-3932:
-----------------------------------
Integrated in Hadoop-Hdfs-0.23-Build #225 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/225/])
MAPREDUCE-3932. Fix the TaskAttempt state machine to handle CONTIANER_LAUNCHED and CONTIANER_LAUNCH_FAILED events in additional states. (Contributed by Robert Joseph Evans) (Revision 1324904)
Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324904
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.3
>
> Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3932) MR tasks failing and crashing the
AM when available-resources/headRoom becomes zero
Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Joseph Evans updated MAPREDUCE-3932:
-------------------------------------------
Assignee: Robert Joseph Evans (was: Vinod Kumar Vavilapalli)
Target Version/s: 0.23.3
Status: Patch Available (was: Open)
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.2
>
> Attachments: MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing
the AM when available-resources/headRoom becomes zero
Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249877#comment-13249877 ]
Robert Joseph Evans commented on MAPREDUCE-3932:
------------------------------------------------
@Vinod I know you are swamped is it OK with you if I take this one?
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Critical
> Fix For: 0.23.2
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing
the AM when available-resources/headRoom becomes zero
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252444#comment-13252444 ]
Hudson commented on MAPREDUCE-3932:
-----------------------------------
Integrated in Hadoop-Mapreduce-trunk #1047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1047/])
MAPREDUCE-3932. Fix the TaskAttempt state machine to handle CONTIANER_LAUNCHED and CONTIANER_LAUNCH_FAILED events in additional states. (Contributed by Robert Joseph Evans) (Revision 1324902)
Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324902
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.3
>
> Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3932) MR tasks failing and crashing the
AM when available-resources/headRoom becomes zero
Posted by "Amol Kekre (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amol Kekre updated MAPREDUCE-3932:
----------------------------------
Priority: Critical (was: Major)
Making it critical
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Critical
> Fix For: 0.23.2
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Closed] (MAPREDUCE-3932) MR tasks failing and crashing the
AM when available-resources/headRoom becomes zero
Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy closed MAPREDUCE-3932.
------------------------------------
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.3, 2.0.2-alpha
>
> Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing
the AM when available-resources/headRoom becomes zero
Posted by "Siddharth Seth (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251318#comment-13251318 ]
Siddharth Seth commented on MAPREDUCE-3932:
-------------------------------------------
Bobby, the patch looks good. Couple of minor comments.
The additional comment in TestTaskAttempt about "testAttemptContainerRequest" needing to run first - nice catch, would be useful to have a little more info in the comment though (referring to ContainerLaunchContext creation ?).
Also, I believe the new test you added can be run without bothering with credentials, tokens etc.
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.2
>
> Attachments: MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing
the AM when available-resources/headRoom becomes zero
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251821#comment-13251821 ]
Hudson commented on MAPREDUCE-3932:
-----------------------------------
Integrated in Hadoop-Common-trunk-Commit #2055 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2055/])
MAPREDUCE-3932. Fix the TaskAttempt state machine to handle CONTIANER_LAUNCHED and CONTIANER_LAUNCH_FAILED events in additional states. (Contributed by Robert Joseph Evans) (Revision 1324902)
Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324902
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.3
>
> Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3932) MR tasks failing and crashing the
AM when available-resources/headRoom becomes zero
Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Joseph Evans updated MAPREDUCE-3932:
-------------------------------------------
Attachment: MR-3932.txt
I traced out the State Machine for TaskAttemptImpl and I verified that all states after ASSIGNED, can handle TA_CONTAINER_LAUNCHED and/or TA_CONTAINER_LAUNCH_FAILED depending on how they are returned. I also looked at the JobImpl state Machine ad did a similar thing for the Counter Updates.
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Critical
> Fix For: 0.23.2
>
> Attachments: MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3932) MR tasks failing and crashing
the AM when available-resources/headRoom becomes zero
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252397#comment-13252397 ]
Hudson commented on MAPREDUCE-3932:
-----------------------------------
Integrated in Hadoop-Hdfs-trunk #1012 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1012/])
MAPREDUCE-3932. Fix the TaskAttempt state machine to handle CONTIANER_LAUNCHED and CONTIANER_LAUNCH_FAILED events in additional states. (Contributed by Robert Joseph Evans) (Revision 1324902)
Result = FAILURE
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324902
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java
> MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
> -----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3932
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.3
>
> Attachments: MR-3932.txt, MR-3932.txt
>
>
> [~karams] reported this offline. One reduce task gets preempted because of zero headRoom and crashes the AM.
> {code}
> 2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
> 2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000006 to attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000007 to attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_000008 to attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
> 2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
> 2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host6 to /$rack6
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,976 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host1 to /$rack1
> 2012-02-23 11:30:16,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,981 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved $host9 to /$rack9
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2012-02-23 11:30:16,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000002_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from ASSIGNED to KILL_CONTAINER_CLEANUP
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,983 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000000_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:16,984 INFO [ContainerLauncher #3] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,987 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for taskAttempt attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1329995034628_0983_r_000001_0
> 2012-02-23 11:30:16,988 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container was killed before it was launched
> 2012-02-23 11:30:17,061 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000000_0 : 53990
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000001_0 TaskAttempt Transitioned from KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
> 2012-02-23 11:30:17,077 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1329995034628_0983_r_000001_0: Container was killed before it was launched
> 2012-02-23 11:30:17,078 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1329995034628_0983_r_000001_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_CONTAINER_LAUNCH_FAILED at KILL_TASK_CLEANUP
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:926)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:870)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:862)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> 2012-02-23 11:30:17,080 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1329995034628_0983_r_000000_0] using containerId: [container_1329995034628_0983_01_000006 on NM: [$host6:51529]
> 2012-02-23 11:30:17,081 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1329995034628_0983_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1329995034628_0983_r_000002_0 : 47960
> 2012-02-23 11:30:17,207 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1329995034628_0983_r_000002_0
> 2012-02-23 11:30:17,215 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1329995034628_0983Job Transitioned from RUNNING to ERROR
> 2012-02-23 11:30:17,216 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:657)
> at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:111)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:848)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:844)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:82)
> at java.lang.Thread.run(Thread.java:619)
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira