Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2013/02/06 21:19:13 UTC

[jira] [Commented] (MAPREDUCE-4982) AM hung with one pending map task

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572776#comment-13572776 ] 

Jason Lowe commented on MAPREDUCE-4982:
---------------------------------------

Note that this job had many map attempt failures, and as a result the AM had blacklisted a number of nodes. At one point in the AM log I saw this somewhat troubling message:

{noformat}
2013-02-03 16:30:32,164 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Could not map allocated container to a valid request. Releasing allocated container Container: [ContainerId: container_1359150825713_856434_01_003359, NodeId: xx, NodeHttpAddress: xx, Resource: memory: 4608, Priority: {Priority: 5}, State: NEW, Token: ContainerToken { kind: ContainerToken, service: xx }, Status: container_id {, app_attempt_id {, application_id {, id: 856434, cluster_timestamp: 1359150825713, }, attemptId: 1, }, id: 3359, }, state: C_NEW, ]
{noformat}

I suspect the AM could not associate this container with an outstanding map task and released it, and that this container was effectively the one needed to run the final map task and therefore complete the job.

Note that the released container has priority 5, which is the priority used for failed-map requests. I'm wondering whether a failed map somehow took a container granted for a normal-priority request, so that when the container for the failed-map priority request finally arrived, there were no failed attempts left to associate with it and the container was dropped.
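To make the hypothesis concrete, here is a toy model of the suspected bookkeeping race. It is only a sketch and not RMContainerAllocator: the class, method, attempt, and container names are hypothetical. Priority 5 for failed maps comes from the log message above, and priority 20 for ordinary maps is an assumption. The model keys unassigned attempts by the priority of the request made for them and counts the requests still outstanding at the RM per priority. When stealing is enabled, it also lets a waiting failed attempt take a container granted for an ordinary map request.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of the suspected race; not the real MR AM allocator. */
public class AllocatorRaceSketch {
    // 5 matches the released container in the log; 20 for ordinary maps is an assumption.
    static final int PRIORITY_FAILED_MAP = 5;
    static final int PRIORITY_MAP = 20;

    // Unassigned attempts, keyed by the priority of the request made for them.
    final Map<Integer, Deque<String>> pending = new HashMap<>();
    // Requests still outstanding at the RM, per priority.
    final Map<Integer, Integer> outstandingAsks = new HashMap<>();
    // Containers the AM could not match to any waiting attempt.
    final List<String> released = new ArrayList<>();

    void request(String attemptId, int priority) {
        pending.computeIfAbsent(priority, p -> new ArrayDeque<>()).add(attemptId);
        outstandingAsks.merge(priority, 1, Integer::sum);
    }

    /** Returns the attempt that receives the container, or null if it is released. */
    String onAllocated(String containerId, int priority, boolean stealing) {
        // The RM has satisfied one ask at this priority, whoever ends up using it.
        outstandingAsks.merge(priority, -1, Integer::sum);
        int from = priority;
        Deque<String> failed = pending.get(PRIORITY_FAILED_MAP);
        if (stealing && priority == PRIORITY_MAP && failed != null && !failed.isEmpty()) {
            from = PRIORITY_FAILED_MAP; // hypothesized misbehavior: a failed map takes it
        }
        Deque<String> waiting = pending.get(from);
        if (waiting == null || waiting.isEmpty()) {
            released.add(containerId); // analogous to "Could not map allocated container..."
            return null;
        }
        return waiting.poll();
    }

    /** Unassigned attempts that no outstanding RM request can ever satisfy. */
    int strandedCount() {
        int stranded = 0;
        for (Map.Entry<Integer, Deque<String>> e : pending.entrySet()) {
            int asks = outstandingAsks.getOrDefault(e.getKey(), 0);
            stranded += Math.max(0, e.getValue().size() - asks);
        }
        return stranded;
    }

    static AllocatorRaceSketch run(boolean stealing) {
        AllocatorRaceSketch am = new AllocatorRaceSketch();
        am.request("attempt_m_000001_0", PRIORITY_MAP);        // the last ordinary map
        am.request("attempt_m_000002_1", PRIORITY_FAILED_MAP); // retry of a failed map
        am.onAllocated("container_A", PRIORITY_MAP, stealing);
        am.onAllocated("container_B", PRIORITY_FAILED_MAP, stealing); // arrives later
        return am;
    }

    public static void main(String[] args) {
        for (boolean stealing : new boolean[] {false, true}) {
            AllocatorRaceSketch am = run(stealing);
            System.out.println("stealing=" + stealing + " released=" + am.released
                    + " stranded=" + am.strandedCount());
        }
    }
}
```

With stealing disabled, both containers are used and nothing is stranded. With stealing enabled, the failed-map container is released and the last ordinary map is left unassigned with no outstanding request at the RM, which would look like the hang in the description below.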

                
> AM hung with one pending map task
> ---------------------------------
>
>                 Key: MAPREDUCE-4982
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4982
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.6
>            Reporter: Jason Lowe
>
> Saw a job that hung with one pending map task that never ran.  The task was in the SCHEDULED state with a single attempt that was in the UNASSIGNED state.  The AM looked like it was waiting for a container from the RM, but the RM was never granting it the one container it needed.
> I suspect the AM botched the container request bookkeeping somehow.  More details to follow.

--
This message is automatically generated by JIRA.