Posted to mapreduce-issues@hadoop.apache.org by "Robert Parker (JIRA)" <ji...@apache.org> on 2012/12/12 20:22:20 UTC

[jira] [Updated] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Parker updated MAPREDUCE-4833:
-------------------------------------

    Assignee: Robert Parker
    
> Task can get stuck in FAIL_CONTAINER_CLEANUP
> --------------------------------------------
>
>                 Key: MAPREDUCE-4833
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4833
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.5
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Parker
>            Priority: Critical
>
> If an NM goes down and the AM still tries to launch a container on it, the ContainerLauncherImpl can get stuck in an RPC timeout. At the same time, the RM may notice that the NM has gone away and inform the AM, which triggers a TA_FAILMSG. If the TA_FAILMSG arrives at the TaskAttemptImpl before the TA_CONTAINER_LAUNCH_FAILED message, the task attempt will try to kill the container, but the ContainerLauncherImpl will not send back a TA_CONTAINER_CLEANED event, leaving the attempt stuck in FAIL_CONTAINER_CLEANUP.
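
The ordering problem can be illustrated with a toy state machine. The following is a hypothetical, heavily simplified Java sketch, not the actual TaskAttemptImpl or ContainerLauncherImpl code. The class CleanupRaceSketch, the LAUNCHING state, and the rule that a launch failure arriving before any kill request fails the attempt directly are all assumptions made for illustration. It delivers the two racing events in both orders and shows that only the TA_FAILMSG-first ordering leaves the attempt waiting in FAIL_CONTAINER_CLEANUP.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of the race described above.
// Names mirror the issue text; this is NOT the real Hadoop MRv2 code.
final class CleanupRaceSketch {

    enum State { LAUNCHING, FAIL_CONTAINER_CLEANUP, FAILED }

    enum Event { TA_FAILMSG, TA_CONTAINER_LAUNCH_FAILED, TA_CONTAINER_CLEANED }

    /** Toy task attempt that reacts to events one at a time. */
    static final class Attempt {
        State state = State.LAUNCHING;

        void handle(Event e, Launcher launcher) {
            switch (state) {
                case LAUNCHING:
                    if (e == Event.TA_FAILMSG) {
                        // RM reported the NM as lost: ask the launcher to kill the container.
                        state = State.FAIL_CONTAINER_CLEANUP;
                        launcher.requestCleanup();
                    } else if (e == Event.TA_CONTAINER_LAUNCH_FAILED) {
                        // Assumption: nothing was started, so the attempt fails directly.
                        state = State.FAILED;
                    }
                    break;
                case FAIL_CONTAINER_CLEANUP:
                    // Only TA_CONTAINER_CLEANED moves the attempt on; a late
                    // TA_CONTAINER_LAUNCH_FAILED is ignored in this state.
                    if (e == Event.TA_CONTAINER_CLEANED) {
                        state = State.FAILED;
                    }
                    break;
                default:
                    break; // FAILED is terminal
            }
        }
    }

    /** Toy launcher whose launch RPC to the dead NM is still pending. */
    static final class Launcher {
        private final Deque<Event> toAttempt;
        private boolean launchPending = true;

        Launcher(Deque<Event> toAttempt) { this.toAttempt = toAttempt; }

        void requestCleanup() {
            if (launchPending) {
                // The problematic behavior: the kill request for a container that is
                // still being launched is dropped, so TA_CONTAINER_CLEANED never arrives.
                return;
            }
            toAttempt.add(Event.TA_CONTAINER_CLEANED);
        }

        void launchRpcTimedOut() {
            launchPending = false;
            toAttempt.add(Event.TA_CONTAINER_LAUNCH_FAILED);
        }
    }

    /** Delivers the two racing events in the given order; returns the final state. */
    static State run(boolean failMsgFirst) {
        Deque<Event> queue = new ArrayDeque<>();
        Launcher launcher = new Launcher(queue);
        Attempt attempt = new Attempt();
        if (failMsgFirst) {
            queue.add(Event.TA_FAILMSG);  // RM reports the lost NM first
            drain(queue, attempt, launcher);
            launcher.launchRpcTimedOut(); // then the launch RPC finally times out
        } else {
            launcher.launchRpcTimedOut(); // launch failure is reported first
            drain(queue, attempt, launcher);
            queue.add(Event.TA_FAILMSG);
        }
        drain(queue, attempt, launcher);
        return attempt.state;
    }

    private static void drain(Deque<Event> queue, Attempt attempt, Launcher launcher) {
        while (!queue.isEmpty()) {
            attempt.handle(queue.poll(), launcher);
        }
    }

    public static void main(String[] args) {
        System.out.println("launch failure first -> " + run(false)); // FAILED
        System.out.println("TA_FAILMSG first     -> " + run(true));  // FAIL_CONTAINER_CLEANUP
    }
}
```

In this model, the attempt only escapes FAIL_CONTAINER_CLEANUP via TA_CONTAINER_CLEANED, so whichever component receives the kill request has to answer it even when the launch it refers to has not finished.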

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira