Posted to mapreduce-issues@hadoop.apache.org by "Siddharth Seth (Commented) (JIRA)" <ji...@apache.org> on 2011/12/03 01:17:40 UTC

[jira] [Commented] (MAPREDUCE-3339) Job is getting hanged indefinitely,if the child processes are killed on the NM. KILL_CONTAINER eventtype is continuosly sent to the containers that are not existing

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161950#comment-13161950 ] 

Siddharth Seth commented on MAPREDUCE-3339:
-------------------------------------------

Ran into this as well. This is the same as https://issues.apache.org/jira/browse/MAPREDUCE-3314?focusedCommentId=13139942&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13139942

The AM keeps getting allocated containers on a single node. Because that node has already been blacklisted for previous failures, the AM releases these containers and just keeps running without making progress.

The warning 'Event EventType: KILL_CONTAINER sent to absent container' is expected: the RM tries to tell the NM to kill the container, since the AM could have LAUNCHEd it and then RELEASEd it.

Leaving this Jira open to add a configuration option to the AM, so that it ignores blacklisting in such situations.
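
To make the proposed option concrete, here is a minimal sketch of how the AM could decide whether to honor its blacklist. It is an illustration under stated assumptions, not the actual MR AM allocator code: the class name BlacklistSketch, its methods, and the ignoreThresholdPercent knob are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch only: none of these names come from the Hadoop code base.
 * It models an AM that stops honoring its node blacklist once too large a
 * share of the cluster has been blacklisted.
 */
public class BlacklistSketch {
    private final Set<String> blacklisted = new HashSet<>();
    private final int clusterNodeCount;
    // Assumed knob: ignore the blacklist once this percentage of nodes is on it.
    private final int ignoreThresholdPercent;

    public BlacklistSketch(int clusterNodeCount, int ignoreThresholdPercent) {
        this.clusterNodeCount = clusterNodeCount;
        this.ignoreThresholdPercent = ignoreThresholdPercent;
    }

    /** Record a node as blacklisted after repeated task failures on it. */
    public void blacklist(String host) {
        blacklisted.add(host);
    }

    /** True when honoring the blacklist would leave too few usable nodes. */
    public boolean blacklistIgnored() {
        if (clusterNodeCount <= 0) {
            return false;
        }
        int blacklistedPercent = blacklisted.size() * 100 / clusterNodeCount;
        return blacklistedPercent >= ignoreThresholdPercent;
    }

    /** Use the allocated container (true) or release it back to the RM (false). */
    public boolean shouldUseContainer(String host) {
        return blacklistIgnored() || !blacklisted.contains(host);
    }
}
```

With a single NM, blacklisting that node puts 100% of the cluster on the blacklist, so any threshold at or below 100 makes the AM use the container instead of releasing it forever. A threshold above 100 reproduces the current behaviour, where every allocation on the blacklisted node is released.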
                
> Job is getting hanged indefinitely,if the child processes are killed on the NM.  KILL_CONTAINER eventtype is continuosly sent to the containers that are not existing
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3339
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3339
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Ramgopal N
>            Assignee: Siddharth Seth
>            Priority: Blocker
>
> I have only one NM running.
> I submitted a job, and all the child processes on the NM kept getting killed. This caused the job to hang indefinitely.
> The NM logs show this WARN message: org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1320301910500_0004_01_001359
