Posted to mapreduce-issues@hadoop.apache.org by "Siddharth Seth (Commented) (JIRA)" <ji...@apache.org> on 2011/12/03 01:17:40 UTC
[jira] [Commented] (MAPREDUCE-3339) Job is getting hanged
indefinitely,if the child processes are killed on the NM. KILL_CONTAINER
eventtype is continuosly sent to the containers that are not existing
[ https://issues.apache.org/jira/browse/MAPREDUCE-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161950#comment-13161950 ]
Siddharth Seth commented on MAPREDUCE-3339:
-------------------------------------------
Ran into this as well. This is the same as https://issues.apache.org/jira/browse/MAPREDUCE-3314?focusedCommentId=13139942&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13139942
The AM keeps being allocated containers on a single node. Since that node has already been blacklisted for previous failures, the AM releases these containers and just keeps running, so the job never makes progress.
The warning 'Event EventType: KILL_CONTAINER sent to absent container' is expected: the RM tells the NM to kill the container because, as far as the RM knows, the AM could have launched it before releasing it.
Leaving this Jira open to add a configuration option to the AM, so that it ignores blacklisting in such situations.
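
A rough sketch of what such an option could look like, to make the idea concrete. Everything here is hypothetical: the class BlacklistPolicy, its methods, and the ignoreThresholdPercent setting are illustrative names, not the actual AM code or an existing configuration key. The assumption is that the AM stops honoring its blacklist once the blacklisted nodes make up at least a configured percentage of the cluster.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of an AM-side blacklist check with an "ignore" threshold.
 * None of these names come from the Hadoop code base.
 */
final class BlacklistPolicy {
    private final Set<String> blacklistedHosts = new HashSet<>();
    private final int clusterNodeCount;
    // Assumed option: ignore blacklisting once this percentage of nodes is blacklisted.
    private final int ignoreThresholdPercent;

    BlacklistPolicy(int clusterNodeCount, int ignoreThresholdPercent) {
        if (clusterNodeCount < 1) {
            throw new IllegalArgumentException("clusterNodeCount must be >= 1");
        }
        this.clusterNodeCount = clusterNodeCount;
        this.ignoreThresholdPercent = ignoreThresholdPercent;
    }

    /** Records a host as blacklisted, e.g. after repeated task failures on it. */
    void blacklist(String host) {
        blacklistedHosts.add(host);
    }

    /** Returns true when so much of the cluster is blacklisted that the blacklist is ignored. */
    boolean isBlacklistIgnored() {
        int blacklistedPercent = blacklistedHosts.size() * 100 / clusterNodeCount;
        return blacklistedPercent >= ignoreThresholdPercent;
    }

    /** Returns true if a container allocated on this host should be released unused. */
    boolean shouldReject(String host) {
        if (!blacklistedHosts.contains(host)) {
            return false;
        }
        // Without this check, a one-node cluster with its only node blacklisted
        // would release every container it is given and never make progress.
        return !isBlacklistIgnored();
    }
}
```

With one NM and a threshold of, say, 33 percent, blacklisting that NM puts 100 percent of the cluster on the blacklist, so shouldReject returns false and the AM would use the container instead of releasing it. On a 10-node cluster with one blacklisted node, the blacklist would still be honored.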
> Job is getting hanged indefinitely,if the child processes are killed on the NM. KILL_CONTAINER eventtype is continuosly sent to the containers that are not existing
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3339
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3339
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.0
> Reporter: Ramgopal N
> Assignee: Siddharth Seth
> Priority: Blocker
>
> I have only one NM running.
> I submitted a job, and all the child processes on the NM were killed continuously. This made the job hang indefinitely.
> The NM log shows this WARN message: org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1320301910500_0004_01_001359