
Posted to mapreduce-issues@hadoop.apache.org by "Hitesh Shah (Updated) (JIRA)" <ji...@apache.org> on 2011/10/23 04:19:32 UTC

[jira] [Updated] (MAPREDUCE-3240) NM should send a SIGKILL for completed containers also

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated MAPREDUCE-3240:
-----------------------------------

    Attachment: MR-3240.wip.patch

The patch does the following:

- Sends a SIGTERM followed by a SIGKILL when cleaning up a container.
  - Introduces new config settings for the delay between SIGTERM and SIGKILL.

- Introduces activeContainers within the ContainerExecutor. The launcher uses it to track whether a container should be launched. If cleanup is called before the process starts, this flag ensures that the process is never started. This addresses the race-kill issue in MR-3084.

- Getting the pid after the shell executor has completed is unreliable, so task.sh now writes the pid into a local file, which the container launcher can read and use to kill the process.
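
To make the first two points concrete, here is a minimal sketch of how the cleanup path could fit together. It is not code from the attached patch: the class name, the SignalSender interface, register(), shouldLaunch() and the delay parameter are all hypothetical, and only activeContainers is a name taken from the description above. The sketch sends both signals unconditionally; a real implementation would probably check whether the process is still alive first, and would need to make the launch check and the launch itself atomic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ContainerCleanupSketch {
    /** Hypothetical stand-in for running "kill -s <signal> <pid>". */
    interface SignalSender {
        void signal(String pid, String signalName);
    }

    // IDs of containers that are still allowed to launch (or are running).
    private final Set<String> activeContainers = ConcurrentHashMap.newKeySet();
    private final SignalSender sender;
    // Assumed to come from the new config setting for the SIGTERM/SIGKILL delay.
    private final long sigkillDelayMs;

    ContainerCleanupSketch(SignalSender sender, long sigkillDelayMs) {
        this.sender = sender;
        this.sigkillDelayMs = sigkillDelayMs;
    }

    /** Marks a container as eligible for launch. */
    void register(String containerId) {
        activeContainers.add(containerId);
    }

    /** The launcher checks this just before starting the process. */
    boolean shouldLaunch(String containerId) {
        return activeContainers.contains(containerId);
    }

    /** Clears the flag first, then escalates from SIGTERM to SIGKILL. */
    void cleanup(String containerId, String pid) {
        // If cleanup wins the race, shouldLaunch() returns false from now on.
        activeContainers.remove(containerId);
        if (pid == null) {
            return; // the process never started, nothing to signal
        }
        sender.signal(pid, "TERM");
        try {
            Thread.sleep(sigkillDelayMs); // grace period for a clean shutdown
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt, still send KILL
        }
        sender.signal(pid, "KILL");
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        ContainerCleanupSketch nm =
            new ContainerCleanupSketch((pid, sig) -> log.add(sig + " " + pid), 250L);
        nm.register("container_01");
        nm.cleanup("container_01", "1234");
        System.out.println(log + " launch allowed: " + nm.shouldLaunch("container_01"));
    }
}
```

The signal sender is kept behind an interface so that the ordering (flag cleared, TERM, delay, KILL) can be exercised without touching real processes.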
 


                
> NM should send a SIGKILL for completed containers also
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3240
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3240
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Hitesh Shah
>         Attachments: MR-3240.wip.patch
>
>
> This addresses containers that exit properly after spawning sub-processes of their own. We don't want to leave these sub-process trees behind, or else they can pillage the NM's resources.
> Today, we already have code to send SIGKILL to whole process trees (possible because setsid puts each tree in a single session) while the container is alive. We need to obtain the PID of each container when it starts and use that PID to send the signal for completed containers as well.
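
As an illustration of the PID handling described above, the sketch below reads a pid from a file and builds a kill command aimed at the whole process group. The class and method names are hypothetical and this is not the patch's code. It relies on two assumptions: the pid file holds the pid on its first line, and the container was started via setsid, so its pid is also its process group ID and a negative pid argument to kill reaches the sub-processes that stayed in that group.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class PidFileSketch {
    /** Returns the pid on the first line of the file, or null if missing or malformed. */
    static String readPid(Path pidFile) {
        try {
            List<String> lines = Files.readAllLines(pidFile, StandardCharsets.UTF_8);
            if (lines.isEmpty()) {
                return null;
            }
            String pid = lines.get(0).trim();
            // Accept only a positive decimal number so "-<pid>" below is well formed.
            return pid.matches("[1-9][0-9]*") ? pid : null;
        } catch (IOException e) {
            return null; // file not written yet or unreadable
        }
    }

    /** Builds "kill -s <signal> -- -<pid>", which targets the process group. */
    static List<String> killGroupCommand(String pid, String signalName) {
        return Arrays.asList("kill", "-s", signalName, "--", "-" + pid);
    }

    public static void main(String[] args) throws IOException {
        Path pidFile = Files.createTempFile("container", ".pid");
        Files.write(pidFile, "4321\n".getBytes(StandardCharsets.UTF_8));
        String pid = readPid(pidFile);
        // The command is only printed here; it could be handed to ProcessBuilder.
        System.out.println(killGroupCommand(pid, "KILL"));
        Files.delete(pidFile);
    }
}
```

Because the pid comes from a file rather than from the live process, the same command can be issued after the container's own process has already exited.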
