Posted to yarn-issues@hadoop.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2016/05/25 21:50:13 UTC

[jira] [Commented] (YARN-4459) container-executor should only kill process groups

    [ https://issues.apache.org/jira/browse/YARN-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300927#comment-15300927 ] 

Hudson commented on YARN-4459:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #9861 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9861/])
YARN-4459. container-executor should only kill process groups. (jlowe: rev 1ba31fe9e906dbd093afd4b254216601967a4a7b)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c


> container-executor should only kill process groups
> --------------------------------------------------
>
>                 Key: YARN-4459
>                 URL: https://issues.apache.org/jira/browse/YARN-4459
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-4459.01.patch, YARN-4459.02.patch, YARN-4459.03.patch
>
>
> When 'signal_container_as_user' is called in container-executor, it first checks whether the process group exists; if it does not, it kills the process itself (if that process exists). This is not reasonable: a missing process group means the corresponding container has already finished, so killing the process itself just kills the wrong process.
> We have seen this happen many times in our cluster. We used the same account to start the NM and to submit the app, and container-executor sometimes killed the NM (the wrongly killed process might have been a newly started thread that was a child process of the NM).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
