Posted to yarn-issues@hadoop.apache.org by "Jim Brennan (JIRA)" <ji...@apache.org> on 2018/08/10 22:23:00 UTC

[jira] [Commented] (YARN-6495) check docker container's exit code when writing to cgroup task files

    [ https://issues.apache.org/jira/browse/YARN-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576925#comment-16576925 ] 

Jim Brennan commented on YARN-6495:
-----------------------------------

As part of YARN-8648, I am proposing that we simply remove the code that this patch fixes.  When cgroups are in use, we already pass the {{cgroup-parent}} argument to docker, which accomplishes what this code was trying to do in a much more deterministic and reliable way.
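
To illustrate the idea, here is a minimal sketch of building a docker command line that sets the cgroup parent up front, so nothing has to be written to a cgroup tasks file afterwards. The helper name {{build_docker_run_args}} and the cgroup path are hypothetical and only for illustration; this is not the actual container-executor code. The only real interface assumed is docker's {{--cgroup-parent}} option.

```python
def build_docker_run_args(image, command, cgroup_parent=None):
    """Hypothetical helper: assemble a `docker run` argument list.

    When a cgroup parent is given, docker creates the container's cgroup
    under it at start-up, so no pid has to be moved into it later.
    """
    args = ["docker", "run"]
    if cgroup_parent:
        # --cgroup-parent is a real `docker run` option; the path passed
        # in below is an illustrative assumption, not taken from YARN.
        args.append(f"--cgroup-parent={cgroup_parent}")
    args.append(image)
    args.extend(command)
    return args


if __name__ == "__main__":
    print(build_docker_run_args(
        "hadoop-docker", ["date"],
        cgroup_parent="/hadoop-yarn/container_xxxx"))
```

Because the container starts inside the right cgroup, there is no window in which a short-lived process can exit before its pid is written.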

My proposal would be to remove this code as part of YARN-8648, but if there is a preference for doing that in a separate Jira, I can file a new one.  Assuming there is agreement, I think we can close out this Jira.

[~Jaeboo], [~ebadger], do you agree?

> check docker container's exit code when writing to cgroup task files
> --------------------------------------------------------------------
>
>                 Key: YARN-6495
>                 URL: https://issues.apache.org/jira/browse/YARN-6495
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Jaeboo Jeong
>            Assignee: Jim Brennan
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-6495.001.patch, YARN-6495.002.patch
>
>
> If I execute a simple command like date in a docker container, the application fails to complete successfully.
> For example:
> {code}
> $ yarn  jar $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker -shell_command "date" -jar $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar -num_containers 1 -timeout 3600000
> …
> 17/04/12 00:16:40 INFO distributedshell.Client: Application did finished unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring loop
> 17/04/12 00:16:40 ERROR distributedshell.Client: Application failed to complete successfully
> {code}
> The error log looks like this:
> {code}
> ...
> Failed to write pid to file /cgroup_parent/cpu/hadoop-yarn/container_xxxx/tasks - No such process
> ...
> {code}
> When writing the pid to the cgroup tasks file, container-executor doesn’t check the docker container’s status.
> If the container finishes very quickly, the pid can no longer be written to the cgroup tasks file, and that is not a problem.
> So container-executor needs to check the docker container’s exit code when writing the pid to the cgroup tasks file.
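
The check described above could be sketched roughly as follows. This is an illustration in Python, not the container-executor implementation (which is C); {{attach_pid}}, {{default_write}}, and the {{get_exit_code}} callback are hypothetical names, and treating only exit code 0 as benign is an assumption.

```python
import errno


def default_write(path, pid):
    # Writing a pid to a cgroup v1 "tasks" file moves that process
    # into the cgroup.
    with open(path, "a") as tasks:
        tasks.write(f"{pid}\n")


def attach_pid(pid, tasks_path, get_exit_code, write=default_write):
    """Hypothetical helper: returns "attached" or "already-exited".

    get_exit_code is an assumed callback that reports the container's
    exit code; any other failure is re-raised to the caller.
    """
    try:
        write(tasks_path, pid)
        return "attached"
    except OSError as err:
        # ESRCH ("No such process") means the pid was gone before the
        # write. Assumption: this is harmless only if the container
        # itself exited with code 0.
        if err.errno == errno.ESRCH and get_exit_code() == 0:
            return "already-exited"
        raise
```

With this shape, a container that runs something as short as date and exits cleanly before the write is reported as already exited instead of failing the application.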



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
