You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/04/18 13:00:43 UTC
[jira] [Commented] (FLINK-6300) PID1 of docker images does not
behave correctly
[ https://issues.apache.org/jira/browse/FLINK-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972612#comment-15972612 ]
ASF GitHub Bot commented on FLINK-6300:
---------------------------------------
GitHub user patricklucas opened a pull request:
https://github.com/apache/flink/pull/3734
[FLINK-6300] Use 'exec' in start-foreground calls
To avoid signal-handling issues in Docker, applications need to run as
a single executable or use a process manager that forwards signals
correctly, in either case running as PID 1.
Since Flink uses a number of chained scripts before the ultimate call
to `java`, we need to use `exec` so that the script executable is
replaced, ultimately resulting in a single `java` process as PID 1.
There's no need to run a process manager since Flink only actually
requires a single process.
**Note:** As there has not been a Flink release since these codepaths were introduced, I'm comfortable with merging this change for 1.2.1 since my manual testing was successful.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/patricklucas/flink FLINK-6300_docker_pid_1_fix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3734.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3734
----
commit 3be49de04c239083a5f050bf891834fc8abc12c9
Author: Patrick Lucas <me...@patricklucas.com>
Date: 2017-04-18T12:48:49Z
[FLINK-6300] Use 'exec' in start-foreground calls
To avoid signal-handling issues in Docker, applications need to run as
a single executable or use a process manager that forwards signals
correctly, in either case running as PID 1.
Since Flink uses a number of chained scripts before the ultimate call
to `java`, we need to use `exec` so that the script executable is
replaced, ultimately resulting in a single `java` process as PID 1.
There's no need to run a process manager since Flink only actually
requires a single process.
----
> PID1 of docker images does not behave correctly
> -----------------------------------------------
>
> Key: FLINK-6300
> URL: https://issues.apache.org/jira/browse/FLINK-6300
> Project: Flink
> Issue Type: Bug
> Components: Docker
> Affects Versions: 2.0.0, 1.1.4
> Environment: all
> Reporter: kathleen sharp
> Assignee: Patrick Lucas
> Priority: Minor
>
> When running the task manager and job manager docker images the process with PID1 is a bash script.
> There is a problem in using bash as the PID1 process in a docker
> container as docker sends SIGTERM, but bash doesn't send this to its
> child processes.
> This means for example that if a container was ever killed and a child
> process had a file open then the file may get corrupted.
> It's covered in more detail in a blog post here:
> https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
> From the mailing list (Nico):
> "Some background:
> Although docker-entrypoint.sh uses "exec" to run succeeding bash scripts for
> jobmanager.sh and taskmanager.sh, respectively, and thus replaces itself with
> these scripts, they do not seem to use exec themselves for foreground
> processes and thus may run into the problem you described.
> I may be wrong, but I did not find any other fallback to handle this in the
> current code base."
> Potentially useful information:
> dockerd version 1.1.3 added an init flag:
> "You can use the --init flag to indicate that an init process should be used as the PID 1 in the container. Specifying an init process ensures the usual responsibilities of an init system, such as reaping zombie processes, are performed inside the created container."
> from:
> https://docs.docker.com/engine/reference/run/#restart-policies---restart
> perhaps the fix could be just to update readme for these images to specify to use this flag.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)