Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/04/18 13:00:43 UTC

[jira] [Commented] (FLINK-6300) PID1 of docker images does not behave correctly

    [ https://issues.apache.org/jira/browse/FLINK-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972612#comment-15972612 ] 

ASF GitHub Bot commented on FLINK-6300:
---------------------------------------

GitHub user patricklucas opened a pull request:

    https://github.com/apache/flink/pull/3734

    [FLINK-6300] Use 'exec' in start-foreground calls

    To avoid signal-handling issues in Docker, applications need to run as
    a single executable or use a process manager that forwards signals
    correctly, in either case running as PID 1.
    
    Since Flink uses a number of chained scripts before the ultimate call
    to `java`, we need to use `exec` so that the script executable is
    replaced, ultimately resulting in a single `java` process as PID 1.
    
    There's no need to run a process manager since Flink only actually
    requires a single process.
    
    **Note:** There has not been a Flink release since these codepaths were introduced, and my manual testing was successful, so I'm comfortable with merging this change for 1.2.1.
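
A minimal sketch of the effect described above, assuming nothing about Flink's actual scripts: the two functions below are hypothetical stand-ins for a launcher script and the process it finally starts. Each prints the wrapper's PID and then the PID of the process it launches. Without `exec` the launched process is a child with a new PID; with `exec` it takes over the wrapper's PID, which is what lets the final process end up as PID 1 in a container.

```shell
#!/bin/sh
# Hypothetical stand-ins for chained launcher scripts; not Flink's real scripts.

# Wrapper that starts the next stage as an ordinary child process.
# Prints the wrapper's PID, then the child's PID (two different values).
# The trailing ':' keeps the shell from silently exec'ing its last command.
run_without_exec() {
    sh -c 'echo $$; sh -c "echo \$\$; :"; :'
}

# Wrapper that hands over to the next stage with exec.
# The wrapper process is replaced, so both lines show the same PID.
run_with_exec() {
    sh -c 'echo $$; exec sh -c "echo \$\$; :"'
}
```

Calling `run_without_exec` and `run_with_exec` in turn and comparing the two printed lines shows the difference; the actual PID values vary from run to run.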

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/patricklucas/flink FLINK-6300_docker_pid_1_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3734.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3734
    
----
commit 3be49de04c239083a5f050bf891834fc8abc12c9
Author: Patrick Lucas <me...@patricklucas.com>
Date:   2017-04-18T12:48:49Z

    [FLINK-6300] Use 'exec' in start-foreground calls

----


> PID1 of docker images does not behave correctly
> -----------------------------------------------
>
>                 Key: FLINK-6300
>                 URL: https://issues.apache.org/jira/browse/FLINK-6300
>             Project: Flink
>          Issue Type: Bug
>          Components: Docker
>    Affects Versions: 1.2.0, 1.1.4
>         Environment: all
>            Reporter: kathleen sharp
>            Assignee: Patrick Lucas
>            Priority: Minor
>
> When running the task manager and job manager docker images the process with PID1 is a bash script.
> There is a problem in using bash as the PID1 process in a docker
> container as docker sends SIGTERM, but bash doesn't send this to its
> child processes.
> This means for example that if a container was ever killed and a child
> process had a file open then the file may get corrupted.
> It's covered in more detail in a blog post here:
> https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
> From the mailing list (Nico):
> "Some background:
> Although docker-entrypoint.sh uses "exec" to run succeeding bash scripts for
> jobmanager.sh and taskmanager.sh, respectively, and thus replaces itself with
> these scripts, they do not seem to use exec themselves for foreground
> processes and thus may run into the problem you described.
> I may be wrong, but I did not find any other fallback to handle this in the
> current code base."
> Potentially useful information:
> dockerd version 1.13 added an init flag:
> "You can use the --init flag to indicate that an init process should be used as the PID 1 in the container. Specifying an init process ensures the usual responsibilities of an init system, such as reaping zombie processes, are performed inside the created container."
> from:
> https://docs.docker.com/engine/reference/run/#restart-policies---restart
> Perhaps the fix could be just to update the README for these images to recommend using this flag.
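
For illustration only, here is a sketch of the other option mentioned in this thread: a wrapper that stays in front of the real process and forwards SIGTERM to it. The function name `run_and_forward` is hypothetical and is not part of Flink or Docker. It shows roughly what a shell entrypoint would have to do if it did not hand over with `exec`.

```shell
#!/bin/sh
# Hypothetical signal-forwarding wrapper; not taken from Flink's scripts.
# Runs the given command as a child and passes SIGTERM on to it.
run_and_forward() {
    "$@" &                       # start the real workload in the background
    child=$!
    # When the wrapper receives SIGTERM, relay it to the child.
    trap 'kill -TERM "$child" 2>/dev/null' TERM
    wait "$child"                # returns early if the trap fires
    trap - TERM
    wait "$child"                # collect the child's final exit status
}
```

A call such as `run_and_forward sleep 30` blocks until the command exits. If the wrapper is sent SIGTERM in the meantime, the command is terminated as well instead of being left running.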



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)