Posted to dev@flink.apache.org by Patrick Lucas <pa...@ververica.com> on 2020/04/02 14:34:06 UTC

Re: [DISCUSS] FLIP-111: Docker image unification

Thanks Andrey for working on this, and everyone else for your feedback.

This FLIP inspired me to discuss and write down some ideas I've had for a
while about configuring and running Flink (especially in Docker) that go
beyond the scope of this FLIP, but don't contradict what it sets out to do.

The crux of it is that Flink should be maximally configurable using
environment variables, and should not require manipulation of the filesystem
(i.e. moving/linking JARs or editing config files) to run in the large
majority of cases. Beyond that, particularly for running Flink in Docker,
as much logic as possible should be part of Flink itself and not, for
instance, in the docker-entrypoint.sh script. Since the beginning I've
resisted adding logic to the Flink Docker images except where necessary,
and I believe we can get to the point where the only thing the entrypoint
script does is drop privileges before invoking a script included in Flink.

Ultimately, my ideal end-goal for running Flink in containers would fulfill
the following points:

   - A user can configure all “start-time” aspects of Flink with
   environment variables, including additions to the classpath
   - Flink automatically adapts to the resources available to the
   container (such as what BashJavaUtils helps with today)
   - A user can include additional JARs using a mounted volume, or at
   image build time with convenient tooling
   - The role/mode (jobmanager, session) is specified as a command line
   argument, with a single entrypoint program sufficing for all uses of the
   image

As a bonus, if we could eliminate some or most of the layers of shell
scripts that are involved in starting a Flink server, perhaps by
re-implementing this part of the stack in Java, and exec-ing to actually
run Flink with the proper java CLI arguments, I think it would be a big win
for the project.


You can read the rest of my notes here:
https://docs.google.com/document/d/1JCACSeDaqeZiXD9G1XxQBunwi-chwrdnFm38U1JxTDQ/edit
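
As a rough illustration of the points above, a docker run along these lines
would cover most of them (a minimal sketch: FLINK_PROPERTIES and
ENABLE_BUILT_IN_PLUGINS are the env vars discussed later in this thread,
while the image tag, plugin jar name, and paths are assumptions):

    # Configure Flink at start time purely via env vars and a mounted
    # volume; the role (here: jobmanager) is a command line argument.
    docker run \
      -e FLINK_PROPERTIES=$'jobmanager.rpc.address: jobmanager\ntaskmanager.numberOfTaskSlots: 4' \
      -e ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-1.10.0.jar \
      -v /path/to/my-udfs.jar:/opt/flink/lib/my-udfs.jar \
      flink:1.10.0-scala_2.12 jobmanager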

On Wed, Mar 4, 2020 at 10:34 AM Andrey Zagrebin <az...@apache.org>
wrote:

> Hi All,
>
> If you have ever touched the docker topic in Flink, you
> probably noticed that we have multiple places in docs and repos which
> address its various concerns.
>
> We have prepared a FLIP [1] to simplify how users perceive the docker topic
> in Flink. It mostly advocates an approach of extending the official Flink
> image from Docker Hub. For convenience, it can come with a set of bash
> utilities and documented examples of their usage. The utilities allow users
> to:
>
>    - run the docker image in various modes (single job, session master,
>    task manager, etc.)
>    - customise the extending Dockerfile
>    - and its entry point
>
> Eventually, the FLIP suggests removing all other user-facing Dockerfiles
> and build scripts from the Flink repo, moving all docker docs to
> apache/flink-docker, and adjusting existing docker use cases to refer to
> this new approach (mostly Kubernetes now).
>
> The first contributed version of the Flink docker integration also contained
> an example and docs for the integration with Bluemix in the IBM cloud. We
> suggest maintaining these outside of the Flink repository as well (cc Markus
> Müller).
>
> Thanks,
> Andrey
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification
>
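
To make the extension approach concrete, a derived image could look roughly
like this (a sketch: the base tag, jar name, and plugin paths are
illustrative assumptions, not part of the FLIP):

    # Extend the official image with user artefacts and rely on the
    # standard entry point; a bundled optional filesystem is activated by
    # copying it from opt/ into its own subdirectory of plugins/.
    cat > Dockerfile <<'EOF'
    FROM flink:1.10.0-scala_2.12
    COPY target/my-job.jar /opt/flink/lib/my-job.jar
    RUN mkdir -p /opt/flink/plugins/s3-fs-hadoop && \
        cp /opt/flink/opt/flink-s3-fs-hadoop-*.jar /opt/flink/plugins/s3-fs-hadoop/
    EOF
    docker build -t my-org/flink-with-job:1.10.0 .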

Re: [DISCUSS] FLIP-111: Docker image unification

Posted by Ismaël Mejía <ie...@gmail.com>.
I am coming extremely late to this discussion since the vote has already
started, but it is great that we are finally getting to unification.
Enthusiastic +1. Kudos to Andrey and the rest of the community for bringing
in all the useful and different perspectives.

I just want to bring up two tickets that were created in parallel with the
discussion and were mostly migrated from the old docker-flink repo that both
Patrick and I have been maintaining for the last 3 years (now repatriated
into the flink-docker repo):

FLINK-16260 Add docker images based on Java 11 (PR ready)
FLINK-16846 Add python docker images

Both are related to the current discussion. The Java 11 one just addresses
user wishes. We can see it as a good way to validate our support and offer it
to users, but of course it should not be the default image that FLIP-111 will
be based on until the community agrees on it (probably in the future).

The Python one definitely deserves more discussion. Today the official Flink
docker images do not contain Python, so users must extend them to add it.
Since there are so many nice improvements in Flink for Python, maybe it is
time to release images with Python support? This of course brings up other
questions, such as which versions to support and how we are going to test
them. So maybe we should open a specific thread or FLIP on that once FLIP-111
is done.
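
In the meantime, extending the image with Python yourself looks roughly like
this (a sketch: it assumes a Debian-based base image, and the tag and PyFlink
package version are illustrative):

    # Add Python and PyFlink on top of the official image.
    cat > Dockerfile <<'EOF'
    FROM flink:1.10.0-scala_2.12
    RUN apt-get update && \
        apt-get install -y --no-install-recommends \
            python3 python3-pip python3-setuptools && \
        rm -rf /var/lib/apt/lists/*
    RUN pip3 install apache-flink==1.10.0
    EOF
    docker build -t my-org/flink-python:1.10.0 .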

On Wed, Apr 8, 2020 at 4:35 AM Canbin Zheng <fe...@gmail.com> wrote:
>
> Hi, all,
>
> Thanks for the reply, Andrey!
>
> I have filed two new tickets tracking the problems:
> 1. FLINK-17033 <https://issues.apache.org/jira/browse/FLINK-17033> for
> upgrading the base Java Docker image; I pointed out some other problems
> that openjdk:8-jre-alpine could have in the ticket's description.
> 2. FLINK-17034 <https://issues.apache.org/jira/browse/FLINK-17034> for
> suggesting executing the container CMD under TINI.
>
> Regards,
> Canbin Zheng
>
> Andrey Zagrebin <az...@apache.org> wrote on Tue, Apr 7, 2020 at 4:58 PM:
>
> > Hi all,
> >
> > Thanks for the further feedback Niels and Canbin.
> >
> > @Niels
> >
> > I agree with Till, the comments about docker tags are valid concerns and we
> > can discuss them in dedicated ML threads
> > in parallel or after the general unification of Dockerfiles suggested by
> > this FLIP.
> >
> > One thing to add about point 4. The native Kubernetes integration does not
> > support a job mode at the moment.
> > This is not only about the image. As I understand, even if you pack the job
> > artefacts into the image, the native Kubernetes integration will start a
> > session cluster.
> > This will be a follow-up for the native Kubernetes integration.
> > cc @Yang Wang
> >
> > @Canbin
> >
> > I think you raise valid concerns. It makes sense to create JIRA issues for
> > them.
> > One for the alpine image problem, and one suggesting TINI as a blocker
> > for FLINK-15843 <https://issues.apache.org/jira/browse/FLINK-15843> and
> > the slow pod shutdown.
> > We can discuss and address them in parallel or after the general
> > unification of Dockerfiles suggested by this FLIP.
> >
> > I will start a separate voting thread for this FLIP.
> >
> > Cheers,
> > Andrey
> >
> >
> > On Mon, Apr 6, 2020 at 5:49 PM Canbin Zheng <fe...@gmail.com>
> > wrote:
> >
> > > Hi, all
> > >
> > > Thanks a lot for this FLIP and all the fruitful discussion. I am not sure
> > > whether the following questions are in the scope of this FLIP, but I still
> > > expect your reply:
> > >
> > >    1. Which docker base image do we plan to use for Java? As far as I can
> > >    see, openjdk:8-jre-alpine[1] is not officially supported by the OpenJDK
> > >    project anymore; openjdk:8-jre is larger than openjdk:8-jre-slim, so we
> > >    use the latter in our internal branch and it works fine so far.
> > >    2. Is it possible that we execute the container CMD under *TINI*[2]
> > >    instead of the shell, for better hygiene? As far as I can see, the
> > >    container of the JM or TMs is run in the shell form, so it cannot
> > >    receive the *TERM* signal when the pod is deleted[3]. Some of the
> > >    problems are as follows:
> > >       - The JM and the TMs get no chance to clean up; I created
> > >       FLINK-15843[4] earlier to track this problem.
> > >       - The pod can take a long time (up to 40 seconds) to be deleted
> > >       after the K8s API Server receives the deletion request.
> > >
> > >    At the moment, we use *TINI* in our internal branch for the native K8s
> > >    setup and it solves the problems mentioned above.
> > >
> > > [1]
> > > https://github.com/docker-library/docs/blob/master/openjdk/README.md#supported-tags-and-respective-dockerfile-links
> > > https://github.com/docker-library/openjdk/commit/3eb0351b208d739fac35345c85e3c6237c2114ec#diff-f95ffa3d1377774732c33f7b8368e099
> > > [2]
> > > https://github.com/krallin/tini
> > > [3]
> > > https://docs.docker.com/engine/reference/commandline/kill/
> > > [4]
> > > https://issues.apache.org/jira/browse/FLINK-15843
> > >
> > > Regards,
> > > Canbin Zheng
> > >
> > > Till Rohrmann <tr...@apache.org> wrote on Mon, Apr 6, 2020 at 5:34 PM:
> > >
> > >> Thanks for the feedback Niels. This is very helpful.
> > >>
> > >> 1. I agree `flink:latest` is nice to get started, but in the long run
> > >> people will want to pin their dependencies to a specific Flink version.
> > >> I think the fix will happen as part of FLINK-15794.
> > >>
> > >> 2. SNAPSHOT docker images will be really helpful for developers as well
> > >> as users who want to use the latest features. I believe that this will
> > >> be a follow-up of this FLIP.
> > >>
> > >> 3. The goal of FLIP-111 is to create an image which allows starting a
> > >> session as well as a job cluster. Hence, I believe that we will solve
> > >> this problem soon.
> > >>
> > >> 4. Same as 3. The new image will also contain the native K8s integration,
> > >> so there is no need to create a special image modulo the artifacts you
> > >> want to add.
> > >>
> > >> Additional notes:
> > >>
> > >> 1. I agree that one log makes it harder to separate different execution
> > >> attempts or different tasks. However, on the other hand, it gives you an
> > >> overall picture of what's happening in a Flink process. If things were
> > >> split apart, then it might become super hard to detect problems in the
> > >> runtime which cause the user code to fail, or vice versa. In general,
> > >> cross-correlation will be harder. I guess a solution could be to make
> > >> this configurable. In any case, we should move the discussion about this
> > >> topic into a separate thread.
> > >>
> > >> Cheers,
> > >> Till
> > >>
> > >> On Mon, Apr 6, 2020 at 10:40 AM Niels Basjes <Ni...@basjes.nl> wrote:
> > >>
> > >> > Hi all,
> > >> >
> > >> > Sorry for jumping in at this late point of the discussion.
> > >> > I see a lot of things I really like, and I would like to put my
> > >> > "needs" and observations here too so you can take them into account
> > >> > (where possible). I suspect that there will be overlap with things you
> > >> > have already taken into account.
> > >> >
> > >> >    1. No more 'flink:latest' docker image tag.
> > >> >    Related to https://issues.apache.org/jira/browse/FLINK-15794
> > >> >    What I have learned is that the 'latest' version of a docker image
> > >> >    only makes sense IFF this is an almost standalone thing.
> > >> >    So if I have a servlet that does something in isolation (like my
> > >> >    hobby project https://hub.docker.com/r/nielsbasjes/yauaa ) then
> > >> >    'latest' makes sense.
> > >> >    With Flink you have the application code and all nodes in the
> > >> >    cluster depending on each other, and as such they must run the exact
> > >> >    same versions of the base software.
> > >> >    So if you run flink in a cluster (local/yarn/k8s/mesos/swarm/...)
> > >> >    where the application and the nodes intercommunicate and closely
> > >> >    depend on each other, then 'latest' is a bad idea.
> > >> >       1. Assume I have an application built against the Flink N api
> > >> >       and the cluster downloads the latest, which is also Flink N.
> > >> >       Then a week later Flink N+1 is released and the API I use changes
> > >> >       (deprecated), and a while later Flink N+2 is released and the
> > >> >       deprecated API is removed: then my application no longer works
> > >> >       even though I have not changed anything.
> > >> >       So I want my application to be 'pinned' to the exact version I
> > >> >       built it with.
> > >> >       2. I have a running cluster with my application and cluster
> > >> >       running Flink N.
> > >> >       I add some additional nodes and the new nodes pick up the Flink
> > >> >       N+1 image ... now I have a cluster with mixed versions.
> > >> >       3. The version of flink is really the "Flink+Scala" version pair.
> > >> >       If you have the right flink but the wrong scala you get really
> > >> >       nasty errors: https://issues.apache.org/jira/browse/FLINK-16289
> > >> >
> > >> >    2. Deploy SNAPSHOT docker images (i.e. something like
> > >> >    *flink:1.11-SNAPSHOT_2.12*).
> > >> >    More and more use cases will be running on the code delivered via
> > >> >    Docker images instead of bare jar files.
> > >> >    So if a "SNAPSHOT" is released and deployed into a 'staging' maven
> > >> >    repo (which may be local on the developer's workstation), then in my
> > >> >    opinion a "SNAPSHOT" docker image should be created/deployed at the
> > >> >    same moment.
> > >> >    Each time a "SNAPSHOT" docker image is released, this will overwrite
> > >> >    the previous "SNAPSHOT".
> > >> >    When the final version is released, the SNAPSHOTs of that version
> > >> >    can/should be removed.
> > >> >    This will make testing in clusters a lot easier.
> > >> >    Also, building a local fix and then running it locally will work
> > >> >    without additional modifications to the code.
> > >> >
> > >> >    3. Support for a 'single application cluster'.
> > >> >    I've been playing around with the S3 plugin, and what I have found
> > >> >    is that it essentially requires all nodes to have full access to the
> > >> >    credentials needed to connect to S3.
> > >> >    This essentially means that a multi-tenant setup is not possible in
> > >> >    these cases.
> > >> >    So I think the single application cluster should be a feature
> > >> >    available in all cases.
> > >> >
> > >> >    4. I would like a native-kubernetes-single-application base image.
> > >> >    I can then create a derived image where I only add the jar of my
> > >> >    application.
> > >> >    My desire is that I can then create a k8s yaml file for kubectl
> > >> >    that adds the needed configs/secrets/arguments/environment variables
> > >> >    and starts the cluster and application.
> > >> >    Because the native kubernetes support makes it automatically scale
> > >> >    based on the application, this should 'just work'.
> > >> >
> > >> > Additional note:
> > >> >
> > >> >    1. Job/Task attempt logging instead of task manager logging.
> > >> >    *I realize this has nothing to do with the docker images*
> > >> >    I found something "hard to work with" while running some tests last
> > >> >    week.
> > >> >    The logging is done to a single log for the task manager.
> > >> >    So if I have multiple things running in the single task manager,
> > >> >    then the logs are mixed together.
> > >> >    Also, several attempts of the same task are mixed, which makes it
> > >> >    very hard to find out 'what went wrong'.
> > >> >
> > >> >
> > >> >
> > >> > On Fri, Apr 3, 2020 at 4:27 PM Ufuk Celebi <uc...@apache.org> wrote:
> > >> >
> > >> > > Thanks for the summary, Andrey. Good idea to link Patrick's document
> > >> > > from the FLIP as a future direction so it doesn't get lost. Could you
> > >> > > make sure to revive that discussion when FLIP-111 nears its end?
> > >> > >
> > >> > > This is good to go on my part. +1 to start the VOTE.
> > >> > >
> > >> > >
> > >> > > @Till, @Yang: Thanks for the clarification on the output redirection.
> > >> > > I didn't see that. The concern with the `tee` approach is that the
> > >> > > file would grow indefinitely. I think we can solve this with regular
> > >> > > logging by redirecting stderr to the ERROR log level, but I'm not
> > >> > > sure. We can look at a potential solution when we get to that
> > >> > > point. :-)
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Fri, Apr 3, 2020 at 3:36 PM Andrey Zagrebin <azagrebin@apache.org>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi everyone,
> > >> > > >
> > >> > > > Patrick and Ufuk, thanks a lot for more ideas and suggestions!
> > >> > > >
> > >> > > > I have updated the FLIP according to the current state of
> > >> discussion.
> > >> > > > Now it also contains the implementation steps and future
> > follow-ups.
> > >> > > > Please review if there are any concerns.
> > >> > > > The order of the steps aims to keep Flink releasable at any point
> > >> > > > if something does not have enough time to get in.
> > >> > > >
> > >> > > > It looks like we are mostly reaching a consensus on the open
> > >> > > > questions. There is also a list of items which have been discussed
> > >> > > > in this thread, with a short summary below.
> > >> > > > As soon as there are no concerns, I will create a voting thread.
> > >> > > >
> > >> > > > I also added some thoughts on further customising the logging
> > >> > > > setup. This may be an optional follow-up, additional to the default
> > >> > > > logging into files for the Web UI.
> > >> > > >
> > >> > > > # FLIP scope
> > >> > > > The focus is on users of the official releases.
> > >> > > > Create docs for how to use the official docker image.
> > >> > > > Remove other Dockerfiles in Flink repo.
> > >> > > > Rely on running the official docker image in different modes
> > >> > > > (JM/TM).
> > >> > > > Customise running the official image with env vars (this should
> > >> > > > minimise manual manipulation of local files and the creation of a
> > >> > > > custom image).
> > >> > > >
> > >> > > > # Base official image
> > >> > > >
> > >> > > > ## Java versions
> > >> > > > There is a separate effort for this:
> > >> > > > https://github.com/apache/flink-docker/pull/9
> > >> > > >
> > >> > > > # Run image
> > >> > > >
> > >> > > > ## Entry point modes
> > >> > > > JM session, JM job, TM
> > >> > > >
> > >> > > > ## Entry point config
> > >> > > > We use env vars for this, e.g. FLINK_PROPERTIES and
> > >> > > > ENABLE_BUILT_IN_PLUGINS
> > >> > > >
> > >> > > > ## Flink config options
> > >> > > > We document the existing FLINK_PROPERTIES env var to override
> > >> > > > config options in flink-conf.yaml.
> > >> > > > Then later, we do not need to expose and handle any other special
> > >> > > > env vars for config options (address, port, etc).
> > >> > > > The future plan is to make the Flink process configurable by env
> > >> > > > vars, e.g. 'some.yaml.option: val' -> FLINK_SOME_YAML_OPTION=val
> > >> > > >
> > >> > > > ## Extra files: jars, custom logging properties
> > >> > > > We can provide env vars to point to custom locations, e.g. in
> > >> > > > mounted volumes.
> > >> > > >
> > >> > > > # Extend image
> > >> > > >
> > >> > > > ## Python/hadoop versions, activating certain libs/plugins
> > >> > > > Users can install extra dependencies and change configs in their
> > >> > > > custom image which extends our base image.
> > >> > > >
> > >> > > > # Logging
> > >> > > >
> > >> > > > ## Web UI
> > >> > > > Modify the *log4j-console.properties* to also output logs into
> > >> > > > the files for the Web UI. Limit the log file size.
> > >> > > >
> > >> > > > ## Container output
> > >> > > > Separate effort for a proper split of Flink process stdout and
> > >> > > > stderr into files and container output
> > >> > > > (idea with the tee command: `program start-foreground 2>&1 | tee
> > >> > > > flink-user-taskexecutor.out`)
> > >> > > >
> > >> > > > # Docker bash utils
> > >> > > > We are not going to expose these to users as an API.
> > >> > > > Users should either be able to configure and run the standard entry
> > >> > > > point, or the documentation should give short examples of how to
> > >> > > > extend and customise the base image.
> > >> > > >
> > >> > > > During the implementation, we will see if it makes sense to factor
> > >> > > > out certain bash procedures to reuse them, e.g. in custom dev
> > >> > > > versions of the docker image.
> > >> > > >
> > >> > > > # Dockerfile / image for developers
> > >> > > > We keep it on our future roadmap. This effort should help us
> > >> > > > understand what we can reuse there.
> > >> > > >
> > >> > > > Best,
> > >> > > > Andrey
> > >> > > >
> > >> > > >
> > >> > > > On Fri, Apr 3, 2020 at 12:57 PM Till Rohrmann <trohrmann@apache.org>
> > >> > > > wrote:
> > >> > > >
> > >> > > >> Hi everyone,
> > >> > > >>
> > >> > > >> just a small inline comment.
> > >> > > >>
> > >> > > >> On Fri, Apr 3, 2020 at 11:42 AM Ufuk Celebi <uc...@apache.org>
> > >> wrote:
> > >> > > >>
> > >> > > >> > Hey Yang,
> > >> > > >> >
> > >> > > >> > thanks! See inline answers.
> > >> > > >> >
> > >> > > >> > On Fri, Apr 3, 2020 at 5:11 AM Yang Wang <danrtsey.wy@gmail.com>
> > >> > > >> > wrote:
> > >> > > >> >
> > >> > > >> > > Hi Ufuk,
> > >> > > >> > >
> > >> > > >> > > Thanks for making the conclusion and directly pointing out
> > >> > > >> > > what needs to be done in FLIP-111. I agree with you that we
> > >> > > >> > > should narrow down the scope and focus on the most important
> > >> > > >> > > and basic part of docker image unification.
> > >> > > >> > >
> > >> > > >> > > (1) Extend the entrypoint script in apache/flink-docker to
> > >> start
> > >> > the
> > >> > > >> job
> > >> > > >> > >> cluster entry point
> > >> > > >> > >
> > >> > > >> > > I want to add a small requirement for the entry point script.
> > >> > > >> > > Currently, for the native K8s integration, we are using the
> > >> > > >> > > apache/flink-docker image, but with a different entry point
> > >> > > >> > > ("kubernetes-entry.sh"). We generate the java cmd in
> > >> > > >> > > KubernetesUtils and run it in the entry point. I really hope it
> > >> > > >> > > could be merged into the apache/flink-docker
> > >> > > >> > > "docker-entrypoint.sh".
> > >> > > >> > >
> > >> > > >> >
> > >> > > >> > The script [1] only adds the FLINK_CLASSPATH env var, which
> > >> > > >> > seems generally reasonable to me. But since principled classpath
> > >> > > >> > and entrypoint configuration is somewhat related to the follow-up
> > >> > > >> > improvement proposals, I could also see this being done after
> > >> > > >> > FLIP-111.
> > >> > > >> >
> > >> > > >> >
> > >> > > >> > > (2) Extend the example log4j-console configuration
> > >> > > >> > >> => support log retrieval from the Flink UI out of the box
> > >> > > >> > >
> > >> > > >> > > If you mean to update the
> > >> > > >> > > "flink-dist/conf/log4j-console.properties" to support console
> > >> > > >> > > and local log files, I will say "+1". But we need to find a
> > >> > > >> > > proper way to make the stdout/stderr output available both for
> > >> > > >> > > the console and the log files. Maybe Till's proposal could help
> > >> > > >> > > to solve this:
> > >> > > >> > > "`program 2>&1 | tee flink-user-taskexecutor.out`"
> > >> > > >> > >
> > >> > > >> >
> > >> > > >> > I think we can simply add a rolling file appender with a limit
> > >> > > >> > on the log size.
> > >> > > >> >
> > >> > > >> I think this won't solve Yang's concern. What he wants to achieve
> > >> > > >> is that STDOUT and STDERR go to STDOUT and STDERR as well as into
> > >> > > >> some *.out and *.err files which are accessible from the web UI. I
> > >> > > >> don't think that a log appender will help with this problem.
> > >> > > >>
> > >> > > >> Cheers,
> > >> > > >> Till
> > >> > > >>
> > >> > > >>
> > >> > > >> > – Ufuk
> > >> > > >> >
> > >> > > >> > [1]
> > >> > > >> > https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/kubernetes-bin/kubernetes-entry.sh
> > >> > > >> >
> > >> > > >>
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > Best regards / Met vriendelijke groeten,
> > >> >
> > >> > Niels Basjes
> > >> >
> > >>
> > >
> >
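
For reference, the tee idea from the quoted summary, written out as a sketch
(the script name, arguments, and log path are placeholders, and Ufuk's point
stands that the file still grows without rotation):

    # Keep logs on the container's stdout/stderr (for `docker logs`) while
    # also duplicating the stream into a file the Web UI can serve.
    "${FLINK_HOME}/bin/taskmanager.sh" start-foreground "$@" 2>&1 \
      | tee "${FLINK_HOME}/log/flink-user-taskexecutor.out"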

Re: [DISCUSS] FLIP-111: Docker image unification

Posted by Canbin Zheng <fe...@gmail.com>.
Hi, all,

Thanks for the reply, Andrey!

I have filed two new tickets tracking the problems:
1. FLINK-17033 <https://issues.apache.org/jira/browse/FLINK-17033> for
upgrading the base Java Docker image; I pointed out some other problems
that openjdk:8-jre-alpine could have in the ticket's description.
2. FLINK-17034 <https://issues.apache.org/jira/browse/FLINK-17034> for
suggesting executing the container CMD under TINI.

Regards,
Canbin Zheng
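
A minimal sketch of running the entry point under TINI, following the pattern
from the tini README (the tini version, base tag, and entrypoint path are
assumptions):

    # Make tini PID 1 so it forwards TERM to the Flink process and reaps
    # zombies, avoiding the slow pod shutdown described above.
    cat > Dockerfile <<'EOF'
    FROM flink:1.10.0-scala_2.12
    ENV TINI_VERSION=v0.18.0
    ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
    RUN chmod +x /tini
    ENTRYPOINT ["/tini", "--", "/docker-entrypoint.sh"]
    EOF
    docker build -t my-org/flink-tini:1.10.0 .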


Re: [DISCUSS] FLIP-111: Docker image unification

Posted by Andrey Zagrebin <az...@apache.org>.
Hi all,

Thanks for the further feedback Niels and Canbin.

@Niels

I agree with Till, the comments about docker tags are valid concerns and we
can discuss them in dedicated ML threads
in parallel or after the general unification of Dockerfiles suggested by
this FLIP.

One thing to add about point 4. The native Kubernetes integration does not
support a job mode at the moment.
This is not only about the image. As I understand, even if you pack the job
artefacts into the image, the native Kubernetes integration will start a
session cluster.
This will be a follow-up for the native Kubernetes integration.
cc @Yang Wang

@Canbin

I think you raise valid concerns. It makes sense to create JIRA issues for
them:
one for the alpine image problem, and one suggesting TINI as a blocker
for FLINK-15843 <https://issues.apache.org/jira/browse/FLINK-15843> and
the slow pod shutdown.
We can discuss and address them in parallel or after the general
unification of Dockerfiles suggested by this FLIP.

I will start a separate voting thread for this FLIP.

Cheers,
Andrey
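
For context on the job-mode limitation above: the native Kubernetes
integration currently only starts session clusters, roughly along these lines
(a sketch; the option values and image name are assumptions):

    # Start a native Kubernetes *session* cluster with a custom image.
    # Even with job artefacts baked into the image, this still runs a
    # session cluster; job mode is a follow-up for this integration.
    ./bin/kubernetes-session.sh \
      -Dkubernetes.cluster-id=my-flink-cluster \
      -Dkubernetes.container.image=my-org/flink-with-job:1.10.0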


On Mon, Apr 6, 2020 at 5:49 PM Canbin Zheng <fe...@gmail.com> wrote:

> Hi, all
>
> Thanks a lot for this FLIP and all the fruitable discussion. I am not sure
> whether the following questions are in the scope of this FLIP, but I still
> expect your reply:
>
>    1. Which docker base image do we plan to use for Java? As far as I
>    see, openjdk:8-jre-alpine[1] is not officially supported by the OpenJDK
>    project anymore; openjdk:8-jre is larger than openjdk:8-jre-slim in size so
>    that we use the latter one in our internal branch and it works fine so far.
>    2. Is it possible that we execute the container CMD under *TINI*[2]
>    instead of the shell for better hygiene? As far as I see, the container of
>    the JM or TMs is running in the shell form and it could not receive the
>    *TERM* signal when the pod is deleted[3]. Some of the problems are as
>    follows:
>       - The JM and the TMs could have no chance of cleanup, I used to
>       create FLINK-15843[4] for tracking this problem.
>       - The pod could take a long time(up to 40 seconds) to be deleted
>       after the K8s API Server receives the deletion request.
>
>            At the moment, we use *TINI* in our internal branch for the
> native K8s setup and it solves the problems mentioned above.
>
> [1]
>
> https://github.com/docker-library/docs/blob/master/openjdk/README.md#supported-tags-and-respective-dockerfile-links
>
> https://github.com/docker-library/openjdk/commit/3eb0351b208d739fac35345c85e3c6237c2114ec#diff-f95ffa3d1377774732c33f7b8368e099
>  [2]
> https://github.com/krallin/tini
>  [3]
> https://docs.docker.com/engine/reference/commandline/kill/
>  [4]
> https://issues.apache.org/jira/browse/FLINK-15843
>
> Regards,
> Canbin Zheng
>
> Till Rohrmann <tr...@apache.org> 于2020年4月6日周一 下午5:34写道:
>
>> Thanks for the feedback Niels. This is very helpful.
>>
>> 1. I agree `flink:latest` is nice to get started but in the long run
>> people
>> will want to pin their dependencies to a specific Flink version. I think
>> the fix will happen as part of FLINK-15794.
>>
>> 2. SNAPSHOT docker images will be really helpful for developers as well as
>> users who want to use the latest features. I believe that this will be a
>> follow-up of this FLIP.
>>
>> 3. The goal of FLIP-111 is to create an image which allows to start a
>> session as well as job cluster. Hence, I believe that we will solve this
>> problem soon.
>>
>> 4. Same as 3. The new image will also contain the native K8s integration
>> so
>> that there is no need to create a special image modulo the artifacts you
>> want to add.
>>
>> Additional notes:
>>
>> 1. I agree that one log makes it harder to separate different execution
>> attempts or different tasks. However, on the other hand, it gives you an
>> overall picture of what's happening in a Flink process. If things were
>> split apart, then it might become super hard to detect problems in the
>> runtime which affect the user code to fail or vice versa, for example. In
>> general cross correlation will be harder. I guess a solution could be to
>> make this configurable. In any case, we should move the discussion about
>> this topic into a separate thread.
>>
>> Cheers,
>> Till
>>
>> On Mon, Apr 6, 2020 at 10:40 AM Niels Basjes <Ni...@basjes.nl> wrote:
>>
>> > Hi all,
>> >
>> > Sorry for jumping in at this late point of the discussion.
>> > I see a lot of things I really like and I would like to put my "needs"
>> and
>> > observations here too so you take them into account (where possible).
>> > I suspect that there will be overlap with things you already have taken
>> > into account.
>> >
>> >    1. No more 'flink:latest' docker image tag.
>> >    Related to https://issues.apache.org/jira/browse/FLINK-15794
>> >    What I have learned is that the 'latest' version of a docker image
>> only
>> >    makes sense IFF this is an almost standalone thing.
>> >    So if I have a servlet that does something in isolation (like my
>> hobby
>> >    project https://hub.docker.com/r/nielsbasjes/yauaa ) then 'latest'
>> > makes
>> >    sense.
>> >    With Flink you have the application code and all nodes in the cluster
>> >    that are depending on each other and as such must run the exact same
>> >    versions of the base software.
>> >    So if you run flink in a cluster (local/yarn/k8s/mesos/swarm/...)
>> where
>> >    the application and the nodes inter communicate and closely depend on
>> > each
>> >    other then 'latest' is a bad idea.
>> >       1. Assume I have an application built against the Flink N api and
>> the
>> >       cluster downloads the latest which is also Flink N.
>> >       Then a week later Flink N+1 is released and the API I use changes
>> >       (Deprecated)
>> >       and a while later Flink N+2 is released and the deprecated API is
>> >       removed: Then my application no longer works even though I have
>> > not changed
>> >       anything.
>> >       So I want my application to be 'pinned' to the exact version I
>> built
>> >       it with.
>> >       2. I have a running cluster with my application and cluster
>> running
>> >       Flink N.
>> >       I add some additional nodes and the new nodes pick up the Flink
>> N+1
>> >       image ... now I have a cluster with mixed versions.
>> >       3. The version of flink is really the "Flink+Scala" version pair.
>> >       If you have the right flink but the wrong scala you get really
>> nasty
>> >       errors: https://issues.apache.org/jira/browse/FLINK-16289
>> >
>> >       2. Deploy SNAPSHOT docker images (i.e. something like
>> >    *flink:1.11-SNAPSHOT_2.12*) .
>> >    More and more use cases will be running on the code delivered via
>> Docker
>> >    images instead of bare jar files.
>> >    So if a "SNAPSHOT" is released and deployed into a 'staging' maven
>> repo
>> >    (which may be locally on the developers workstation) then it is my
>> > opinion
>> >    that at the same moment a "SNAPSHOT" docker image should be
>> >    created/deployed.
>> >    Each time a "SNAPSHOT" docker image is released this will overwrite
>> the
>> >    previous "SNAPSHOT".
>> >    If the final version is released the SNAPSHOTs of that version
>> >    can/should be removed.
>> >    This will make testing in clusters a lot easier.
>> >    Also building a local fix and then running it locally will work
>> without
>> >    additional modifications to the code.
>> >
>> >    3. Support for a 'single application cluster'
>> >    I've been playing around with the S3 plugin and what I have found is
>> >    that this essentially requires all nodes to have full access to the
>> >    credentials needed to connect to S3.
>> >    This essentially means that a multi-tenant setup is not possible in
>> >    these cases.
>> >    So I think the single application cluster should be a feature
>> available
>> >    in all cases.
>> >
>> >    4. I would like a native-kubernetes-single-application base image.
>> >    I can then create a derived image where I only add the jar of my
>> >    application.
>> >    My desire is that I can then create a k8s yaml file for kubectl
>> >    that adds the needed configs/secrets/arguments/environment variables
>> and
>> >    starts the cluster and application.
>> >    Because the native kubernetes support makes it automatically scale
>> based
>> >    on the application this should 'just work'.
>> >
>> > Additional note:
>> >
>> >    1. Job/Task attempt logging instead of task manager logging.
>> >    *I realize this has nothing to do with the docker images*
>> >    I found something "hard to work with" while running some tests last
>> > week.
>> >    The logging is done to a single log for the task manager.
>> >    So if I have multiple things running in the single task manager then
>> the
>> >    logs are mixed together.
>> >    Also several attempts of the same task are mixed which makes it very
>> >    hard to find out 'what went wrong'.
>> >
>> >
>> >
>> > On Fri, Apr 3, 2020 at 4:27 PM Ufuk Celebi <uc...@apache.org> wrote:
>> >
>> > > Thanks for the summary, Andrey. Good idea to link Patrick's document
>> from
>> > > the FLIP as a future direction so it doesn't get lost. Could you make
>> > sure
>> > > to revive that discussion when FLIP-111 nears an end?
>> > >
>> > > This is good to go on my part. +1 to start the VOTE.
>> > >
>> > >
>> > > @Till, @Yang: Thanks for the clarification with the output
>> redirection. I
>> > > didn't see that. The concern with the `tee` approach is that the file
>> > would
>> > > grow indefinitely. I think we can solve this with regular logging by
>> > > redirecting stderr to ERROR log level, but I'm not sure. We can look
>> at a
>> > > potential solution when we get to that point. :-)
>> > >
>> > >
>> > >
>> > > On Fri, Apr 3, 2020 at 3:36 PM Andrey Zagrebin <az...@apache.org>
>> > > wrote:
>> > >
>> > > > Hi everyone,
>> > > >
>> > > > Patrick and Ufuk, thanks a lot for more ideas and suggestions!
>> > > >
>> > > > I have updated the FLIP according to the current state of
>> discussion.
>> > > > Now it also contains the implementation steps and future follow-ups.
>> > > > Please, review if there are any concerns.
>> > > > The order of the steps aims for keeping Flink releasable at any
>> point
>> > if
>> > > > something does not have enough time to get in.
>> > > >
>> > > > It looks that we are reaching mostly a consensus for the open
>> > questions.
>> > > > There is also a list of items, which have been discussed in this
>> > thread,
>> > > > and short summary below.
>> > > > As soon as there are no concerns, I will create a voting thread.
>> > > >
>> > > > I also added some thoughts for further customising logging setup.
>> This
>> > > may
>> > > > be an optional follow-up
>> > > > which is additional to the default logging into files for Web UI.
>> > > >
>> > > > # FLIP scope
>> > > > The focus is users of the official releases.
>> > > > Create docs for how to use the official docker image.
>> > > > Remove other Dockerfiles in Flink repo.
>> > > > Rely on running the official docker image in different modes
>> (JM/TM).
>> > > > Customise running the official image with env vars (This should
>> > minimise
>> > > > manual manipulating of local files and creation of a custom image).
>> > > >
>> > > > # Base oficial image
>> > > >
>> > > > ## Java versions
>> > > > There is a separate effort for this:
>> > > > https://github.com/apache/flink-docker/pull/9
>> > > >
>> > > > # Run image
>> > > >
>> > > > ## Entry point modes
>> > > > JM session, JM job, TM
>> > > >
>> > > > ## Entry point config
>> > > > We use env vars for this, e.g. FLINK_PROPERTIES and
>> > > ENABLE_BUILT_IN_PLUGINS
>> > > >
>> > > > ## Flink config options
>> > > > We document the existing FLINK_PROPERTIES env var to override config
>> > > > options in flink-conf.yaml.
>> > > > Then later, we do not need to expose and handle any other special
>> env
>> > > vars
>> > > > for config options (address, port etc).
>> > > > The future plan is to make Flink process configurable by env vars,
>> e.g.
>> > > > 'some.yaml.option: val' -> FLINK_SOME_YAML_OPTION=val
>> > > >
>> > > > ## Extra files: jars, custom logging properties
>> > > > We can provide env vars to point to custom locations, e.g. in
>> mounted
>> > > > volumes.
>> > > >
>> > > > # Extend image
>> > > >
>> > > > ## Python/hadoop versions, activating certain libs/plugins
>> > > > Users can install extra dependencies and change configs in their
>> custom
>> > > > image which extends our base image.
>> > > >
>> > > > # Logging
>> > > >
>> > > > ## Web UI
>> > > > Modify the *log4j-console.properties* to also output logs into the
>> > files
>> > > > for WebUI. Limit log file size.
>> > > >
>> > > > ## Container output
>> > > > Separate effort for proper split of Flink process stdout and stderr
>> > into
>> > > > files and container output
>> > > > (idea with tee command: `program start-foreground &2>1 | tee
>> > > > flink-user-taskexecutor.out`)
>> > > >
>> > > > # Docker bash utils
>> > > > We are not going to expose it to users as an API.
>> > > > They should either be able to configure and run the standard entry
>> > > > point, or the documentation should give short examples of how to extend
>> > > > and customise the base image.
>> > > >
>> > > > During the implementation, we will see if it makes sense to factor
>> out
>> > > > certain bash procedures
>> > > > to reuse them e.g. in custom dev versions of docker image.
>> > > >
>> > > > # Dockerfile / image for developers
>> > > > We keep it on our future roadmap. This effort should help to
>> understand
>> > > > what we can reuse there.
>> > > >
>> > > > Best,
>> > > > Andrey
>> > > >
>> > > >
>> > > > On Fri, Apr 3, 2020 at 12:57 PM Till Rohrmann <trohrmann@apache.org
>> >
>> > > > wrote:
>> > > >
>> > > >> Hi everyone,
>> > > >>
>> > > >> just a small inline comment.
>> > > >>
>> > > >> On Fri, Apr 3, 2020 at 11:42 AM Ufuk Celebi <uc...@apache.org>
>> wrote:
>> > > >>
>> > > >> > Hey Yang,
>> > > >> >
>> > > >> > thanks! See inline answers.
>> > > >> >
>> > > >> > On Fri, Apr 3, 2020 at 5:11 AM Yang Wang <da...@gmail.com>
>> > > wrote:
>> > > >> >
>> > > >> > > Hi Ufuk,
>> > > >> > >
>> > > >> > > Thanks for making the conclusion and directly pointing out what
>> > > >> > > needs to be done in FLIP-111. I agree with you that we should
>> > > >> > > narrow down the scope and focus on the most important and basic
>> > > >> > > part of the docker image unification.
>> > > >> > >
>> > > >> > > (1) Extend the entrypoint script in apache/flink-docker to
>> start
>> > the
>> > > >> job
>> > > >> > >> cluster entry point
>> > > >> > >
>> > > >> > > I want to add a small requirement for the entry point script.
>> > > >> Currently,
>> > > >> > > for the native
>> > > >> > > K8s integration, we are using the apache/flink-docker image,
>> but
>> > > with
>> > > >> > > different entry
>> > > >> > > point("kubernetes-entry.sh"). Generate the java cmd in
>> > > KubernetesUtils
>> > > >> > and
>> > > >> > > run it
>> > > >> > > in the entry point. I really hope it could merge to
>> > > >> apache/flink-docker
>> > > >> > > "docker-entrypoint.sh".
>> > > >> > >
>> > > >> >
>> > > >> > The script [1] only adds the FLINK_CLASSPATH env var which seems
>> > > >> generally
>> > > >> > reasonable to me. But since principled classpath and entrypoint
>> > > >> > configuration is somewhat related to the follow-up improvement
>> > > >> proposals, I
>> > > >> > could also see this being done after FLIP-111.
>> > > >> >
>> > > >> >
>> > > >> > > (2) Extend the example log4j-console configuration
>> > > >> > >> => support log retrieval from the Flink UI out of the box
>> > > >> > >
>> > > >> > > If you mean to update the
>> > > >> > > "flink-dist/conf/log4j-console.properties" to support both console
>> > > >> > > and local log files, I will say "+1". But we need to find a proper
>> > > >> > > way to make the stdout/stderr output available both on the console
>> > > >> > > and in the log files. Maybe Till's proposal could help to solve
>> > > >> > > this:
>> > > >> > > "`program 2>&1 | tee flink-user-taskexecutor.out`"
>> > > >> > >
>> > > >> >
>> > > >> > I think we can simply add a rolling file appender with a limit on
>> > the
>> > > >> log
>> > > >> > size.
>> > > >> >
>> > > >> I think this won't solve Yang's concern. What he wants to achieve is
>> > > >> that STDOUT and STDERR go to STDOUT and STDERR as well as into some
>> > > >> *.out and *.err files which are accessible from the web UI. I don't
>> > > >> think that a log appender will help with this problem.
>> > > >>
>> > > >> Cheers,
>> > > >> Till
>> > > >>
>> > > >>
>> > > >> > – Ufuk
>> > > >> >
>> > > >> > [1]
>> > > >> >
>> > > >> >
>> > > >>
>> > >
>> >
>> https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/kubernetes-bin/kubernetes-entry.sh
>> > > >> >
>> > > >>
>> > > >
>> > >
>> >
>> >
>> > --
>> > Best regards / Met vriendelijke groeten,
>> >
>> > Niels Basjes
>> >
>>
>

Re: [DISCUSS] FLIP-111: Docker image unification

Posted by Canbin Zheng <fe...@gmail.com>.
Hi, all

Thanks a lot for this FLIP and all the fruitful discussion. I am not sure
whether the following questions are in the scope of this FLIP, but I would
still appreciate your reply:

   1. Which docker base image do we plan to use for Java? As far as I see,
   openjdk:8-jre-alpine[1] is not officially supported by the OpenJDK project
   anymore; openjdk:8-jre is larger than openjdk:8-jre-slim in size, so we
   use the latter in our internal branch and it has worked fine so far.
   2. Is it possible that we execute the container CMD under *TINI*[2]
   instead of the shell for better hygiene? As far as I see, the containers of
   the JM and TMs run in the shell form and therefore do not receive the
   *TERM* signal when the pod is deleted[3]. Some of the problems are as
   follows:
      - The JM and the TMs get no chance to clean up; I created
      FLINK-15843[4] earlier to track this problem.
      - The pod can take a long time (up to 40 seconds) to be deleted
      after the K8s API Server receives the deletion request.

At the moment, we use *TINI* in our internal branch for the native K8s
setup and it solves the problems mentioned above.
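
For illustration, a rough sketch of how the image could exec its entry
point under *TINI* (assuming a Debian-based base image where the tini
package is available; the file names are only examples):

    FROM openjdk:8-jre-slim
    # Install tini so that PID 1 forwards signals (e.g. TERM on pod
    # deletion) to the Flink JVM and reaps zombie processes.
    RUN apt-get update && apt-get install -y tini && \
        rm -rf /var/lib/apt/lists/*
    COPY docker-entrypoint.sh /
    # Run the entry point as a child of tini instead of a bare shell.
    ENTRYPOINT ["/usr/bin/tini", "--", "/docker-entrypoint.sh"]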

[1]
https://github.com/docker-library/docs/blob/master/openjdk/README.md#supported-tags-and-respective-dockerfile-links
https://github.com/docker-library/openjdk/commit/3eb0351b208d739fac35345c85e3c6237c2114ec#diff-f95ffa3d1377774732c33f7b8368e099
[2]
https://github.com/krallin/tini
[3]
https://docs.docker.com/engine/reference/commandline/kill/
[4]
https://issues.apache.org/jira/browse/FLINK-15843

Regards,
Canbin Zheng

Till Rohrmann <tr...@apache.org> wrote on Mon, Apr 6, 2020 at 5:34 PM:

> Thanks for the feedback Niels. This is very helpful.
>
> 1. I agree `flink:latest` is nice to get started but in the long run people
> will want to pin their dependencies to a specific Flink version. I think
> the fix will happen as part of FLINK-15794.
>
> 2. SNAPSHOT docker images will be really helpful for developers as well as
> users who want to use the latest features. I believe that this will be a
> follow-up of this FLIP.
>
> 3. The goal of FLIP-111 is to create an image which allows starting a
> session cluster as well as a job cluster. Hence, I believe that we will
> solve this problem soon.
>
> 4. Same as 3. The new image will also contain the native K8s integration so
> that there is no need to create a special image modulo the artifacts you
> want to add.
>
> Additional notes:
>
> 1. I agree that one log makes it harder to separate different execution
> attempts or different tasks. However, on the other hand, it gives you an
> overall picture of what's happening in a Flink process. If things were
> split apart, then it might become super hard to detect problems in the
> runtime which cause the user code to fail, or vice versa, for example. In
> general, cross-correlation will be harder. I guess a solution could be to
> make this configurable. In any case, we should move the discussion about
> this topic into a separate thread.
>
> Cheers,
> Till
>
> On Mon, Apr 6, 2020 at 10:40 AM Niels Basjes <Ni...@basjes.nl> wrote:
>
> > Hi all,
> >
> > Sorry for jumping in at this late point of the discussion.
> > I see a lot of things I really like and I would like to put my "needs"
> and
> > observations here too so you take them into account (where possible).
> > I suspect that there will be overlap with things you already have taken
> > into account.
> >
> >    1. No more 'flink:latest' docker image tag.
> >    Related to https://issues.apache.org/jira/browse/FLINK-15794
> >    What I have learned is that the 'latest' version of a docker image
> only
> >    makes sense IFF this is an almost standalone thing.
> >    So if I have a servlet that does something in isolation (like my hobby
> >    project https://hub.docker.com/r/nielsbasjes/yauaa ) then 'latest'
> > makes
> >    sense.
> >    With Flink you have the application code and all nodes in the cluster
> >    that are depending on each other and as such must run the exact same
> >    versions of the base software.
> >    So if you run flink in a cluster (local/yarn/k8s/mesos/swarm/...)
> where
> >    the application and the nodes inter communicate and closely depend on
> > each
> >    other then 'latest' is a bad idea.
> >       1. Assume I have an application built against the Flink N api and
> the
> >       cluster downloads the latest which is also Flink N.
> >       Then a week later Flink N+1 is released and the API I use changes
> >       (Deprecated)
> >       and a while later Flink N+2 is released and the deprecated API is
> >       removed: Then my application no longer works even though I have
> > not changed
> >       anything.
> >       So I want my application to be 'pinned' to the exact version I
> built
> >       it with.
> >       2. I have a running cluster with my application and cluster running
> >       Flink N.
> >       I add some additional nodes and the new nodes pick up the Flink N+1
> >       image ... now I have a cluster with mixed versions.
> >       3. The version of flink is really the "Flink+Scala" version pair.
> >       If you have the right flink but the wrong scala you get really
> nasty
> >       errors: https://issues.apache.org/jira/browse/FLINK-16289
> >
> >       2. Deploy SNAPSHOT docker images (i.e. something like
> >    *flink:1.11-SNAPSHOT_2.12*) .
> >    More and more use cases will be running on the code delivered via
> Docker
> >    images instead of bare jar files.
> >    So if a "SNAPSHOT" is released and deployed into a 'staging' maven
> repo
> >    (which may be locally on the developers workstation) then it is my
> > opinion
> >    that at the same moment a "SNAPSHOT" docker image should be
> >    created/deployed.
> >    Each time a "SNAPSHOT" docker image is released this will overwrite
> the
> >    previous "SNAPSHOT".
> >    If the final version is released the SNAPSHOTs of that version
> >    can/should be removed.
> >    This will make testing in clusters a lot easier.
> >    Also building a local fix and then running it locally will work
> without
> >    additional modifications to the code.
> >
> >    3. Support for a 'single application cluster'
> >    I've been playing around with the S3 plugin and what I have found is
> >    that this essentially requires all nodes to have full access to the
> >    credentials needed to connect to S3.
> >    This essentially means that a multi-tenant setup is not possible in
> >    these cases.
> >    So I think the single application cluster should be a feature
> available
> >    in all cases.
> >
> >    4. I would like a native-kubernetes-single-application base image.
> >    I can then create a derived image where I only add the jar of my
> >    application.
> >    My desire is that I can then create a k8s yaml file for kubectl
> >    that adds the needed configs/secrets/arguments/environment variables
> and
> >    starts the cluster and application.
> >    Because the native kubernetes support makes it automatically scale
> based
> >    on the application this should 'just work'.
> >
> > Additional note:
> >
> >    1. Job/Task attempt logging instead of task manager logging.
> >    *I realize this has nothing to do with the docker images*
> >    I found something "hard to work with" while running some tests last
> > week.
> >    The logging is done to a single log for the task manager.
> >    So if I have multiple things running in the single task manager then
> the
> >    logs are mixed together.
> >    Also several attempts of the same task are mixed which makes it very
> >    hard to find out 'what went wrong'.
> >
> >
> >
> > On Fri, Apr 3, 2020 at 4:27 PM Ufuk Celebi <uc...@apache.org> wrote:
> >
> > > Thanks for the summary, Andrey. Good idea to link Patrick's document
> from
> > > the FLIP as a future direction so it doesn't get lost. Could you make
> > sure
> > > to revive that discussion when FLIP-111 nears an end?
> > >
> > > This is good to go on my part. +1 to start the VOTE.
> > >
> > >
> > > @Till, @Yang: Thanks for the clarification with the output
> redirection. I
> > > didn't see that. The concern with the `tee` approach is that the file
> > would
> > > grow indefinitely. I think we can solve this with regular logging by
> > > redirecting stderr to ERROR log level, but I'm not sure. We can look
> at a
> > > potential solution when we get to that point. :-)
> > >
> > >
> > >
> > > On Fri, Apr 3, 2020 at 3:36 PM Andrey Zagrebin <az...@apache.org>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Patrick and Ufuk, thanks a lot for more ideas and suggestions!
> > > >
> > > > I have updated the FLIP according to the current state of discussion.
> > > > Now it also contains the implementation steps and future follow-ups.
> > > > Please, review if there are any concerns.
> > > > The order of the steps aims for keeping Flink releasable at any point
> > if
> > > > something does not have enough time to get in.
> > > >
> > > > It looks like we are mostly reaching a consensus on the open questions.
> > > > There is also a list of items which have been discussed in this thread,
> > > > and a short summary below.
> > > > As soon as there are no concerns, I will create a voting thread.
> > > >
> > > > I also added some thoughts for further customising logging setup.
> This
> > > may
> > > > be an optional follow-up
> > > > which is additional to the default logging into files for Web UI.
> > > >
> > > > # FLIP scope
> > > > The focus is users of the official releases.
> > > > Create docs for how to use the official docker image.
> > > > Remove other Dockerfiles in Flink repo.
> > > > Rely on running the official docker image in different modes (JM/TM).
> > > > Customise running the official image with env vars (this should
> > > > minimise manual manipulation of local files and the creation of a
> > > > custom image).
> > > >
> > > > # Base official image
> > > >
> > > > ## Java versions
> > > > There is a separate effort for this:
> > > > https://github.com/apache/flink-docker/pull/9
> > > >
> > > > # Run image
> > > >
> > > > ## Entry point modes
> > > > JM session, JM job, TM
> > > >
> > > > ## Entry point config
> > > > We use env vars for this, e.g. FLINK_PROPERTIES and
> > > ENABLE_BUILT_IN_PLUGINS
> > > >
> > > > ## Flink config options
> > > > We document the existing FLINK_PROPERTIES env var to override config
> > > > options in flink-conf.yaml.
> > > > Then later, we do not need to expose and handle any other special env
> > > vars
> > > > for config options (address, port etc).
> > > > The future plan is to make Flink process configurable by env vars,
> e.g.
> > > > 'some.yaml.option: val' -> FLINK_SOME_YAML_OPTION=val
> > > >
> > > > ## Extra files: jars, custom logging properties
> > > > We can provide env vars to point to custom locations, e.g. in mounted
> > > > volumes.
> > > >
> > > > # Extend image
> > > >
> > > > ## Python/hadoop versions, activating certain libs/plugins
> > > > Users can install extra dependencies and change configs in their
> custom
> > > > image which extends our base image.
> > > >
> > > > # Logging
> > > >
> > > > ## Web UI
> > > > Modify the *log4j-console.properties* to also output logs into the
> > files
> > > > for WebUI. Limit log file size.
> > > >
> > > > ## Container output
> > > > Separate effort for proper split of Flink process stdout and stderr
> > into
> > > > files and container output
> > > > (idea with the tee command: `program start-foreground 2>&1 | tee
> > > > flink-user-taskexecutor.out`)
> > > >
> > > > # Docker bash utils
> > > > We are not going to expose it to users as an API.
> > > > They should either be able to configure and run the standard entry
> > > > point, or the documentation should give short examples of how to extend
> > > > and customise the base image.
> > > >
> > > > During the implementation, we will see if it makes sense to factor
> out
> > > > certain bash procedures
> > > > to reuse them e.g. in custom dev versions of docker image.
> > > >
> > > > # Dockerfile / image for developers
> > > > We keep it on our future roadmap. This effort should help to
> understand
> > > > what we can reuse there.
> > > >
> > > > Best,
> > > > Andrey
> > > >
> > > >
> > > > On Fri, Apr 3, 2020 at 12:57 PM Till Rohrmann <tr...@apache.org>
> > > > wrote:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> just a small inline comment.
> > > >>
> > > >> On Fri, Apr 3, 2020 at 11:42 AM Ufuk Celebi <uc...@apache.org> wrote:
> > > >>
> > > >> > Hey Yang,
> > > >> >
> > > >> > thanks! See inline answers.
> > > >> >
> > > >> > On Fri, Apr 3, 2020 at 5:11 AM Yang Wang <da...@gmail.com>
> > > wrote:
> > > >> >
> > > >> > > Hi Ufuk,
> > > >> > >
> > > >> > > Thanks for making the conclusion and directly pointing out what
> > > >> > > needs to be done in FLIP-111. I agree with you that we should
> > > >> > > narrow down the scope and focus on the most important and basic
> > > >> > > part of the docker image unification.
> > > >> > >
> > > >> > > (1) Extend the entrypoint script in apache/flink-docker to start
> > the
> > > >> job
> > > >> > >> cluster entry point
> > > >> > >
> > > >> > > I want to add a small requirement for the entry point script.
> > > >> Currently,
> > > >> > > for the native
> > > >> > > K8s integration, we are using the apache/flink-docker image, but
> > > with
> > > >> > > different entry
> > > >> > > point("kubernetes-entry.sh"). Generate the java cmd in
> > > KubernetesUtils
> > > >> > and
> > > >> > > run it
> > > >> > > in the entry point. I really hope it could merge to
> > > >> apache/flink-docker
> > > >> > > "docker-entrypoint.sh".
> > > >> > >
> > > >> >
> > > >> > The script [1] only adds the FLINK_CLASSPATH env var which seems
> > > >> generally
> > > >> > reasonable to me. But since principled classpath and entrypoint
> > > >> > configuration is somewhat related to the follow-up improvement
> > > >> proposals, I
> > > >> > could also see this being done after FLIP-111.
> > > >> >
> > > >> >
> > > >> > > (2) Extend the example log4j-console configuration
> > > >> > >> => support log retrieval from the Flink UI out of the box
> > > >> > >
> > > >> > > If you mean to update the
> > > >> > > "flink-dist/conf/log4j-console.properties" to support both console
> > > >> > > and local log files, I will say "+1". But we need to find a proper
> > > >> > > way to make the stdout/stderr output available both on the console
> > > >> > > and in the log files. Maybe Till's proposal could help to solve
> > > >> > > this:
> > > >> > > "`program 2>&1 | tee flink-user-taskexecutor.out`"
> > > >> > >
> > > >> >
> > > >> > I think we can simply add a rolling file appender with a limit on
> > the
> > > >> log
> > > >> > size.
> > > >> >
> > > >> I think this won't solve Yang's concern. What he wants to achieve is
> > > >> that STDOUT and STDERR go to STDOUT and STDERR as well as into some
> > > >> *.out and *.err files which are accessible from the web UI. I don't
> > > >> think that a log appender will help with this problem.
> > > >>
> > > >> Cheers,
> > > >> Till
> > > >>
> > > >>
> > > >> > – Ufuk
> > > >> >
> > > >> > [1]
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/kubernetes-bin/kubernetes-entry.sh
> > > >> >
> > > >>
> > > >
> > >
> >
> >
> > --
> > Best regards / Met vriendelijke groeten,
> >
> > Niels Basjes
> >
>

Re: [DISCUSS] FLIP-111: Docker image unification

Posted by Till Rohrmann <tr...@apache.org>.
Thanks for the feedback Niels. This is very helpful.

1. I agree `flink:latest` is nice to get started but in the long run people
will want to pin their dependencies to a specific Flink version. I think
the fix will happen as part of FLINK-15794.

2. SNAPSHOT docker images will be really helpful for developers as well as
users who want to use the latest features. I believe that this will be a
follow-up of this FLIP.

3. The goal of FLIP-111 is to create an image which allows starting a
session cluster as well as a job cluster. Hence, I believe that we will
solve this problem soon.

4. Same as 3. The new image will also contain the native K8s integration so
that there is no need to create a special image modulo the artifacts you
want to add.

Additional notes:

1. I agree that one log makes it harder to separate different execution
attempts or different tasks. However, on the other hand, it gives you an
overall picture of what's happening in a Flink process. If things were
split apart, then it might become super hard to detect problems in the
runtime which cause the user code to fail, or vice versa, for example. In
general, cross-correlation will be harder. I guess a solution could be to
make this configurable. In any case, we should move the discussion about
this topic into a separate thread.

Cheers,
Till

On Mon, Apr 6, 2020 at 10:40 AM Niels Basjes <Ni...@basjes.nl> wrote:

> Hi all,
>
> Sorry for jumping in at this late point of the discussion.
> I see a lot of things I really like and I would like to put my "needs" and
> observations here too so you take them into account (where possible).
> I suspect that there will be overlap with things you already have taken
> into account.
>
>    1. No more 'flink:latest' docker image tag.
>    Related to https://issues.apache.org/jira/browse/FLINK-15794
>    What I have learned is that the 'latest' version of a docker image only
>    makes sense IFF this is an almost standalone thing.
>    So if I have a servlet that does something in isolation (like my hobby
>    project https://hub.docker.com/r/nielsbasjes/yauaa ) then 'latest'
> makes
>    sense.
>    With Flink you have the application code and all nodes in the cluster
>    that are depending on each other and as such must run the exact same
>    versions of the base software.
>    So if you run flink in a cluster (local/yarn/k8s/mesos/swarm/...) where
>    the application and the nodes intercommunicate and closely depend on
> each
>    other then 'latest' is a bad idea.
>       1. Assume I have an application built against the Flink N api and the
>       cluster downloads the latest which is also Flink N.
>       Then a week later Flink N+1 is released and the API I use changes
>       (Deprecated)
>       and a while later Flink N+2 is released and the deprecated API is
>       removed: Then my application no longer works even though I have
> not changed
>       anything.
>       So I want my application to be 'pinned' to the exact version I built
>       it with.
>       2. I have a running cluster with my application and cluster running
>       Flink N.
>       I add some additional nodes and the new nodes pick up the Flink N+1
>       image ... now I have a cluster with mixed versions.
>       3. The version of flink is really the "Flink+Scala" version pair.
>       If you have the right flink but the wrong scala you get really nasty
>       errors: https://issues.apache.org/jira/browse/FLINK-16289
>
>       2. Deploy SNAPSHOT docker images (i.e. something like
>    *flink:1.11-SNAPSHOT_2.12*) .
>    More and more use cases will be running on the code delivered via Docker
>    images instead of bare jar files.
>    So if a "SNAPSHOT" is released and deployed into a 'staging' maven repo
>    (which may be locally on the developer's workstation) then it is my
> opinion
>    that at the same moment a "SNAPSHOT" docker image should be
>    created/deployed.
>    Each time a "SNAPSHOT" docker image is released this will overwrite the
>    previous "SNAPSHOT".
>    If the final version is released the SNAPSHOTs of that version
>    can/should be removed.
>    This will make testing in clusters a lot easier.
>    Also building a local fix and then running it locally will work without
>    additional modifications to the code.
>
>    3. Support for a 'single application cluster'
>    I've been playing around with the S3 plugin and what I have found is
>    that this essentially requires all nodes to have full access to the
>    credentials needed to connect to S3.
>    This essentially means that a multi-tenant setup is not possible in
>    these cases.
>    So I think the single application cluster should be a feature available
>    in all cases.
>
>    4. I would like a native-kubernetes-single-application base image.
>    I can then create a derived image where I only add the jar of my
>    application.
>    My desire is that I can then create a k8s yaml file for kubectl
>    that adds the needed configs/secrets/arguments/environment variables and
>    starts the cluster and application.
>    Because the native kubernetes support makes it automatically scale based
>    on the application this should 'just work'.
>
> Additional note:
>
>    1. Job/Task attempt logging instead of task manager logging.
>    *I realize this has nothing to do with the docker images*
>    I found something "hard to work with" while running some tests last
> week.
>    The logging is done to a single log for the task manager.
>    So if I have multiple things running in the single task manager then the
>    logs are mixed together.
>    Also several attempts of the same task are mixed which makes it very
>    hard to find out 'what went wrong'.
>
>
>
> On Fri, Apr 3, 2020 at 4:27 PM Ufuk Celebi <uc...@apache.org> wrote:
>
> > Thanks for the summary, Andrey. Good idea to link Patrick's document from
> > the FLIP as a future direction so it doesn't get lost. Could you make
> sure
> > to revive that discussion when FLIP-111 nears an end?
> >
> > This is good to go on my part. +1 to start the VOTE.
> >
> >
> > @Till, @Yang: Thanks for the clarification with the output redirection. I
> > didn't see that. The concern with the `tee` approach is that the file
> would
> > grow indefinitely. I think we can solve this with regular logging by
> > redirecting stderr to ERROR log level, but I'm not sure. We can look at a
> > potential solution when we get to that point. :-)
> >
> >
> >
> > On Fri, Apr 3, 2020 at 3:36 PM Andrey Zagrebin <az...@apache.org>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > Patrick and Ufuk, thanks a lot for more ideas and suggestions!
> > >
> > > I have updated the FLIP according to the current state of discussion.
> > > Now it also contains the implementation steps and future follow-ups.
> > > Please, review if there are any concerns.
> > > The order of the steps aims for keeping Flink releasable at any point
> if
> > > something does not have enough time to get in.
> > >
> > > It looks like we are mostly reaching a consensus on the open questions.
> > > There is also a list of items which have been discussed in this thread,
> > > and a short summary below.
> > > As soon as there are no concerns, I will create a voting thread.
> > >
> > > I also added some thoughts for further customising logging setup. This
> > may
> > > be an optional follow-up
> > > which is additional to the default logging into files for Web UI.
> > >
> > > # FLIP scope
> > > The focus is users of the official releases.
> > > Create docs for how to use the official docker image.
> > > Remove other Dockerfiles in Flink repo.
> > > Rely on running the official docker image in different modes (JM/TM).
> > > Customise running the official image with env vars (this should
> > > minimise manual manipulation of local files and the creation of a
> > > custom image).
> > >
> > > # Base official image
> > >
> > > ## Java versions
> > > There is a separate effort for this:
> > > https://github.com/apache/flink-docker/pull/9
> > >
> > > # Run image
> > >
> > > ## Entry point modes
> > > JM session, JM job, TM
> > >
> > > ## Entry point config
> > > We use env vars for this, e.g. FLINK_PROPERTIES and
> > ENABLE_BUILT_IN_PLUGINS
> > >
> > > ## Flink config options
> > > We document the existing FLINK_PROPERTIES env var to override config
> > > options in flink-conf.yaml.
> > > Then later, we do not need to expose and handle any other special env
> > vars
> > > for config options (address, port etc).
> > > The future plan is to make Flink process configurable by env vars, e.g.
> > > 'some.yaml.option: val' -> FLINK_SOME_YAML_OPTION=val
> > >
> > > ## Extra files: jars, custom logging properties
> > > We can provide env vars to point to custom locations, e.g. in mounted
> > > volumes.
> > >
> > > # Extend image
> > >
> > > ## Python/hadoop versions, activating certain libs/plugins
> > > Users can install extra dependencies and change configs in their custom
> > > image which extends our base image.
> > >
> > > # Logging
> > >
> > > ## Web UI
> > > Modify the *log4j-console.properties* to also output logs into the
> files
> > > for WebUI. Limit log file size.
> > >
> > > ## Container output
> > > Separate effort for proper split of Flink process stdout and stderr
> into
> > > files and container output
> > > (idea with the tee command: `program start-foreground 2>&1 | tee
> > > flink-user-taskexecutor.out`)
> > >
> > > # Docker bash utils
> > > We are not going to expose it to users as an API.
> > > They should either be able to configure and run the standard entry
> > > point, or the documentation should give short examples of how to extend
> > > and customise the base image.
> > >
> > > During the implementation, we will see if it makes sense to factor out
> > > certain bash procedures
> > > to reuse them e.g. in custom dev versions of docker image.
> > >
> > > # Dockerfile / image for developers
> > > We keep it on our future roadmap. This effort should help to understand
> > > what we can reuse there.
> > >
> > > Best,
> > > Andrey
> > >
> > >
> > > On Fri, Apr 3, 2020 at 12:57 PM Till Rohrmann <tr...@apache.org>
> > > wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> just a small inline comment.
> > >>
> > >> On Fri, Apr 3, 2020 at 11:42 AM Ufuk Celebi <uc...@apache.org> wrote:
> > >>
> > >> > Hey Yang,
> > >> >
> > >> > thanks! See inline answers.
> > >> >
> > >> > On Fri, Apr 3, 2020 at 5:11 AM Yang Wang <da...@gmail.com>
> > wrote:
> > >> >
> > >> > > Hi Ufuk,
> > >> > >
> > >> > > Thanks for making the conclusion and directly pointing out what
> > >> > > needs to be done in FLIP-111. I agree with you that we should
> > >> > > narrow down the scope and focus on the most important and basic
> > >> > > part of the docker image unification.
> > >> > >
> > >> > > (1) Extend the entrypoint script in apache/flink-docker to start
> the
> > >> job
> > >> > >> cluster entry point
> > >> > >
> > >> > > I want to add a small requirement for the entry point script.
> > >> Currently,
> > >> > > for the native
> > >> > > K8s integration, we are using the apache/flink-docker image, but
> > with
> > >> > > different entry
> > >> > > point("kubernetes-entry.sh"). Generate the java cmd in
> > KubernetesUtils
> > >> > and
> > >> > > run it
> > >> > > in the entry point. I really hope it could merge to
> > >> apache/flink-docker
> > >> > > "docker-entrypoint.sh".
> > >> > >
> > >> >
> > >> > The script [1] only adds the FLINK_CLASSPATH env var which seems
> > >> generally
> > >> > reasonable to me. But since principled classpath and entrypoint
> > >> > configuration is somewhat related to the follow-up improvement
> > >> proposals, I
> > >> > could also see this being done after FLIP-111.
> > >> >
> > >> >
> > >> > > (2) Extend the example log4j-console configuration
> > >> > >> => support log retrieval from the Flink UI out of the box
> > >> > >
> > >> > > If you mean to update the
> > >> > > "flink-dist/conf/log4j-console.properties" to support both console
> > >> > > and local log files, I will say "+1". But we need to find a proper
> > >> > > way to make the stdout/stderr output available both on the console
> > >> > > and in the log files. Maybe Till's proposal could help to solve
> > >> > > this:
> > >> > > "`program 2>&1 | tee flink-user-taskexecutor.out`"
> > >> > >
> > >> >
> > >> > I think we can simply add a rolling file appender with a limit on
> the
> > >> log
> > >> > size.
> > >> >
> > >> I think this won't solve Yang's concern. What he wants to achieve is
> > >> that STDOUT and STDERR go to STDOUT and STDERR as well as into some
> > >> *.out and *.err files which are accessible from the web UI. I don't
> > >> think that a log appender will help with this problem.
> > >>
> > >> Cheers,
> > >> Till
> > >>
> > >>
> > >> > – Ufuk
> > >> >
> > >> > [1]
> > >> >
> > >> >
> > >>
> >
> https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/kubernetes-bin/kubernetes-entry.sh
> > >> >
> > >>
> > >
> >
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>

Re: [DISCUSS] FLIP-111: Docker image unification

Posted by Niels Basjes <Ni...@basjes.nl>.
Hi all,

Sorry for jumping in at this late point of the discussion.
I see a lot of things I really like and I would like to put my "needs" and
observations here too so you take them into account (where possible).
I suspect that there will be overlap with things you already have taken
into account.

   1. No more 'flink:latest' docker image tag.
   Related to https://issues.apache.org/jira/browse/FLINK-15794
   What I have learned is that the 'latest' version of a docker image only
   makes sense IFF this is an almost standalone thing.
   So if I have a servlet that does something in isolation (like my hobby
   project https://hub.docker.com/r/nielsbasjes/yauaa ) then 'latest' makes
   sense.
   With Flink you have the application code and all nodes in the cluster
   that are depending on each other and as such must run the exact same
   versions of the base software.
   So if you run flink in a cluster (local/yarn/k8s/mesos/swarm/...) where
   the application and the nodes intercommunicate and closely depend on each
   other then 'latest' is a bad idea.
      1. Assume I have an application built against the Flink N api and the
      cluster downloads the latest which is also Flink N.
      Then a week later Flink N+1 is released and the API I use changes
      (Deprecated)
      and a while later Flink N+2 is released and the deprecated API is
      removed: Then my application no longer works even though I have
not changed
      anything.
      So I want my application to be 'pinned' to the exact version I built
      it with.
      2. I have a running cluster with my application and cluster running
      Flink N.
      I add some additional nodes and the new nodes pick up the Flink N+1
      image ... now I have a cluster with mixed versions.
      3. The version of flink is really the "Flink+Scala" version pair.
      If you have the right flink but the wrong scala you get really nasty
      errors: https://issues.apache.org/jira/browse/FLINK-16289

      2. Deploy SNAPSHOT docker images (i.e. something like
   *flink:1.11-SNAPSHOT_2.12*) .
   More and more use cases will be running on the code delivered via Docker
   images instead of bare jar files.
   So if a "SNAPSHOT" is released and deployed into a 'staging' maven repo
   (which may be locally on the developer's workstation) then it is my opinion
   that at the same moment a "SNAPSHOT" docker image should be
   created/deployed.
   Each time a "SNAPSHOT" docker image is released this will overwrite the
   previous "SNAPSHOT".
   If the final version is released the SNAPSHOTs of that version
   can/should be removed.
   This will make testing in clusters a lot easier.
   Also building a local fix and then running it locally will work without
   additional modifications to the code.

   3. Support for a 'single application cluster'
   I've been playing around with the S3 plugin and what I have found is
   that this essentially requires all nodes to have full access to the
   credentials needed to connect to S3.
   This essentially means that a multi-tenant setup is not possible in
   these cases.
   So I think the single application cluster should be a feature available
   in all cases.

   4. I would like a native-kubernetes-single-application base image.
   I can then create a derived image where I only add the jar of my
   application.
   My desire is that I can then create a k8s yaml file for kubectl
   that adds the needed configs/secrets/arguments/environment variables and
   starts the cluster and application.
   Because the native kubernetes support makes it automatically scale based
   on the application this should 'just work'.
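
As a concrete (purely hypothetical) sketch of what I mean in point 4 -- the
base image tag and the jar directory are assumptions, since no such
official image exists yet:

    # Hypothetical base image with a native-kubernetes application entry point.
    FROM flink:1.11-native-kubernetes-application
    # Add only the application jar; the entry point is assumed to pick up
    # jars from this directory.
    COPY target/my-flink-job.jar /opt/flink/usrlib/my-flink-job.jar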

Additional note:

   1. Job/Task attempt logging instead of task manager logging.
   *I realize this has nothing to do with the docker images*
   I found something "hard to work with" while running some tests last week.
   The logging is done to a single log for the task manager.
   So if I have multiple things running in the single task manager then the
   logs are mixed together.
   Also several attempts of the same task are mixed which makes it very
   hard to find out 'what went wrong'.



On Fri, Apr 3, 2020 at 4:27 PM Ufuk Celebi <uc...@apache.org> wrote:

> Thanks for the summary, Andrey. Good idea to link Patrick's document from
> the FLIP as a future direction so it doesn't get lost. Could you make sure
> to revive that discussion when FLIP-111 nears an end?
>
> This is good to go on my part. +1 to start the VOTE.
>
>
> @Till, @Yang: Thanks for the clarification with the output redirection. I
> didn't see that. The concern with the `tee` approach is that the file would
> grow indefinitely. I think we can solve this with regular logging by
> redirecting stderr to ERROR log level, but I'm not sure. We can look at a
> potential solution when we get to that point. :-)
>
>
>
> On Fri, Apr 3, 2020 at 3:36 PM Andrey Zagrebin <az...@apache.org>
> wrote:
>
> > Hi everyone,
> >
> > Patrick and Ufuk, thanks a lot for more ideas and suggestions!
> >
> > I have updated the FLIP according to the current state of discussion.
> > Now it also contains the implementation steps and future follow-ups.
> > Please, review if there are any concerns.
> > The order of the steps aims for keeping Flink releasable at any point if
> > something does not have enough time to get in.
> >
> > It looks like we are mostly reaching a consensus on the open questions.
> > There is also a list of items which have been discussed in this thread,
> > and a short summary below.
> > As soon as there are no concerns, I will create a voting thread.
> >
> > I also added some thoughts for further customising logging setup. This
> may
> > be an optional follow-up
> > which is additional to the default logging into files for Web UI.
> >
> > # FLIP scope
> > The focus is users of the official releases.
> > Create docs for how to use the official docker image.
> > Remove other Dockerfiles in Flink repo.
> > Rely on running the official docker image in different modes (JM/TM).
> > Customise running the official image with env vars (this should minimise
> > manual manipulation of local files and the creation of a custom image).
> >
> > # Base official image
> >
> > ## Java versions
> > There is a separate effort for this:
> > https://github.com/apache/flink-docker/pull/9
> >
> > # Run image
> >
> > ## Entry point modes
> > JM session, JM job, TM
> >
> > ## Entry point config
> > We use env vars for this, e.g. FLINK_PROPERTIES and
> ENABLE_BUILT_IN_PLUGINS
> >
> > ## Flink config options
> > We document the existing FLINK_PROPERTIES env var to override config
> > options in flink-conf.yaml.
> > Then later, we do not need to expose and handle any other special env
> vars
> > for config options (address, port etc).
> > The future plan is to make Flink process configurable by env vars, e.g.
> > 'some.yaml.option: val' -> FLINK_SOME_YAML_OPTION=val
> >
> > ## Extra files: jars, custom logging properties
> > We can provide env vars to point to custom locations, e.g. in mounted
> > volumes.
> >
> > # Extend image
> >
> > ## Python/hadoop versions, activating certain libs/plugins
> > Users can install extra dependencies and change configs in their custom
> > image which extends our base image.
> >
> > # Logging
> >
> > ## Web UI
> > Modify the *log4j-console.properties* to also output logs into the files
> > for WebUI. Limit log file size.
> >
> > ## Container output
> > Separate effort for proper split of Flink process stdout and stderr into
> > files and container output
> > (idea with the tee command: `program start-foreground 2>&1 | tee
> > flink-user-taskexecutor.out`)
> >
> > # Docker bash utils
> > We are not going to expose it to users as an API.
> > They should either be able to configure and run the standard entry point,
> > or the documentation should give short examples of how to extend and
> > customise the base image.
> >
> > During the implementation, we will see if it makes sense to factor out
> > certain bash procedures
> > to reuse them e.g. in custom dev versions of docker image.
> >
> > # Dockerfile / image for developers
> > We keep it on our future roadmap. This effort should help to understand
> > what we can reuse there.
> >
> > Best,
> > Andrey
> >
> >
> > On Fri, Apr 3, 2020 at 12:57 PM Till Rohrmann <tr...@apache.org>
> > wrote:
> >
> >> Hi everyone,
> >>
> >> just a small inline comment.
> >>
> >> On Fri, Apr 3, 2020 at 11:42 AM Ufuk Celebi <uc...@apache.org> wrote:
> >>
> >> > Hey Yang,
> >> >
> >> > thanks! See inline answers.
> >> >
> >> > On Fri, Apr 3, 2020 at 5:11 AM Yang Wang <da...@gmail.com>
> wrote:
> >> >
> >> > > Hi Ufuk,
> >> > >
> >> > > Thanks for making the conclusion and directly pointing out what
> >> > > needs to be done in FLIP-111. I agree with you that we should narrow
> >> > > down the scope and focus on the most important and basic part of the
> >> > > docker image unification.
> >> > >
> >> > > (1) Extend the entrypoint script in apache/flink-docker to start the
> >> job
> >> > >> cluster entry point
> >> > >
> >> > > I want to add a small requirement for the entry point script.
> >> Currently,
> >> > > for the native
> >> > > K8s integration, we are using the apache/flink-docker image, but
> with
> >> > > different entry
> >> > > point("kubernetes-entry.sh"). Generate the java cmd in
> KubernetesUtils
> >> > and
> >> > > run it
> >> > > in the entry point. I really hope it could merge to
> >> apache/flink-docker
> >> > > "docker-entrypoint.sh".
> >> > >
> >> >
> >> > The script [1] only adds the FLINK_CLASSPATH env var which seems
> >> generally
> >> > reasonable to me. But since principled classpath and entrypoint
> >> > configuration is somewhat related to the follow-up improvement
> >> proposals, I
> >> > could also see this being done after FLIP-111.
> >> >
> >> >
> >> > > (2) Extend the example log4j-console configuration
> >> > >> => support log retrieval from the Flink UI out of the box
> >> > >
> >> > > If you mean to update the "flink-dist/conf/log4j-console.properties"
> >> > > to support both console and local log files, I will say "+1". But we
> >> > > need to find a proper way to make the stdout/stderr output available
> >> > > both on the console and in the log files. Maybe Till's proposal could
> >> > > help to solve this:
> >> > > "`program 2>&1 | tee flink-user-taskexecutor.out`"
> >> > >
> >> >
> >> > I think we can simply add a rolling file appender with a limit on the
> >> log
> >> > size.
> >> >
> >> I think this won't solve Yang's concern. What he wants to achieve is
> >> that STDOUT and STDERR go to STDOUT and STDERR as well as into some
> >> *.out and *.err files which are accessible from the web UI. I don't
> >> think that a log appender will help with this problem.
> >>
> >> Cheers,
> >> Till
> >>
> >>
> >> > – Ufuk
> >> >
> >> > [1]
> >> >
> >> >
> >>
> https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/kubernetes-bin/kubernetes-entry.sh
> >> >
> >>
> >
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: [DISCUSS] FLIP-111: Docker image unification

Posted by Ufuk Celebi <uc...@apache.org>.
Thanks for the summary, Andrey. Good idea to link Patrick's document from
the FLIP as a future direction so it doesn't get lost. Could you make sure
to revive that discussion when FLIP-111 nears an end?

This is good to go on my part. +1 to start the VOTE.


@Till, @Yang: Thanks for the clarification with the output redirection. I
didn't see that. The concern with the `tee` approach is that the file would
grow indefinitely. I think we can solve this with regular logging by
redirecting stderr to ERROR log level, but I'm not sure. We can look at a
potential solution when we get to that point. :-)
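
To illustrate what I mean by redirecting stderr to ERROR log level (just a
sketch, not existing Flink code; all names here are made up):

    import java.io.ByteArrayOutputStream;
    import java.io.OutputStream;
    import java.io.PrintStream;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Forward each complete line written to System.err to a logger at
    // ERROR level, so a size-capped rolling appender also bounds stderr.
    public class StderrToLog {
        private static final Logger LOG = LoggerFactory.getLogger("stderr");

        public static void install() {
            System.setErr(new PrintStream(new LineLoggingStream(), true));
        }

        private static class LineLoggingStream extends OutputStream {
            private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

            @Override
            public void write(int b) {
                if (b == '\n') {
                    LOG.error(buf.toString());
                    buf.reset();
                } else {
                    buf.write(b);
                }
            }
        }
    }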



On Fri, Apr 3, 2020 at 3:36 PM Andrey Zagrebin <az...@apache.org> wrote:

> Hi everyone,
>
> Patrick and Ufuk, thanks a lot for more ideas and suggestions!
>
> I have updated the FLIP according to the current state of discussion.
> Now it also contains the implementation steps and future follow-ups.
> Please, review if there are any concerns.
> The order of the steps aims for keeping Flink releasable at any point if
> something does not have enough time to get in.
>
> It looks like we are mostly reaching a consensus on the open questions.
> There is also a list of items which have been discussed in this thread,
> and a short summary below.
> As soon as there are no concerns, I will create a voting thread.
>
> I also added some thoughts for further customising logging setup. This may
> be an optional follow-up
> which is additional to the default logging into files for Web UI.
>
> # FLIP scope
> The focus is users of the official releases.
> Create docs for how to use the official docker image.
> Remove other Dockerfiles in Flink repo.
> Rely on running the official docker image in different modes (JM/TM).
> Customise running the official image with env vars (this should minimise
> manual manipulation of local files and the creation of a custom image).
>
> # Base official image
>
> ## Java versions
> There is a separate effort for this:
> https://github.com/apache/flink-docker/pull/9
>
> # Run image
>
> ## Entry point modes
> JM session, JM job, TM
>
> ## Entry point config
> We use env vars for this, e.g. FLINK_PROPERTIES and ENABLE_BUILT_IN_PLUGINS
>
> ## Flink config options
> We document the existing FLINK_PROPERTIES env var to override config
> options in flink-conf.yaml.
> Then later, we do not need to expose and handle any other special env vars
> for config options (address, port etc).
> The future plan is to make Flink process configurable by env vars, e.g.
> 'some.yaml.option: val' -> FLINK_SOME_YAML_OPTION=val
>
> ## Extra files: jars, custom logging properties
> We can provide env vars to point to custom locations, e.g. in mounted
> volumes.
>
> # Extend image
>
> ## Python/hadoop versions, activating certain libs/plugins
> Users can install extra dependencies and change configs in their custom
> image which extends our base image.
>
> # Logging
>
> ## Web UI
> Modify the *log4j-console.properties* to also output logs into the files
> for WebUI. Limit log file size.
>
> ## Container output
> Separate effort for proper split of Flink process stdout and stderr into
> files and container output
> (idea with the tee command: `program start-foreground 2>&1 | tee
> flink-user-taskexecutor.out`)
>
> # Docker bash utils
> We are not going to expose it to users as an API.
> They should either be able to configure and run the standard entry point,
> or the documentation should give short examples of how to extend and
> customise the base image.
>
> During the implementation, we will see if it makes sense to factor out
> certain bash procedures
> to reuse them e.g. in custom dev versions of docker image.
>
> # Dockerfile / image for developers
> We keep it on our future roadmap. This effort should help to understand
> what we can reuse there.
>
> Best,
> Andrey
>
>
> On Fri, Apr 3, 2020 at 12:57 PM Till Rohrmann <tr...@apache.org>
> wrote:
>
>> Hi everyone,
>>
>> just a small inline comment.
>>
>> On Fri, Apr 3, 2020 at 11:42 AM Ufuk Celebi <uc...@apache.org> wrote:
>>
>> > Hey Yang,
>> >
>> > thanks! See inline answers.
>> >
>> > On Fri, Apr 3, 2020 at 5:11 AM Yang Wang <da...@gmail.com> wrote:
>> >
>> > > Hi Ufuk,
>> > >
>> > > Thanks for making the conclusion and directly pointing out what needs
>> > > to be done in FLIP-111. I agree with you that we should narrow down
>> > > the scope and focus on the most important and basic part of the docker
>> > > image unification.
>> > >
>> > > (1) Extend the entrypoint script in apache/flink-docker to start the
>> job
>> > >> cluster entry point
>> > >
>> > > I want to add a small requirement for the entry point script.
>> Currently,
>> > > for the native
>> > > K8s integration, we are using the apache/flink-docker image, but with
>> > > different entry
>> > > point("kubernetes-entry.sh"). Generate the java cmd in KubernetesUtils
>> > and
>> > > run it
>> > > in the entry point. I really hope it could merge to
>> apache/flink-docker
>> > > "docker-entrypoint.sh".
>> > >
>> >
>> > The script [1] only adds the FLINK_CLASSPATH env var which seems
>> generally
>> > reasonable to me. But since principled classpath and entrypoint
>> > configuration is somewhat related to the follow-up improvement
>> proposals, I
>> > could also see this being done after FLIP-111.
>> >
>> >
>> > > (2) Extend the example log4j-console configuration
>> > >> => support log retrieval from the Flink UI out of the box
>> > >
>> > > If you mean to update the "flink-dist/conf/log4j-console.properties"
>> > > to support both console and local log files, I will say "+1". But we
>> > > need to find a proper way to make the stdout/stderr output available
>> > > both on the console and in the log files. Maybe Till's proposal could
>> > > help to solve this:
>> > > "`program 2>&1 | tee flink-user-taskexecutor.out`"
>> > >
>> >
>> > I think we can simply add a rolling file appender with a limit on the
>> log
>> > size.
>> >
>> I think this won't solve Yang's concern. What he wants to achieve is
>> that STDOUT and STDERR go to STDOUT and STDERR as well as into some
>> *.out and *.err files which are accessible from the web UI. I don't
>> think that a log appender will help with this problem.
>>
>> Cheers,
>> Till
>>
>>
>> > – Ufuk
>> >
>> > [1]
>> >
>> >
>> https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/kubernetes-bin/kubernetes-entry.sh
>> >
>>
>

Re: [DISCUSS] FLIP-111: Docker image unification

Posted by Andrey Zagrebin <az...@apache.org>.
Hi everyone,

Patrick and Ufuk, thanks a lot for more ideas and suggestions!

I have updated the FLIP according to the current state of discussion.
Now it also contains the implementation steps and future follow-ups.
Please, review if there are any concerns.
The order of the steps aims for keeping Flink releasable at any point if
something does not have enough time to get in.

It looks like we are mostly reaching a consensus on the open questions.
There is also a list of items which have been discussed in this thread,
and a short summary below.
As soon as there are no concerns, I will create a voting thread.

I also added some thoughts for further customising logging setup. This may
be an optional follow-up
which is additional to the default logging into files for Web UI.

# FLIP scope
The focus is users of the official releases.
Create docs for how to use the official docker image.
Remove other Dockerfiles in Flink repo.
Rely on running the official docker image in different modes (JM/TM).
Customise running the official image with env vars (this should minimise
manual manipulation of local files and the creation of a custom image).

# Base official image

## Java versions
There is a separate effort for this:
https://github.com/apache/flink-docker/pull/9

# Run image

## Entry point modes
JM session, JM job, TM

## Entry point config
We use env vars for this, e.g. FLINK_PROPERTIES and ENABLE_BUILT_IN_PLUGINS
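
For example (illustrative only; the exact plugin jar name depends on the
Flink version, and this assumes the entry point enables the named plugins
from the image's opt/ directory):

    docker run \
        --env ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-presto-1.10.0.jar \
        flink:1.10.0 jobmanager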

## Flink config options
We document the existing FLINK_PROPERTIES env var to override config
options in flink-conf.yaml.
Then later, we do not need to expose and handle any other special env vars
for config options (address, port etc).
The future plan is to make Flink process configurable by env vars, e.g.
'some.yaml.option: val' -> FLINK_SOME_YAML_OPTION=val
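
For example, a sketch of the documented approach (the option values here
are made up):

    FLINK_PROPERTIES=$'jobmanager.rpc.address: flink-jobmanager\ntaskmanager.numberOfTaskSlots: 4'
    docker run --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" flink:1.10.0 taskmanager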

## Extra files: jars, custom logging properties
We can provide env vars to point to custom locations, e.g. in mounted
volumes.

# Extend image

## Python/hadoop versions, activating certain libs/plugins
Users can install extra dependencies and change configs in their custom
image which extends our base image.

# Logging

## Web UI
Modify the *log4j-console.properties* to also output logs into the files
for WebUI. Limit log file size.

## Container output
Separate effort for proper split of Flink process stdout and stderr into
files and container output
(idea with the tee command: `program start-foreground 2>&1 | tee
flink-user-taskexecutor.out`)
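
A slightly fuller sketch of that idea (illustrative only; the script name
and the log path are assumptions):

    # Keep logs on the container's stdout/stderr for `docker logs` while
    # also writing the .out file that the Web UI serves.
    "${FLINK_HOME}/bin/taskmanager.sh" start-foreground "$@" 2>&1 \
        | tee "${FLINK_HOME}/log/flink-user-taskexecutor.out"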

# Docker bash utils
We are not going to expose it to users as an API.
They should either be able to configure and run the standard entry point,
or the documentation should give short examples of how to extend and
customise the base image.

During the implementation, we will see if it makes sense to factor out
certain bash procedures
to reuse them e.g. in custom dev versions of docker image.

# Dockerfile / image for developers
We keep it on our future roadmap. This effort should help to understand
what we can reuse there.

Best,
Andrey


On Fri, Apr 3, 2020 at 12:57 PM Till Rohrmann <tr...@apache.org> wrote:

> Hi everyone,
>
> just a small inline comment.
>
> On Fri, Apr 3, 2020 at 11:42 AM Ufuk Celebi <uc...@apache.org> wrote:
>
> > Hey Yang,
> >
> > thanks! See inline answers.
> >
> > On Fri, Apr 3, 2020 at 5:11 AM Yang Wang <da...@gmail.com> wrote:
> >
> > > Hi Ufuk,
> > >
> > > Thanks for making the conclusion and directly pointing out what needs
> > > to be done in FLIP-111. I agree with you that we should narrow down the
> > > scope and focus on the most important and basic part of the docker
> > > image unification.
> > >
> > > (1) Extend the entrypoint script in apache/flink-docker to start the
> job
> > >> cluster entry point
> > >
> > > I want to add a small requirement for the entry point script.
> Currently,
> > > for the native
> > > K8s integration, we are using the apache/flink-docker image, but with
> > > different entry
> > > point("kubernetes-entry.sh"). Generate the java cmd in KubernetesUtils
> > and
> > > run it
> > > in the entry point. I really hope it could merge to apache/flink-docker
> > > "docker-entrypoint.sh".
> > >
> >
> > The script [1] only adds the FLINK_CLASSPATH env var which seems
> generally
> > reasonable to me. But since principled classpath and entrypoint
> > configuration is somewhat related to the follow-up improvement
> proposals, I
> > could also see this being done after FLIP-111.
> >
> >
> > > (2) Extend the example log4j-console configuration
> > >> => support log retrieval from the Flink UI out of the box
> > >
> > > If you mean to update the "flink-dist/conf/log4j-console.properties" to
> > > support both console and local log files, I will say "+1". But we need
> > > to find a proper way to make the stdout/stderr output available both on
> > > the console and in the log files. Maybe Till's proposal could help to
> > > solve this:
> > > "`program 2>&1 | tee flink-user-taskexecutor.out`"
> > >
> >
> > I think we can simply add a rolling file appender with a limit on the log
> > size.
> >
> I think this won't solve Yang's concern. What he wants to achieve is that
> STDOUT and STDERR go to STDOUT and STDERR as well as into some *.out and
> *.err files which are accessible from the web UI. I don't think that a
> log appender will help with this problem.
>
> Cheers,
> Till
>
>
> > – Ufuk
> >
> > [1]
> >
> >
> https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/kubernetes-bin/kubernetes-entry.sh
> >
>

Re: [DISCUSS] FLIP-111: Docker image unification

Posted by Till Rohrmann <tr...@apache.org>.
Hi everyone,

just a small inline comment.

On Fri, Apr 3, 2020 at 11:42 AM Ufuk Celebi <uc...@apache.org> wrote:

> Hey Yang,
>
> thanks! See inline answers.
>
> On Fri, Apr 3, 2020 at 5:11 AM Yang Wang <da...@gmail.com> wrote:
>
> > Hi Ufuk,
> >
> > Thanks for making the conclusion and directly pointing out what needs to
> > be done in FLIP-111. I agree with you that we should narrow down the
> > scope and focus on the most important and basic part of the docker image
> > unification.
> >
> > (1) Extend the entrypoint script in apache/flink-docker to start the job
> >> cluster entry point
> >
> > I want to add a small requirement for the entry point script. Currently,
> > for the native K8s integration, we are using the apache/flink-docker
> > image, but with a different entry point ("kubernetes-entry.sh"). The Java
> > command is generated in KubernetesUtils and run in the entry point. I
> > really hope it could be merged into the apache/flink-docker
> > "docker-entrypoint.sh".
> >
>
> The script [1] only adds the FLINK_CLASSPATH env var which seems generally
> reasonable to me. But since principled classpath and entrypoint
> configuration is somewhat related to the follow-up improvement proposals, I
> could also see this being done after FLIP-111.
>
>
> > (2) Extend the example log4j-console configuration
> >> => support log retrieval from the Flink UI out of the box
> >
> > If you mean updating "flink-dist/conf/log4j-console.properties" to
> > support console and local log files, I will say "+1". But we need to find
> > a proper way to make the stdout/stderr output available both on the
> > console and in log files. Maybe Till's proposal could help to solve this:
> > "`program 2>&1 | tee flink-user-taskexecutor.out`"
> >
>
> I think we can simply add a rolling file appender with a limit on the log
> size.
>
I think this won't solve Yang's concern. What he wants to achieve is that
STDOUT and STDERR go to STDOUT and STDERR as well as into some *.out and
*.err files which are accessible from the web UI. I don't think that a log
appender will help with this problem.
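
If we want to go down that route, a rough bash sketch of what the
entrypoint could do is the following (the .out/.err file names and the
log directory are purely illustrative):

    # requires bash: duplicate stdout/stderr both to the console and to
    # .out/.err files that the web UI could then serve
    exec "$@" \
        > >(tee "${FLINK_LOG_DIR}/flink--taskexecutor.out") \
        2> >(tee "${FLINK_LOG_DIR}/flink--taskexecutor.err" >&2)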

Cheers,
Till


> – Ufuk
>
> [1]
>
> https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/kubernetes-bin/kubernetes-entry.sh
>

Re: [DISCUSS] FLIP-111: Docker image unification

Posted by Ufuk Celebi <uc...@apache.org>.
Hey Yang,

thanks! See inline answers.

On Fri, Apr 3, 2020 at 5:11 AM Yang Wang <da...@gmail.com> wrote:

> Hi Ufuk,
>
> Thanks for making the conclusion and directly pointing out what needs to be
> done in FLIP-111. I agree with you that we should narrow down the scope and
> focus on the most important and basic part of the docker image unification.
>
> (1) Extend the entrypoint script in apache/flink-docker to start the job
>> cluster entry point
>
> I want to add a small requirement for the entry point script. Currently,
> for the native K8s integration, we are using the apache/flink-docker image,
> but with a different entry point ("kubernetes-entry.sh"). The Java command
> is generated in KubernetesUtils and run in the entry point. I really hope
> it could be merged into the apache/flink-docker "docker-entrypoint.sh".
>

The script [1] only adds the FLINK_CLASSPATH env var which seems generally
reasonable to me. But since principled classpath and entrypoint
configuration is somewhat related to the follow-up improvement proposals, I
could also see this being done after FLIP-111.
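
For reference, that script essentially boils down to the following
(simplified sketch, not the verbatim contents):

    # source the helper functions shipped in flink-dist, export the
    # classpath and exec the Java command generated on the K8s side
    . "${FLINK_HOME}/bin/config.sh"
    export FLINK_CLASSPATH="$(constructFlinkClassPath)"
    exec "$@"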


> (2) Extend the example log4j-console configuration
>> => support log retrieval from the Flink UI out of the box
>
> If you mean updating "flink-dist/conf/log4j-console.properties" to support
> console and local log files, I will say "+1". But we need to find a proper
> way to make the stdout/stderr output available both on the console and in
> log files. Maybe Till's proposal could help to solve this:
> "`program 2>&1 | tee flink-user-taskexecutor.out`"
>

I think we can simply add a rolling file appender with a limit on the log
size.
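
Something along the following lines, sketched in the log4j 1.x properties
format that flink-dist currently uses (the size and backup limits are
illustrative; the console appender is the one already defined in
log4j-console.properties):

    log4j.rootLogger=INFO, console, rolling
    log4j.appender.rolling=org.apache.log4j.RollingFileAppender
    log4j.appender.rolling.File=${log.file}
    log4j.appender.rolling.MaxFileSize=100MB
    log4j.appender.rolling.MaxBackupIndex=2
    log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
    log4j.appender.rolling.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n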

– Ufuk

[1]
https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/kubernetes-bin/kubernetes-entry.sh

Re: [DISCUSS] FLIP-111: Docker image unification

Posted by Yang Wang <da...@gmail.com>.
Hi Ufuk,

Thanks for making the conclusion and directly pointing out what needs to be
done in FLIP-111. I agree with you that we should narrow down the scope and
focus on the most important and basic part of the docker image unification.

(1) Extend the entrypoint script in apache/flink-docker to start the job
> cluster entry point

I want to add a small requirement for the entry point script. Currently,
for the native K8s integration, we are using the apache/flink-docker image,
but with a different entry point ("kubernetes-entry.sh"). The Java command
is generated in KubernetesUtils and run in the entry point. I really hope
it could be merged into the apache/flink-docker "docker-entrypoint.sh".

(2) Extend the example log4j-console configuration
> => support log retrieval from the Flink UI out of the box

If you mean updating "flink-dist/conf/log4j-console.properties" to support
console and local log files, I will say "+1". But we need to find a proper
way to make the stdout/stderr output available both on the console and in
log files. Maybe Till's proposal could help to solve this:
"`program 2>&1 | tee flink-user-taskexecutor.out`"

(3) Document typical usage scenarios in apache/flink-docker
> => this should replace the proposed flink_docker_utils helper

I agree with you that as a first step, documentation is enough for the
typical usage scenarios (e.g. standalone session, standalone per-job,
native, plugins, Python, etc.).


Best,
Yang


Ufuk Celebi <uc...@apache.org> 于2020年4月3日周五 上午1:03写道:

> Hey all,
>
> thanks for the proposal and the detailed discussion. In particular, thanks
> to Andrey for starting this thread and to Patrick for the additional ideas
> in the linked Google doc.
>
> I find many of the improvements proposed during the discussion (such as the
> unified entrypoint in Flink, proper configuration via environment
> variables, Dockerfiles for development, etc.) really important. At the same
> time, I believe that these improvements have quite a large scope and could
> be tackled independently as Till already suggested. I think we should
> ideally split the discussions for those improvements out of this thread and
> focus on the main target of FLIP-111.
>
> To me the major point of this FLIP is to consolidate existing Dockerfiles
> into apache/flink-docker and document typical usage scenarios (e.g. linking
> plugins, installing shaded Hadoop, running a job cluster, etc.).
>
> In order to achieve this, I think we could move forward as follows:
>
> (1) Extend the entrypoint script in apache/flink-docker to start the job
> cluster entry point
> => this is currently missing and would block removal of the Dockerfile in
> flink-container
>
> (2) Extend the example log4j-console configuration
> => support log retrieval from the Flink UI out of the box
>
> (3) Document typical usage scenarios in apache/flink-docker
> => this should replace the proposed flink_docker_utils helper
>
> (4) Remove the existing Dockerfiles from apache/flink
>
>
> I really like the convenience of a script such as flink_docker_utils, but I
> think we should avoid it for now, because most of the desired usage
> scenarios can be covered by documentation. After we have concluded (1)-(4)
> we can take a holistic look and identify what would benefit the most from
> such a script and how it would interact with the other planned
> improvements.
>
> I think this will give us a good basis to tackle the other major
> improvements that were proposed.
>
> – Ufuk
>
> On Thu, Apr 2, 2020 at 4:34 PM Patrick Lucas <pa...@ververica.com>
> wrote:
> >
> > Thanks Andrey for working on this, and everyone else for your feedback.
> >
> > This FLIP inspired me to discuss and write down some ideas I've had for a
> > while about configuring and running Flink (especially in Docker) that go
> > beyond the scope of this FLIP, but don't contradict what it sets out to
> do.
> >
> > The crux of it is that Flink should be maximally configurable using
> > environment variables, and not require manipulation of the filesystem
> (i.e.
> > moving/linking JARs or editing config files) in order to run in a large
> > majority of cases. And beyond that, particular for running Flink in
> Docker,
> > is that as much logic as possible should be a part of Flink itself and
> not,
> > for instance, in the docker-entrypoint.sh script. I've resisted adding
> > additional logic to the Flink Docker images except where necessary since
> > the beginning, and I believe we can get to the point where the only thing
> > the entrypoint script does is drop privileges before invoking a script
> > included in Flink.
> >
> > Ultimately, my ideal end-goal for running Flink in containers would
> fulfill
> > > the following points:
> > >
> > >    - A user can configure all “start-time” aspects of Flink with
> > >    environment variables, including additions to the classpath
> > >    - Flink automatically adapts to the resources available to the
> > >    container (such as what BashJavaUtils helps with today)
> > >    - A user can include additional JARs using a mounted volume, or at
> > >    image build time with convenient tooling
> > >    - The role/mode (jobmanager, session) is specified as a command line
> > >    argument, with a single entrypoint program sufficing for all uses of
> the
> > >    image
> > >
> > > As a bonus, if we could eliminate some or most of the layers of shell
> > > scripts that are involved in starting a Flink server, perhaps by
> > > re-implementing this part of the stack in Java, and exec-ing to
> actually
> > > run Flink with the proper java CLI arguments, I think it would be a big
> win
> > > for the project.
> >
> >
> > You can read the rest of my notes here:
> >
>
> https://docs.google.com/document/d/1JCACSeDaqeZiXD9G1XxQBunwi-chwrdnFm38U1JxTDQ/edit
> >
> > On Wed, Mar 4, 2020 at 10:34 AM Andrey Zagrebin <az...@apache.org>
> > wrote:
> >
> > > Hi All,
> > >
> > > If you have ever touched the docker topic in Flink, you
> > > probably noticed that we have multiple places in docs and repos which
> > > address its various concerns.
> > >
> > > We have prepared a FLIP [1] to simplify the perception of docker topic
> in
> > > Flink by users. It mostly advocates for an approach of extending
> official
> > > Flink image from the docker hub. For convenience, it can come with a
> set of
> > > bash utilities and documented examples of their usage. The utilities
> allow
> > > to:
> > >
> > >    - run the docker image in various modes (single job, session master,
> > >    task manager etc)
> > >    - customise the extending Dockerfile
> > >    - and its entry point
> > >
> > > Eventually, the FLIP suggests to remove all other user facing
> Dockerfiles
> > > and building scripts from Flink repo, move all docker docs to
> > > apache/flink-docker and adjust existing docker use cases to refer to
> this
> > > new approach (mostly Kubernetes now).
> > >
> > > The first contributed version of Flink docker integration also
> contained
> > > example and docs for the integration with Bluemix in IBM cloud. We also
> > > suggest to maintain it outside of Flink repository (cc Markus Müller).
> > >
> > > Thanks,
> > > Andrey
> > >
> > > [1]
> > >
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification
> > >
>

Re: [DISCUSS] FLIP-111: Docker image unification

Posted by Ufuk Celebi <uc...@apache.org>.
Hey all,

thanks for the proposal and the detailed discussion. In particular, thanks
to Andrey for starting this thread and to Patrick for the additional ideas
in the linked Google doc.

I find many of the improvements proposed during the discussion (such as the
unified entrypoint in Flink, proper configuration via environment
variables, Dockerfiles for development, etc.) really important. At the same
time, I believe that these improvements have quite a large scope and could
be tackled independently as Till already suggested. I think we should
ideally split the discussions for those improvements out of this thread and
focus on the main target of FLIP-111.

To me the major point of this FLIP is to consolidate existing Dockerfiles
into apache/flink-docker and document typical usage scenarios (e.g. linking
plugins, installing shaded Hadoop, running a job cluster, etc.).
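
To make one of these scenarios concrete: once the entry point supports it
(see (1) below), the documented job cluster example could look roughly
like this (image tag, host path and job class are purely illustrative):

    docker run \
        -v /host/artifacts:/opt/flink/usrlib \
        flink:1.10.0 standalone-job --job-classname com.example.MyJob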

In order to achieve this, I think we could move forward as follows:

(1) Extend the entrypoint script in apache/flink-docker to start the job
cluster entry point
=> this is currently missing and would block removal of the Dockerfile in
flink-container

(2) Extend the example log4j-console configuration
=> support log retrieval from the Flink UI out of the box

(3) Document typical usage scenarios in apache/flink-docker
=> this should replace the proposed flink_docker_utils helper

(4) Remove the existing Dockerfiles from apache/flink
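
For (1), the dispatch in docker-entrypoint.sh could be as small as the
following bash sketch (the mode names are up for discussion, and the
privilege dropping via gosu is omitted for brevity):

    case "$1" in
        jobmanager)
            shift
            exec "${FLINK_HOME}/bin/jobmanager.sh" start-foreground "$@"
            ;;
        taskmanager)
            shift
            exec "${FLINK_HOME}/bin/taskmanager.sh" start-foreground "$@"
            ;;
        standalone-job)
            shift
            exec "${FLINK_HOME}/bin/standalone-job.sh" start-foreground "$@"
            ;;
        *)
            exec "$@"
            ;;
    esac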


I really like the convenience of a script such as flink_docker_utils, but I
think we should avoid it for now, because most of the desired usage
scenarios can be covered by documentation. After we have concluded (1)-(4)
we can take a holistic look and identify what would benefit the most from
such a script and how it would interact with the other planned improvements.

I think this will give us a good basis to tackle the other major
improvements that were proposed.

– Ufuk

On Thu, Apr 2, 2020 at 4:34 PM Patrick Lucas <pa...@ververica.com> wrote:
>
> Thanks Andrey for working on this, and everyone else for your feedback.
>
> This FLIP inspired me to discuss and write down some ideas I've had for a
> while about configuring and running Flink (especially in Docker) that go
> beyond the scope of this FLIP, but don't contradict what it sets out to
do.
>
> The crux of it is that Flink should be maximally configurable using
> environment variables, and not require manipulation of the filesystem
(i.e.
> moving/linking JARs or editing config files) in order to run in a large
> majority of cases. And beyond that, particular for running Flink in
Docker,
> is that as much logic as possible should be a part of Flink itself and
not,
> for instance, in the docker-entrypoint.sh script. I've resisted adding
> additional logic to the Flink Docker images except where necessary since
> the beginning, and I believe we can get to the point where the only thing
> the entrypoint script does is drop privileges before invoking a script
> included in Flink.
>
> Ultimately, my ideal end-goal for running Flink in containers would
fulfill
> > the following points:
> >
> >    - A user can configure all “start-time” aspects of Flink with
> >    environment variables, including additions to the classpath
> >    - Flink automatically adapts to the resources available to the
> >    container (such as what BashJavaUtils helps with today)
> >    - A user can include additional JARs using a mounted volume, or at
> >    image build time with convenient tooling
> >    - The role/mode (jobmanager, session) is specified as a command line
> >    argument, with a single entrypoint program sufficing for all uses of
the
> >    image
> >
> > As a bonus, if we could eliminate some or most of the layers of shell
> > scripts that are involved in starting a Flink server, perhaps by
> > re-implementing this part of the stack in Java, and exec-ing to actually
> > run Flink with the proper java CLI arguments, I think it would be a big
win
> > for the project.
>
>
> You can read the rest of my notes here:
>
https://docs.google.com/document/d/1JCACSeDaqeZiXD9G1XxQBunwi-chwrdnFm38U1JxTDQ/edit
>
> On Wed, Mar 4, 2020 at 10:34 AM Andrey Zagrebin <az...@apache.org>
> wrote:
>
> > Hi All,
> >
> > If you have ever touched the docker topic in Flink, you
> > probably noticed that we have multiple places in docs and repos which
> > address its various concerns.
> >
> > We have prepared a FLIP [1] to simplify the perception of docker topic
in
> > Flink by users. It mostly advocates for an approach of extending
official
> > Flink image from the docker hub. For convenience, it can come with a
set of
> > bash utilities and documented examples of their usage. The utilities
allow
> > to:
> >
> >    - run the docker image in various modes (single job, session master,
> >    task manager etc)
> >    - customise the extending Dockerfile
> >    - and its entry point
> >
> > Eventually, the FLIP suggests to remove all other user facing
Dockerfiles
> > and building scripts from Flink repo, move all docker docs to
> > apache/flink-docker and adjust existing docker use cases to refer to
this
> > new approach (mostly Kubernetes now).
> >
> > The first contributed version of Flink docker integration also contained
> > example and docs for the integration with Bluemix in IBM cloud. We also
> > suggest to maintain it outside of Flink repository (cc Markus Müller).
> >
> > Thanks,
> > Andrey
> >
> > [1]
> >
https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification
> >