Posted to commits@flink.apache.org by rm...@apache.org on 2020/06/16 07:51:35 UTC

[flink] branch release-1.11 updated: [FLINK-17976][docs][k8s/docker] Improvements about custom docker images

This is an automated email from the ASF dual-hosted git repository.

rmetzger pushed a commit to branch release-1.11
in repository https://gitbox.apache.org/repos/asf/flink.git


The following commit(s) were added to refs/heads/release-1.11 by this push:
     new 030df18  [FLINK-17976][docs][k8s/docker] Improvements about custom docker images
030df18 is described below

commit 030df18deaa7a21382eb5254a037c0c1b3438ad8
Author: Robert Metzger <rm...@apache.org>
AuthorDate: Tue Jun 9 16:12:03 2020 +0200

    [FLINK-17976][docs][k8s/docker] Improvements about custom docker images
    
    This closes #12558
---
 docs/ops/deployment/docker.md            | 16 +++++++-
 docs/ops/deployment/docker.zh.md         | 16 +++++++-
 docs/ops/deployment/index.md             | 67 ++++++++++++++++++++++++++++---
 docs/ops/deployment/index.zh.md          | 69 ++++++++++++++++++++++++++++----
 docs/ops/deployment/native_kubernetes.md |  6 ++-
 5 files changed, 155 insertions(+), 19 deletions(-)

diff --git a/docs/ops/deployment/docker.md b/docs/ops/deployment/docker.md
index 2525185..87564b5 100644
--- a/docs/ops/deployment/docker.md
+++ b/docs/ops/deployment/docker.md
@@ -152,6 +152,8 @@ the *Flink Master* and *TaskManagers*:
 
 * **or extend the Flink image** by writing a custom `Dockerfile`, build it and use it for starting the *Flink Master* and *TaskManagers*:
 
+    *Dockerfile*:
+
     ```dockerfile
     FROM flink
     ADD /host/path/to/job/artifacts/1 /opt/flink/usrlib/artifacts/1
@@ -233,6 +235,8 @@ To provide a custom location for the Flink configuration files, you can
 
 * or add them to your **custom Flink image**, build and run it:
 
+    *Dockerfile*:
+
     ```dockerfile
     FROM flink
     ADD /host/path/to/flink-conf.yaml /opt/flink/conf/flink-conf.yaml
@@ -264,10 +268,12 @@ There are several ways in which you can further customize the Flink image:
 
 * install custom software (e.g. python)
 * enable (symlink) optional libraries or plugins from `/opt/flink/opt` into `/opt/flink/lib` or `/opt/flink/plugins`
-* add other libraries to `/opt/flink/lib` (e.g. [hadoop](hadoop.html#adding-hadoop-to-lib))
+* add other libraries to `/opt/flink/lib` (e.g. Hadoop)
 * add other plugins to `/opt/flink/plugins`
 
-you can achieve this in several ways:
+See also: [How to provide dependencies in the classpath]({% link ops/deployment/index.md %}#how-to-provide-dependencies-in-the-classpath).
+
+You can customize the Flink image in several ways:
 
 * **override the container entry point** with a custom script where you can run any bootstrap actions.
 At the end you can call the standard `/docker-entrypoint.sh` script of the Flink image with the same arguments
@@ -303,6 +309,8 @@ as described in [how to run the Flink image](#how-to-run-flink-image).
 
 * **extend the Flink image** by writing a custom `Dockerfile` and build a custom image:
 
+    *Dockerfile*:
+
     ```dockerfile
     FROM flink
 
@@ -319,6 +327,8 @@ as described in [how to run the Flink image](#how-to-run-flink-image).
     ENV VAR_NAME value
     ```
 
+    **Commands for building**:
+
     ```sh
     docker build -t custom_flink_image .
     # optional push to your docker image registry if you have it,
@@ -397,6 +407,7 @@ The next chapters show examples of configuration files to run Flink.
 ### Session Cluster with Docker Compose
 
 **docker-compose.yml:**
+
 ```yaml
 version: "2.2"
 services:
@@ -430,6 +441,7 @@ See also [how to specify the Flink Master arguments](#flink-master-additional-co
 in the `command` for the `jobmanager` service.
 
 **docker-compose.yml:**
+
 ```yaml
 version: "2.2"
 services:
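The customization list in the `docker.md` hunks above mentions enabling optional libraries by symlinking them from `/opt/flink/opt` into `/opt/flink/lib`. A minimal shell sketch of that step (the jar name is one of the optional libraries Flink ships, but the local `opt/` and `lib/` directories here are stand-ins for the real image paths; in a `Dockerfile` this would be a `RUN` instruction):

```shell
#!/bin/sh
# Stand-in layout for /opt/flink inside the image.
mkdir -p opt lib

# Simulate the optional jar that Flink ships in its opt/ folder.
touch opt/flink-s3-fs-hadoop-1.11.0.jar

# Enable the optional library by symlinking it into lib/,
# as described in the customization list above.
ln -sf "$(pwd)/opt/flink-s3-fs-hadoop-1.11.0.jar" lib/flink-s3-fs-hadoop-1.11.0.jar

ls -l lib/
```

Symlinking (rather than copying) keeps a single copy of the jar in the image and makes it obvious which optional libraries were enabled.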
diff --git a/docs/ops/deployment/docker.zh.md b/docs/ops/deployment/docker.zh.md
index 6a4aea7..94df910 100644
--- a/docs/ops/deployment/docker.zh.md
+++ b/docs/ops/deployment/docker.zh.md
@@ -152,6 +152,8 @@ the *Flink Master* and *TaskManagers*:
 
 * **or extend the Flink image** by writing a custom `Dockerfile`, build it and use it for starting the *Flink Master* and *TaskManagers*:
 
+    *Dockerfile*:
+
     ```dockerfile
     FROM flink
     ADD /host/path/to/job/artifacts/1 /opt/flink/usrlib/artifacts/1
@@ -233,6 +235,8 @@ To provide a custom location for the Flink configuration files, you can
 
 * or add them to your **custom Flink image**, build and run it:
 
+    *Dockerfile*:
+
     ```dockerfile
     FROM flink
     ADD /host/path/to/flink-conf.yaml /opt/flink/conf/flink-conf.yaml
@@ -264,10 +268,12 @@ There are several ways in which you can further customize the Flink image:
 
 * install custom software (e.g. python)
 * enable (symlink) optional libraries or plugins from `/opt/flink/opt` into `/opt/flink/lib` or `/opt/flink/plugins`
-* add other libraries to `/opt/flink/lib` (e.g. [hadoop](hadoop.html#adding-hadoop-to-lib))
+* add other libraries to `/opt/flink/lib` (e.g. Hadoop)
 * add other plugins to `/opt/flink/plugins`
 
-you can achieve this in several ways:
+See also: [How to provide dependencies in the classpath]({% link ops/deployment/index.zh.md %}#how-to-provide-dependencies-in-the-classpath).
+
+You can customize the Flink image in several ways:
 
 * **override the container entry point** with a custom script where you can run any bootstrap actions.
 At the end you can call the standard `/docker-entrypoint.sh` script of the Flink image with the same arguments
@@ -303,6 +309,8 @@ as described in [how to run the Flink image](#how-to-run-flink-image).
 
 * **extend the Flink image** by writing a custom `Dockerfile` and build a custom image:
 
+    *Dockerfile*:
+
     ```dockerfile
     FROM flink
 
@@ -319,6 +327,8 @@ as described in [how to run the Flink image](#how-to-run-flink-image).
     ENV VAR_NAME value
     ```
 
+    **Commands for building**:
+
     ```sh
     docker build -t custom_flink_image .
     # optional push to your docker image registry if you have it,
@@ -397,6 +407,7 @@ The next chapters show examples of configuration files to run Flink.
 ### Session Cluster with Docker Compose
 
 **docker-compose.yml:**
+
 ```yaml
 version: "2.2"
 services:
@@ -430,6 +441,7 @@ See also [how to specify the Flink Master arguments](#flink-master-additional-co
 in the `command` for the `jobmanager` service.
 
 **docker-compose.yml:**
+
 ```yaml
 version: "2.2"
 services:
diff --git a/docs/ops/deployment/index.md b/docs/ops/deployment/index.md
index 9824541..6a856b2 100644
--- a/docs/ops/deployment/index.md
+++ b/docs/ops/deployment/index.md
@@ -119,7 +119,7 @@ Apache Flink ships with first class support for a number of common deployment ta
       </div>
       <div class="panel-body">
         Run Flink locally for basic testing and experimentation
-        <br><a href="{{ site.baseurl }}/ops/deployment/local.html">Learn more</a>
+        <br><a href="{% link ops/deployment/local.md %}">Learn more</a>
       </div>
     </div>
   </div>
@@ -130,7 +130,7 @@ Apache Flink ships with first class support for a number of common deployment ta
       </div>
       <div class="panel-body">
         A simple solution for running Flink on bare metal or VM's 
-        <br><a href="{{ site.baseurl }}/ops/deployment/cluster_setup.html">Learn more</a>
+        <br><a href="{% link ops/deployment/cluster_setup.md %}">Learn more</a>
       </div>
     </div>
   </div>
@@ -141,7 +141,7 @@ Apache Flink ships with first class support for a number of common deployment ta
       </div>
       <div class="panel-body">
         Deploy Flink on-top of Apache Hadoop's resource manager 
-        <br><a href="{{ site.baseurl }}/ops/deployment/yarn_setup.html">Learn more</a>
+        <br><a href="{% link ops/deployment/yarn_setup.md %}">Learn more</a>
       </div>
     </div>
   </div>
@@ -154,7 +154,7 @@ Apache Flink ships with first class support for a number of common deployment ta
       </div>
       <div class="panel-body">
         A generic resource manager for running distributed systems
-        <br><a href="{{ site.baseurl }}/ops/deployment/mesos.html">Learn more</a>
+        <br><a href="{% link ops/deployment/mesos.md %}">Learn more</a>
       </div>
     </div>
   </div>
@@ -165,7 +165,7 @@ Apache Flink ships with first class support for a number of common deployment ta
       </div>
       <div class="panel-body">
         A popular solution for running Flink within a containerized environment
-        <br><a href="{{ site.baseurl }}/ops/deployment/docker.html">Learn more</a>
+        <br><a href="{% link ops/deployment/docker.md %}">Learn more</a>
       </div>
     </div>
   </div>
@@ -176,7 +176,7 @@ Apache Flink ships with first class support for a number of common deployment ta
       </div>
       <div class="panel-body">
         An automated system for deploying containerized applications
-        <br><a href="{{ site.baseurl }}/ops/deployment/kubernetes.html">Learn more</a>
+        <br><a href="{% link ops/deployment/kubernetes.md %}">Learn more</a>
       </div>
     </div>
   </div>
@@ -247,3 +247,58 @@ Supported Environments:
 <span class="label label-primary">Azure</span>
 <span class="label label-primary">Google Cloud</span>
 <span class="label label-primary">On-Premise</span>
+
+## Deployment Best Practices
+
+### How to provide dependencies in the classpath
+
+There are several ways to provide dependencies (such as `*.jar` files or static data) to Flink or to user-provided
+applications. These approaches differ based on the deployment mode and target, but they also have commonalities, which are described here.
+
+To provide a dependency, you have the following options:
+- files in the **`lib/` folder** are added to the classpath used to start Flink. This folder is suitable for libraries such as Hadoop, or for file systems that are not available as plugins. Beware that classes added here can interfere with Flink, for example if you add a different version of a library that Flink already provides.
+
+- files in **`plugins/<name>/`** folders are loaded at runtime by Flink through separate classloaders to avoid conflicts with classes loaded and used by Flink. Only jar files which are prepared as [plugins]({% link ops/plugins.md %}) can be added here.
+
+### Download Maven dependencies locally
+
+If you need to extend Flink with a Maven dependency (and its transitive dependencies),
+you can use an [Apache Maven](https://maven.apache.org) *pom.xml* file to download all required files into a local folder:
+
+*pom.xml*:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <groupId>org.apache.flink</groupId>
+  <artifactId>docker-dependencies</artifactId>
+  <version>1.0-SNAPSHOT</version>
+
+  <dependencies>
+        <!-- Put your dependency here, for example a Hadoop GCS connector -->
+  </dependencies>
+
+  <build>
+      <plugins>
+        <plugin>
+          <groupId>org.apache.maven.plugins</groupId>
+          <artifactId>maven-dependency-plugin</artifactId>
+          <version>3.1.2</version>
+          <executions>
+            <execution>
+              <id>copy-dependencies</id>
+              <phase>package</phase>
+              <goals><goal>copy-dependencies</goal></goals>
+              <configuration><outputDirectory>jars</outputDirectory></configuration>
+            </execution>
+          </executions>
+        </plugin>
+      </plugins>
+  </build>
+</project>
+```
+
+Running `mvn package` in the same directory creates a `jars/` folder containing all of the jar files,
+which you can then copy into the desired folder, Docker image, etc.
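Once `mvn package` has populated `jars/`, the folder can be copied into a custom image in one step. A hypothetical `Dockerfile` sketch, reusing the `flink` base image and the `/opt/flink/lib` path from the docs changed in this commit (for plugin-style jars, the target would instead be a `/opt/flink/plugins/<name>/` folder):

```dockerfile
FROM flink

# Copy the Maven-resolved dependencies (including transitive ones)
# onto Flink's classpath.
COPY jars/ /opt/flink/lib/
```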
diff --git a/docs/ops/deployment/index.zh.md b/docs/ops/deployment/index.zh.md
index 53b4558..60dca63 100644
--- a/docs/ops/deployment/index.zh.md
+++ b/docs/ops/deployment/index.zh.md
@@ -119,7 +119,7 @@ Apache Flink ships with first class support for a number of common deployment ta
       </div>
       <div class="panel-body">
         Run Flink locally for basic testing and experimentation
-        <br><a href="{{ site.baseurl }}/ops/deployment/local.html">Learn more</a>
+        <br><a href="{% link ops/deployment/local.md %}">Learn more</a>
       </div>
     </div>
   </div>
@@ -130,7 +130,7 @@ Apache Flink ships with first class support for a number of common deployment ta
       </div>
       <div class="panel-body">
         A simple solution for running Flink on bare metal or VM's 
-        <br><a href="{{ site.baseurl }}/ops/deployment/cluster_setup.html">Learn more</a>
+        <br><a href="{% link ops/deployment/cluster_setup.md %}">Learn more</a>
       </div>
     </div>
   </div>
@@ -140,8 +140,8 @@ Apache Flink ships with first class support for a number of common deployment ta
         <b>Yarn</b>
       </div>
       <div class="panel-body">
-        Deploy Flink on-top Apache Hadoop's resource manager 
-        <br><a href="{{ site.baseurl }}/ops/deployment/yarn_setup.html">Learn more</a>
+        Deploy Flink on-top of Apache Hadoop's resource manager 
+        <br><a href="{% link ops/deployment/yarn_setup.md %}">Learn more</a>
       </div>
     </div>
   </div>
@@ -154,7 +154,7 @@ Apache Flink ships with first class support for a number of common deployment ta
       </div>
       <div class="panel-body">
         A generic resource manager for running distributed systems
-        <br><a href="{{ site.baseurl }}/ops/deployment/mesos.html">Learn more</a>
+        <br><a href="{% link ops/deployment/mesos.md %}">Learn more</a>
       </div>
     </div>
   </div>
@@ -165,7 +165,7 @@ Apache Flink ships with first class support for a number of common deployment ta
       </div>
       <div class="panel-body">
         A popular solution for running Flink within a containerized environment
-        <br><a href="{{ site.baseurl }}/ops/deployment/docker.html">Learn more</a>
+        <br><a href="{% link ops/deployment/docker.md %}">Learn more</a>
       </div>
     </div>
   </div>
@@ -176,7 +176,7 @@ Apache Flink ships with first class support for a number of common deployment ta
       </div>
       <div class="panel-body">
         An automated system for deploying containerized applications
-        <br><a href="{{ site.baseurl }}/ops/deployment/kubernetes.html">Learn more</a>
+        <br><a href="{% link ops/deployment/kubernetes.md %}">Learn more</a>
       </div>
     </div>
   </div>
@@ -247,3 +247,58 @@ Supported Environments:
 <span class="label label-primary">Azure</span>
 <span class="label label-primary">Google Cloud</span>
 <span class="label label-primary">On-Premise</span>
+
+## Deployment Best Practices
+
+### How to provide dependencies in the classpath
+
+There are several ways to provide dependencies (such as `*.jar` files or static data) to Flink or to user-provided
+applications. These approaches differ based on the deployment mode and target, but they also have commonalities, which are described here.
+
+To provide a dependency, you have the following options:
+- files in the **`lib/` folder** are added to the classpath used to start Flink. This folder is suitable for libraries such as Hadoop, or for file systems that are not available as plugins. Beware that classes added here can interfere with Flink, for example if you add a different version of a library that Flink already provides.
+
+- files in **`plugins/<name>/`** folders are loaded at runtime by Flink through separate classloaders to avoid conflicts with classes loaded and used by Flink. Only jar files which are prepared as [plugins]({% link ops/plugins.md %}) can be added here.
+
+### Download Maven dependencies locally
+
+If you need to extend Flink with a Maven dependency (and its transitive dependencies),
+you can use an [Apache Maven](https://maven.apache.org) *pom.xml* file to download all required files into a local folder:
+
+*pom.xml*:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <groupId>org.apache.flink</groupId>
+  <artifactId>docker-dependencies</artifactId>
+  <version>1.0-SNAPSHOT</version>
+
+  <dependencies>
+        <!-- Put your dependency here, for example a Hadoop GCS connector -->
+  </dependencies>
+
+  <build>
+      <plugins>
+        <plugin>
+          <groupId>org.apache.maven.plugins</groupId>
+          <artifactId>maven-dependency-plugin</artifactId>
+          <version>3.1.2</version>
+          <executions>
+            <execution>
+              <id>copy-dependencies</id>
+              <phase>package</phase>
+              <goals><goal>copy-dependencies</goal></goals>
+              <configuration><outputDirectory>jars</outputDirectory></configuration>
+            </execution>
+          </executions>
+        </plugin>
+      </plugins>
+  </build>
+</project>
+```
+
+Running `mvn package` in the same directory creates a `jars/` folder containing all of the jar files,
+which you can then copy into the desired folder, Docker image, etc.
diff --git a/docs/ops/deployment/native_kubernetes.md b/docs/ops/deployment/native_kubernetes.md
index 31a12f4..930dcdc 100644
--- a/docs/ops/deployment/native_kubernetes.md
+++ b/docs/ops/deployment/native_kubernetes.md
@@ -103,7 +103,7 @@ $ ./bin/flink run -d -t kubernetes-session -Dkubernetes.cluster-id=<ClusterId> e
 ### Accessing Job Manager UI
 
 There are several ways to expose a Service onto an external (outside of your cluster) IP address.
-This can be configured using `kubernetes.service.exposed.type`.
+This can be configured using [`kubernetes.rest-service.exposed.type`]({% link ops/config.md %}#kubernetes-rest-service-exposed-type).
 
 - `ClusterIP`: Exposes the service on a cluster-internal IP.
 The Service is only reachable within the cluster. If you want to access the Job Manager UI or submit a job to the existing session, you need to start a local proxy.
@@ -116,10 +116,12 @@ $ kubectl port-forward service/<ServiceName> 8081
 - `NodePort`: Exposes the service on each Node’s IP at a static port (the `NodePort`). `<NodeIP>:<NodePort>` could be used to contact the Job Manager Service. `NodeIP` could be easily replaced with Kubernetes ApiServer address.
 You could find it in your kube config file.
 
-- `LoadBalancer`: Default value, exposes the service externally using a cloud provider’s load balancer.
+- `LoadBalancer`: Exposes the service externally using a cloud provider’s load balancer.
 Since the cloud provider and Kubernetes need some time to prepare the load balancer, you may get a `NodePort` JobManager Web Interface in the client log.
 You can use `kubectl get services/<ClusterId>` to get EXTERNAL-IP and then construct the load balancer JobManager Web Interface manually `http://<EXTERNAL-IP>:8081`.
 
+  <span class="label label-warning">Warning!</span> Your JobManager (which can run arbitrary jar files) might be exposed to the public internet without authentication.
+
 - `ExternalName`: Map a service to a DNS name, not supported in current version.
 
 Please reference the official documentation on [publishing services in Kubernetes](https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types) for more information.
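Given the public-exposure warning this commit adds for `LoadBalancer`, it can be worth pinning the exposure type explicitly rather than relying on the default. A minimal configuration fragment for `flink-conf.yaml` (the `NodePort` choice is just an illustrative example of a non-public default; pick whichever type fits your cluster):

```yaml
kubernetes.rest-service.exposed.type: NodePort
```

The same option can also be passed on the command line when starting a session, e.g. `-Dkubernetes.rest-service.exposed.type=NodePort`.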