Posted to commits@flink.apache.org by mb...@apache.org on 2015/06/16 13:02:38 UTC

[2/3] flink git commit: [FLINK-2209] [docs] Document linking with jars not in the binary dist

[FLINK-2209] [docs] Document linking with jars not in the binary dist


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/be507951
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/be507951
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/be507951

Branch: refs/heads/master
Commit: be5079518e703ad958fa4eda9c45f4abf4d0f090
Parents: 21605d6
Author: mbalassi <mb...@apache.org>
Authored: Sun Jun 14 22:21:43 2015 +0200
Committer: mbalassi <mb...@apache.org>
Committed: Tue Jun 16 13:00:44 2015 +0200

----------------------------------------------------------------------
 docs/apis/cluster_execution.md | 72 +++++++++++++++++++++++++++++++++++++
 docs/apis/streaming_guide.md   | 12 +++++--
 docs/libs/gelly_guide.md       |  2 ++
 docs/libs/ml/index.md          |  2 ++
 docs/libs/table.md             |  2 ++
 5 files changed, 87 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/be507951/docs/apis/cluster_execution.md
----------------------------------------------------------------------
diff --git a/docs/apis/cluster_execution.md b/docs/apis/cluster_execution.md
index c2b3c27..7193cf6 100644
--- a/docs/apis/cluster_execution.md
+++ b/docs/apis/cluster_execution.md
@@ -144,3 +144,75 @@ public static void main(String[] args) throws Exception {
 Note that the program contains custom UDFs and hence requires a JAR file with
 the classes of the code attached. The constructor of the remote executor takes
 the path(s) to the JAR file(s).
+
+## Linking with modules not contained in the binary distribution
+
+The binary distribution contains jar packages in the `lib` folder that are automatically
+provided to the classpath of your distributed programs. Almost all Flink classes are
+located there, with a few exceptions such as the streaming connectors and some freshly
+added modules. To run code depending on these modules, you need to make them accessible
+at runtime, for which we suggest two options:
+
+1. Either copy the required jar files to the `lib` folder on all of your TaskManagers.
+Note that you have to restart your TaskManagers after this.
+2. Or package them with your usercode.
+
+The latter approach is recommended as it respects Flink's classloader management.
+
+### Packaging dependencies with your usercode with Maven
+
+To provide these dependencies, which are not included in the Flink distribution, we suggest two options with Maven.
+
+1. The maven assembly plugin builds a so-called fat jar containing all your dependencies.
+The assembly configuration is straightforward, but the resulting jar might become bulky. See
+[usage](http://maven.apache.org/plugins/maven-assembly-plugin/usage.html).
+2. The maven dependency plugin unpacks the relevant parts of the dependencies and
+then packages them with your code.
+
+Using the latter approach to bundle the Kafka connector, `flink-connector-kafka`,
+you would need to add the classes from both the connector and the Kafka API itself. Add
+the following to your plugins section.
+
+~~~xml
+<plugin>
+    <groupId>org.apache.maven.plugins</groupId>
+    <artifactId>maven-dependency-plugin</artifactId>
+    <version>2.9</version>
+    <executions>
+        <execution>
+            <id>unpack</id>
+            <!-- executed just before the package phase -->
+            <phase>prepare-package</phase>
+            <goals>
+                <goal>unpack</goal>
+            </goals>
+            <configuration>
+                <artifactItems>
+                    <!-- For Flink connector classes -->
+                    <artifactItem>
+                        <groupId>org.apache.flink</groupId>
+                        <artifactId>flink-connector-kafka</artifactId>
+                        <version>{{ site.version }}</version>
+                        <type>jar</type>
+                        <overWrite>false</overWrite>
+                        <outputDirectory>${project.build.directory}/classes</outputDirectory>
+                        <includes>org/apache/flink/**</includes>
+                    </artifactItem>
+                    <!-- For Kafka API classes -->
+                    <artifactItem>
+                        <groupId>org.apache.kafka</groupId>
+                        <artifactId>kafka_<YOUR_SCALA_VERSION></artifactId>
+                        <version><YOUR_KAFKA_VERSION></version>
+                        <type>jar</type>
+                        <overWrite>false</overWrite>
+                        <outputDirectory>${project.build.directory}/classes</outputDirectory>
+                        <includes>kafka/**</includes>
+                    </artifactItem>
+                </artifactItems>
+            </configuration>
+        </execution>
+    </executions>
+</plugin>
+~~~
+
+Now, when you run `mvn clean package`, the produced jar includes the required dependencies.
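+
+For the fat jar alternative mentioned above, a minimal assembly-plugin configuration could
+look like the following sketch. The `jar-with-dependencies` descriptor and the `single` goal
+are standard assembly-plugin usage; the plugin version shown is an assumption and should be
+checked against the current release.
+
+~~~xml
+<plugin>
+    <groupId>org.apache.maven.plugins</groupId>
+    <artifactId>maven-assembly-plugin</artifactId>
+    <!-- version is an assumption; use the latest available release -->
+    <version>2.5.5</version>
+    <configuration>
+        <descriptorRefs>
+            <!-- builds a fat jar containing all compile/runtime dependencies -->
+            <descriptorRef>jar-with-dependencies</descriptorRef>
+        </descriptorRefs>
+    </configuration>
+    <executions>
+        <execution>
+            <id>make-assembly</id>
+            <!-- bind the assembly to the package phase -->
+            <phase>package</phase>
+            <goals>
+                <goal>single</goal>
+            </goals>
+        </execution>
+    </executions>
+</plugin>
+~~~
+
+Running `mvn clean package` then additionally produces a `*-jar-with-dependencies.jar`,
+which can be submitted as the usercode jar.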

http://git-wip-us.apache.org/repos/asf/flink/blob/be507951/docs/apis/streaming_guide.md
----------------------------------------------------------------------
diff --git a/docs/apis/streaming_guide.md b/docs/apis/streaming_guide.md
index 9c32903..e9fc264 100644
--- a/docs/apis/streaming_guide.md
+++ b/docs/apis/streaming_guide.md
@@ -1377,11 +1377,13 @@ This connector provides access to data streams from [Apache Kafka](https://kafka
 {% highlight xml %}
 <dependency>
   <groupId>org.apache.flink</groupId>
-  <artifactId>flink-kafka-connector</artifactId>
+  <artifactId>flink-connector-kafka</artifactId>
   <version>{{site.version }}</version>
 </dependency>
 {% endhighlight %}
 
+Note that the streaming connectors are currently not part of the binary distribution. See linking with them for cluster execution [here](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
+
 #### Installing Apache Kafka
 * Follow the instructions from [Kafka's quickstart](https://kafka.apache.org/documentation.html#quickstart) to download the code and launch a server (launching a Zookeeper and a Kafka server is required every time before starting the application).
 * On 32 bit computers [this](http://stackoverflow.com/questions/22325364/unrecognized-vm-option-usecompressedoops-when-running-kafka-from-my-ubuntu-in) problem may occur. 
@@ -1513,11 +1515,13 @@ This connector provides access to data streams from [RabbitMQ](http://www.rabbit
 {% highlight xml %}
 <dependency>
   <groupId>org.apache.flink</groupId>
-  <artifactId>flink-rabbitmq-connector</artifactId>
+  <artifactId>flink-connector-rabbitmq</artifactId>
   <version>{{site.version }}</version>
 </dependency>
 {% endhighlight %}
 
+Note that the streaming connectors are currently not part of the binary distribution. See linking with them for cluster execution [here](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
+
 #### Installing RabbitMQ
 Follow the instructions from the [RabbitMQ download page](http://www.rabbitmq.com/download.html). After the installation the server automatically starts, and the application connecting to RabbitMQ can be launched.
 
@@ -1585,11 +1589,13 @@ Twitter Streaming API provides opportunity to connect to the stream of tweets ma
 {% highlight xml %}
 <dependency>
   <groupId>org.apache.flink</groupId>
-  <artifactId>flink-twitter-connector</artifactId>
+  <artifactId>flink-connector-twitter</artifactId>
   <version>{{site.version }}</version>
 </dependency>
 {% endhighlight %}
 
+Note that the streaming connectors are currently not part of the binary distribution. See linking with them for cluster execution [here](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
+
 #### Authentication
 In order to connect to Twitter stream the user has to register their program and acquire the necessary information for the authentication. The process is described below.
 

http://git-wip-us.apache.org/repos/asf/flink/blob/be507951/docs/libs/gelly_guide.md
----------------------------------------------------------------------
diff --git a/docs/libs/gelly_guide.md b/docs/libs/gelly_guide.md
index 804efab..c788012 100644
--- a/docs/libs/gelly_guide.md
+++ b/docs/libs/gelly_guide.md
@@ -43,6 +43,8 @@ Add the following dependency to your `pom.xml` to use Gelly.
 </dependency>
 ~~~
 
+Note that Gelly is currently not part of the binary distribution. See linking with it for cluster execution [here](../apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
+
 The remaining sections provide a description of available methods and present several examples of how to use Gelly and how to mix it with the Flink Java API. After reading this guide, you might also want to check the {% gh_link /flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/ "Gelly examples" %}.
 
 Graph Representation

http://git-wip-us.apache.org/repos/asf/flink/blob/be507951/docs/libs/ml/index.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/index.md b/docs/libs/ml/index.md
index 9ff7a4b..e81b354 100644
--- a/docs/libs/ml/index.md
+++ b/docs/libs/ml/index.md
@@ -69,6 +69,8 @@ Next, you have to add the FlinkML dependency to the `pom.xml` of your project.
 </dependency>
 {% endhighlight %}
 
+Note that FlinkML is currently not part of the binary distribution. See linking with it for cluster execution [here](../apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
+
 Now you can start solving your analysis task.
 The following code snippet shows how easy it is to train a multiple linear regression model.
 

http://git-wip-us.apache.org/repos/asf/flink/blob/be507951/docs/libs/table.md
----------------------------------------------------------------------
diff --git a/docs/libs/table.md b/docs/libs/table.md
index 829c9cf..4db5a87 100644
--- a/docs/libs/table.md
+++ b/docs/libs/table.md
@@ -37,6 +37,8 @@ The following dependency must be added to your project when using the Table API:
 </dependency>
 {% endhighlight %}
 
+Note that the Table API is currently not part of the binary distribution. See linking with it for cluster execution [here](../apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
+
 ## Scala Table API
  
 The Table API can be enabled by importing `org.apache.flink.api.scala.table._`.  This enables