Posted to commits@flink.apache.org by mb...@apache.org on 2015/06/16 13:02:37 UTC

[1/3] flink git commit: [FLINK-2221] [docs] Docs for not using local filesystem on the cluster as state backup

Repository: flink
Updated Branches:
  refs/heads/master af4481c7a -> 20d7e5ad3


[FLINK-2221] [docs] Docs for not using local filesystem on the cluster as state backup

This just documents the problem clearly for the 0.9 release.

Closes #839


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/21605d6e
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/21605d6e
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/21605d6e

Branch: refs/heads/master
Commit: 21605d6e896edfda28671665be26cb72eb739a15
Parents: af4481c
Author: mbalassi <mb...@apache.org>
Authored: Mon Jun 15 18:53:08 2015 +0200
Committer: mbalassi <mb...@apache.org>
Committed: Tue Jun 16 12:59:51 2015 +0200

----------------------------------------------------------------------
 docs/apis/streaming_guide.md                  | 4 ++--
 flink-dist/src/main/resources/flink-conf.yaml | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/21605d6e/docs/apis/streaming_guide.md
----------------------------------------------------------------------
diff --git a/docs/apis/streaming_guide.md b/docs/apis/streaming_guide.md
index 0a7a486..9c32903 100644
--- a/docs/apis/streaming_guide.md
+++ b/docs/apis/streaming_guide.md
@@ -1194,7 +1194,7 @@ For example when implementing a rolling count over the stream Flink gives you th
 
 Checkpointing can be enabled from the `StreamExecutionEnvironment` using the `enableCheckpointing(…)` method, where additional parameters can be passed to modify the default 5-second checkpoint interval.
 
-By default state checkpoints will be stored in-memory at the JobManager. Flink also supports storing the checkpoints on any flink-supported file system (such as HDFS or Tachyon) which can be set in the flink-conf.yaml. 
+By default state checkpoints will be stored in-memory at the JobManager. Flink also supports storing the checkpoints on any flink-supported file system (such as HDFS or Tachyon) which can be set in the flink-conf.yaml. Note that the state backend must be accessible from the JobManager; use `file://` only for local setups.
 
 For example, let us write a reduce function that, besides summing the data, also counts how many elements it has seen.
 
@@ -1210,7 +1210,7 @@ public class CounterSum implements ReduceFunction<Long>, CheckpointedAsynchronou
         return value1 + value2;
     }
 
-    // regurarly persists state during normal operation 
+    // regularly persists state during normal operation
     @Override
     public Serializable snapshotState(long checkpointId, long checkpointTimestamp) throws Exception {
         return new Long(counter);
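
For reference, a completed version of the `CounterSum` snippet from the hunk above might look like the sketch below. The `counter` field, its increment in `reduce`, and the `restoreState` method are not visible in this hunk and are assumptions based on the surrounding description; exact interface and package names should be checked against your Flink version.

~~~java
import java.io.Serializable;

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.checkpoint.CheckpointedAsynchronously;

public class CounterSum implements ReduceFunction<Long>, CheckpointedAsynchronously<Serializable> {

    // number of elements seen so far; captured with every checkpoint
    private long counter = 0;

    @Override
    public Long reduce(Long value1, Long value2) throws Exception {
        counter++;
        return value1 + value2;
    }

    // regularly persists state during normal operation
    @Override
    public Serializable snapshotState(long checkpointId, long checkpointTimestamp) throws Exception {
        return new Long(counter);
    }

    // restores the counter when recovering from a failure
    @Override
    public void restoreState(Serializable state) {
        counter = (Long) state;
    }
}
~~~

Checkpointing itself is switched on from the environment, e.g. `env.enableCheckpointing(5000)` for a 5-second interval.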

http://git-wip-us.apache.org/repos/asf/flink/blob/21605d6e/flink-dist/src/main/resources/flink-conf.yaml
----------------------------------------------------------------------
diff --git a/flink-dist/src/main/resources/flink-conf.yaml b/flink-dist/src/main/resources/flink-conf.yaml
index f0c0d56..a258815 100644
--- a/flink-dist/src/main/resources/flink-conf.yaml
+++ b/flink-dist/src/main/resources/flink-conf.yaml
@@ -59,6 +59,8 @@ webclient.port: 8080
 state.backend: jobmanager
 
 # Directory for storing checkpoints in a flink supported filesystem
+# Note: The state backend must be accessible from the JobManager; use file://
+# only for local setups.
 #
 # state.backend.fs.checkpointdir: hdfs://checkpoints
 


[3/3] flink git commit: [docs] Update obsolete cluster execution guide

Posted by mb...@apache.org.
[docs] Update obsolete cluster execution guide

Closes #835


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/20d7e5ad
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/20d7e5ad
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/20d7e5ad

Branch: refs/heads/master
Commit: 20d7e5ad3be9dcb9be4049e354f5aadf1cb08df1
Parents: be50795
Author: mbalassi <mb...@apache.org>
Authored: Sun Jun 14 22:24:52 2015 +0200
Committer: mbalassi <mb...@apache.org>
Committed: Tue Jun 16 13:01:17 2015 +0200

----------------------------------------------------------------------
 docs/apis/cluster_execution.md | 67 +------------------------------------
 1 file changed, 1 insertion(+), 66 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/20d7e5ad/docs/apis/cluster_execution.md
----------------------------------------------------------------------
diff --git a/docs/apis/cluster_execution.md b/docs/apis/cluster_execution.md
index 7193cf6..f9844d7 100644
--- a/docs/apis/cluster_execution.md
+++ b/docs/apis/cluster_execution.md
@@ -60,7 +60,7 @@ The following illustrates the use of the `RemoteEnvironment`:
 ~~~java
 public static void main(String[] args) throws Exception {
     ExecutionEnvironment env = ExecutionEnvironment
-        .createRemoteEnvironment("strato-master", "7661", "/home/user/udfs.jar");
+        .createRemoteEnvironment("flink-master", 6123, "/home/user/udfs.jar");
 
     DataSet<String> data = env.readTextFile("hdfs://path/to/file");
 
@@ -80,71 +80,6 @@ Note that the program contains custom user code and hence requires a JAR file wi
 the classes of the code attached. The constructor of the remote environment
 takes the path(s) to the JAR file(s).
 
-## Remote Executor
-
-Similar to the RemoteEnvironment, the RemoteExecutor lets you execute
-Flink programs on a cluster directly. The remote executor accepts a
-*Plan* object, which describes the program as a single executable unit.
-
-### Maven Dependency
-
-If you are developing your program in a Maven project, you have to add the
-`flink-clients` module using this dependency:
-
-~~~xml
-<dependency>
-  <groupId>org.apache.flink</groupId>
-  <artifactId>flink-clients</artifactId>
-  <version>{{ site.version }}</version>
-</dependency>
-~~~
-
-### Example
-
-The following illustrates the use of the `RemoteExecutor` with the Scala API:
-
-~~~scala
-def main(args: Array[String]) {
-    val input = TextFile("hdfs://path/to/file")
-
-    val words = input flatMap { _.toLowerCase().split("""\W+""") filter { _ != "" } }
-    val counts = words groupBy { x => x } count()
-
-    val output = counts.write(wordsOutput, CsvOutputFormat())
-  
-    val plan = new ScalaPlan(Seq(output), "Word Count")
-    val executor = new RemoteExecutor("strato-master", 7881, "/path/to/jarfile.jar")
-    executor.executePlan(p);
-}
-~~~
-
-The following illustrates the use of the `RemoteExecutor` with the Java API (as
-an alternative to the RemoteEnvironment):
-
-~~~java
-public static void main(String[] args) throws Exception {
-    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-
-    DataSet<String> data = env.readTextFile("hdfs://path/to/file");
-
-    data
-        .filter(new FilterFunction<String>() {
-            public boolean filter(String value) {
-                return value.startsWith("http://");
-            }
-        })
-        .writeAsText("hdfs://path/to/result");
-
-    Plan p = env.createProgramPlan();
-    RemoteExecutor e = new RemoteExecutor("strato-master", 7881, "/path/to/jarfile.jar");
-    e.executePlan(p);
-}
-~~~
-
-Note that the program contains custom UDFs and hence requires a JAR file with
-the classes of the code attached. The constructor of the remote executor takes
-the path(s) to the JAR file(s).
-
 ## Linking with modules not contained in the binary distribution
 
 The binary distribution contains jar packages in the `lib` folder that are automatically
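
As a side note on the `createRemoteEnvironment` call updated above: the documentation says the constructor takes the path(s) to the JAR file(s), so several user-code JARs can be passed. A minimal, self-contained sketch (class name, host, port and paths are placeholders):

~~~java
import org.apache.flink.api.java.ExecutionEnvironment;

public class RemoteJarsExample {
    public static void main(String[] args) throws Exception {
        // ship one or more JARs containing the user-defined functions to the cluster
        ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
            "flink-master", 6123, "/home/user/udfs.jar", "/home/user/more-udfs.jar");

        env.readTextFile("hdfs://path/to/file")
           .writeAsText("hdfs://path/to/result");

        env.execute();
    }
}
~~~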


[2/3] flink git commit: [FLINK-2209] [docs] Document linking with jars not in the binary dist

Posted by mb...@apache.org.
[FLINK-2209] [docs] Document linking with jars not in the binary dist


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/be507951
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/be507951
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/be507951

Branch: refs/heads/master
Commit: be5079518e703ad958fa4eda9c45f4abf4d0f090
Parents: 21605d6
Author: mbalassi <mb...@apache.org>
Authored: Sun Jun 14 22:21:43 2015 +0200
Committer: mbalassi <mb...@apache.org>
Committed: Tue Jun 16 13:00:44 2015 +0200

----------------------------------------------------------------------
 docs/apis/cluster_execution.md | 72 +++++++++++++++++++++++++++++++++++++
 docs/apis/streaming_guide.md   | 12 +++++--
 docs/libs/gelly_guide.md       |  2 ++
 docs/libs/ml/index.md          |  2 ++
 docs/libs/table.md             |  2 ++
 5 files changed, 87 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/be507951/docs/apis/cluster_execution.md
----------------------------------------------------------------------
diff --git a/docs/apis/cluster_execution.md b/docs/apis/cluster_execution.md
index c2b3c27..7193cf6 100644
--- a/docs/apis/cluster_execution.md
+++ b/docs/apis/cluster_execution.md
@@ -144,3 +144,75 @@ public static void main(String[] args) throws Exception {
 Note that the program contains custom UDFs and hence requires a JAR file with
 the classes of the code attached. The constructor of the remote executor takes
 the path(s) to the JAR file(s).
+
+## Linking with modules not contained in the binary distribution
+
+The binary distribution contains jar packages in the `lib` folder that are automatically
+provided to the classpath of your distributed programs. Almost all Flink classes are
+located there, with a few exceptions such as the streaming connectors and some freshly
+added modules. To run code depending on these modules, you need to make them accessible
+at runtime, for which we suggest two options:
+
+1. Either copy the required jar files to the `lib` folder on all of your TaskManagers.
+Note that you have to restart your TaskManagers after this.
+2. Or package them with your user code.
+
+The latter approach is recommended as it respects Flink's classloader management.
+
+### Packaging dependencies with your user code using Maven
+
+To provide dependencies that are not included in the Flink distribution, we suggest two options with Maven.
+
+1. The Maven assembly plugin builds a so-called fat jar containing all your dependencies.
+Assembly configuration is straightforward, but the resulting jar might become bulky. See
+[usage](http://maven.apache.org/plugins/maven-assembly-plugin/usage.html).
+2. The Maven dependency plugin, which unpacks the relevant parts of the dependencies and
+then packages them with your code.
+
+Using the latter approach to bundle the Kafka connector, `flink-connector-kafka`, you
+would need to add the classes from both the connector and the Kafka API itself. Add the
+following to the plugins section of your `pom.xml`.
+
+~~~xml
+<plugin>
+    <groupId>org.apache.maven.plugins</groupId>
+    <artifactId>maven-dependency-plugin</artifactId>
+    <version>2.9</version>
+    <executions>
+        <execution>
+            <id>unpack</id>
+            <!-- executed just before the package phase -->
+            <phase>prepare-package</phase>
+            <goals>
+                <goal>unpack</goal>
+            </goals>
+            <configuration>
+                <artifactItems>
+                    <!-- For Flink connector classes -->
+                    <artifactItem>
+                        <groupId>org.apache.flink</groupId>
+                        <artifactId>flink-connector-kafka</artifactId>
+                        <version>{{ site.version }}</version>
+                        <type>jar</type>
+                        <overWrite>false</overWrite>
+                        <outputDirectory>${project.build.directory}/classes</outputDirectory>
+                        <includes>org/apache/flink/**</includes>
+                    </artifactItem>
+                    <!-- For Kafka API classes -->
+                    <artifactItem>
+                        <groupId>org.apache.kafka</groupId>
+                        <artifactId>kafka_<YOUR_SCALA_VERSION></artifactId>
+                        <version><YOUR_KAFKA_VERSION></version>
+                        <type>jar</type>
+                        <overWrite>false</overWrite>
+                        <outputDirectory>${project.build.directory}/classes</outputDirectory>
+                        <includes>kafka/**</includes>
+                    </artifactItem>
+                </artifactItems>
+            </configuration>
+        </execution>
+    </executions>
+</plugin>
+~~~
+
+Now, when running `mvn clean package`, the produced jar includes the required dependencies.
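
For comparison, option 1 from the hunk above (the Maven assembly plugin with its predefined `jar-with-dependencies` descriptor) might be configured as in this minimal sketch; it is not taken from the Flink docs, and plugin versions may differ:

~~~xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>2.4</version>
    <configuration>
        <descriptorRefs>
            <!-- predefined descriptor that bundles all dependencies into one fat jar -->
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
    </configuration>
    <executions>
        <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>
~~~

With this, `mvn clean package` additionally produces a `*-jar-with-dependencies.jar` artifact containing your code and all its dependencies.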

http://git-wip-us.apache.org/repos/asf/flink/blob/be507951/docs/apis/streaming_guide.md
----------------------------------------------------------------------
diff --git a/docs/apis/streaming_guide.md b/docs/apis/streaming_guide.md
index 9c32903..e9fc264 100644
--- a/docs/apis/streaming_guide.md
+++ b/docs/apis/streaming_guide.md
@@ -1377,11 +1377,13 @@ This connector provides access to data streams from [Apache Kafka](https://kafka
 {% highlight xml %}
 <dependency>
   <groupId>org.apache.flink</groupId>
-  <artifactId>flink-kafka-connector</artifactId>
+  <artifactId>flink-connector-kafka</artifactId>
   <version>{{site.version }}</version>
 </dependency>
 {% endhighlight %}
 
+Note that the streaming connectors are currently not part of the binary distribution. See the [cluster execution guide](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution) for how to link with them.
+
 #### Installing Apache Kafka
 * Follow the instructions from [Kafka's quickstart](https://kafka.apache.org/documentation.html#quickstart) to download the code and launch a server (launching a Zookeeper and a Kafka server is required every time before starting the application).
 * On 32-bit computers, [this](http://stackoverflow.com/questions/22325364/unrecognized-vm-option-usecompressedoops-when-running-kafka-from-my-ubuntu-in) problem may occur.
@@ -1513,11 +1515,13 @@ This connector provides access to data streams from [RabbitMQ](http://www.rabbit
 {% highlight xml %}
 <dependency>
   <groupId>org.apache.flink</groupId>
-  <artifactId>flink-rabbitmq-connector</artifactId>
+  <artifactId>flink-connector-rabbitmq</artifactId>
   <version>{{site.version }}</version>
 </dependency>
 {% endhighlight %}
 
+Note that the streaming connectors are currently not part of the binary distribution. See the [cluster execution guide](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution) for how to link with them.
+
 #### Installing RabbitMQ
 Follow the instructions from the [RabbitMQ download page](http://www.rabbitmq.com/download.html). After the installation the server automatically starts, and the application connecting to RabbitMQ can be launched.
 
@@ -1585,11 +1589,13 @@ Twitter Streaming API provides opportunity to connect to the stream of tweets ma
 {% highlight xml %}
 <dependency>
   <groupId>org.apache.flink</groupId>
-  <artifactId>flink-twitter-connector</artifactId>
+  <artifactId>flink-connector-twitter</artifactId>
   <version>{{site.version }}</version>
 </dependency>
 {% endhighlight %}
 
+Note that the streaming connectors are currently not part of the binary distribution. See the [cluster execution guide](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution) for how to link with them.
+
 #### Authentication
 In order to connect to the Twitter stream, the user has to register their program and acquire the necessary authentication information. The process is described below.
 

http://git-wip-us.apache.org/repos/asf/flink/blob/be507951/docs/libs/gelly_guide.md
----------------------------------------------------------------------
diff --git a/docs/libs/gelly_guide.md b/docs/libs/gelly_guide.md
index 804efab..c788012 100644
--- a/docs/libs/gelly_guide.md
+++ b/docs/libs/gelly_guide.md
@@ -43,6 +43,8 @@ Add the following dependency to your `pom.xml` to use Gelly.
 </dependency>
 ~~~
 
+Note that Gelly is currently not part of the binary distribution. See the [cluster execution guide](../apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution) for how to link with it.
+
 The remaining sections provide a description of available methods and present several examples of how to use Gelly and how to mix it with the Flink Java API. After reading this guide, you might also want to check the {% gh_link /flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/ "Gelly examples" %}.
 
 Graph Representation

http://git-wip-us.apache.org/repos/asf/flink/blob/be507951/docs/libs/ml/index.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/index.md b/docs/libs/ml/index.md
index 9ff7a4b..e81b354 100644
--- a/docs/libs/ml/index.md
+++ b/docs/libs/ml/index.md
@@ -69,6 +69,8 @@ Next, you have to add the FlinkML dependency to the `pom.xml` of your project.
 </dependency>
 {% endhighlight %}
 
+Note that FlinkML is currently not part of the binary distribution. See the [cluster execution guide](../apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution) for how to link with it.
+
 Now you can start solving your analysis task.
 The following code snippet shows how easy it is to train a multiple linear regression model.
 

http://git-wip-us.apache.org/repos/asf/flink/blob/be507951/docs/libs/table.md
----------------------------------------------------------------------
diff --git a/docs/libs/table.md b/docs/libs/table.md
index 829c9cf..4db5a87 100644
--- a/docs/libs/table.md
+++ b/docs/libs/table.md
@@ -37,6 +37,8 @@ The following dependency must be added to your project when using the Table API:
 </dependency>
 {% endhighlight %}
 
+Note that the Table API is currently not part of the binary distribution. See the [cluster execution guide](../apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution) for how to link with it.
+
 ## Scala Table API
  
 The Table API can be enabled by importing `org.apache.flink.api.scala.table._`.  This enables