Posted to commits@beam.apache.org by da...@apache.org on 2016/03/04 19:11:33 UTC
[38/50] [abbrv] incubator-beam git commit: [flink] update README
[flink] update README
Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/ceb2c87f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/ceb2c87f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/ceb2c87f
Branch: refs/heads/master
Commit: ceb2c87f8f749cb4db0582b9f1abc15c4da752fd
Parents: 28fcfd7
Author: Maximilian Michels <mx...@apache.org>
Authored: Wed Mar 2 23:51:38 2016 +0100
Committer: Davor Bonaci <da...@users.noreply.github.com>
Committed: Fri Mar 4 10:04:23 2016 -0800
----------------------------------------------------------------------
runners/flink/README.md | 60 ++++++++++++++++++++++----------------------
1 file changed, 30 insertions(+), 30 deletions(-)
----------------------------------------------------------------------
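The README text changed in the diff below describes Beam's event-time windowing: *tumbling*, *sliding*, and *session* windows. To show what those terms mean, here is a rough plain-Java sketch that assigns windows from millisecond timestamps. It is explicitly not the Beam SDK or Flink runner API. The class `WindowAssignmentSketch`, its `Window` type, and the methods `tumbling`, `sliding`, and `sessions` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical, self-contained sketch of event-time window assignment.
 * NOT the Beam SDK or Flink runner API: timestamps are plain epoch
 * milliseconds and every name here is illustrative only.
 */
public class WindowAssignmentSketch {

  /** Half-open event-time interval [start, end). */
  static final class Window {
    final long start;
    final long end;

    Window(long start, long end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + ", " + end + ")";
    }
  }

  /** Tumbling (fixed) windows: every timestamp belongs to exactly one window. */
  static Window tumbling(long timestamp, long size) {
    long start = timestamp - Math.floorMod(timestamp, size);
    return new Window(start, start + size);
  }

  /** Sliding windows of length `size` starting every `period` ms; windows overlap. */
  static List<Window> sliding(long timestamp, long size, long period) {
    List<Window> windows = new ArrayList<>();
    long lastStart = timestamp - Math.floorMod(timestamp, period);
    // Walk back over every window start whose interval still contains the timestamp.
    for (long start = lastStart; start > timestamp - size; start -= period) {
      windows.add(new Window(start, start + size));
    }
    return windows;
  }

  /** Session windows: each element opens [t, t + gap); intersecting windows merge. */
  static List<Window> sessions(List<Long> timestamps, long gap) {
    List<Long> sorted = new ArrayList<>(timestamps);
    Collections.sort(sorted);
    List<Window> merged = new ArrayList<>();
    for (long t : sorted) {
      Window current = new Window(t, t + gap);
      if (!merged.isEmpty() && current.start < merged.get(merged.size() - 1).end) {
        // The new element falls inside the open session, so extend that session.
        Window last = merged.remove(merged.size() - 1);
        merged.add(new Window(last.start, Math.max(last.end, current.end)));
      } else {
        merged.add(current);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    System.out.println(tumbling(12_345L, 10_000L));         // [10000, 20000)
    System.out.println(sliding(12_345L, 10_000L, 5_000L));  // [[10000, 20000), [5000, 15000)]
    System.out.println(sessions(List.of(1_000L, 2_000L, 10_000L), 3_000L));
    // [[1000, 5000), [10000, 13000)]
  }
}
```

In a real pipeline you would not hand-roll this. The Beam SDK ships its own `FixedWindows`, `SlidingWindows`, and `Sessions` window functions, and the runner evaluates them. The sketch only shows why one element lands in one tumbling window, in several overlapping sliding windows, or in a session that grows as nearby events arrive.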
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/ceb2c87f/runners/flink/README.md
----------------------------------------------------------------------
diff --git a/runners/flink/README.md b/runners/flink/README.md
index 499ed6d..0fee6f0 100644
--- a/runners/flink/README.md
+++ b/runners/flink/README.md
@@ -1,17 +1,17 @@
-Flink-Dataflow
---------------
+Flink Beam Runner (Flink-Runner)
+-------------------------------
-Flink-Dataflow is a Runner for Google Dataflow (aka Apache Beam) which enables you to
-run Dataflow programs with Flink. It integrates seamlessly with the Dataflow
-API, allowing you to execute Dataflow programs in streaming or batch mode.
+Flink-Runner is a Runner for Apache Beam which enables you to
+run Beam dataflows with Flink. It integrates seamlessly with the Beam
+API, allowing you to execute Apache Beam programs in streaming or batch mode.
## Streaming
-### Full Dataflow Windowing and Triggering Semantics
+### Full Beam Windowing and Triggering Semantics
-The Flink Dataflow Runner supports *Event Time* allowing you to analyze data with respect to its
+The Flink Beam Runner supports *Event Time* allowing you to analyze data with respect to its
associated timestamp. It handles out-of-order and late-arriving elements. You may leverage the full
-power of the Dataflow windowing semantics like *time-based*, *sliding*, *tumbling*, or *count*
+power of the Beam windowing semantics like *time-based*, *sliding*, *tumbling*, or *count*
windows. You may build *session* windows which allow you to keep track of events associated with
each other.
@@ -27,7 +27,7 @@ and sinks or use the provided support for Apache Kafka.
### Seamless integration
-To execute a Dataflow program in streaming mode, just enable streaming in the `PipelineOptions`:
+To execute a Beam program in streaming mode, just enable streaming in the `PipelineOptions`:
options.setStreaming(true);
@@ -52,7 +52,7 @@ and sinks.
## Features
-The Flink Dataflow Runner maintains as much compatibility with the Dataflow API as possible. We
+The Flink Beam Runner maintains as much compatibility with the Beam API as possible. We
support transformations on data like:
- Grouping
@@ -66,25 +66,25 @@ support transformations on data like:
# Getting Started
-To get started using Flink-Dataflow, we first need to install the latest version.
+To get started using the Flink Runner, we first need to install the latest version.
-## Install Flink-Dataflow ##
+## Install Flink-Runner ##
-To retrieve the latest version of Flink-Dataflow, run the following command
+To retrieve the latest version of Flink-Runner, run the following command:
- git clone https://github.com/dataArtisans/flink-dataflow
+ git clone https://github.com/apache/incubator-beam
-Then switch to the newly created directory and run Maven to build the Dataflow runner:
+Then switch to the newly created directory and run Maven to build the Beam runner:
- cd flink-dataflow
+ cd incubator-beam
mvn clean install -DskipTests
-Flink-Dataflow is now installed in your local maven repository.
+Flink-Runner is now installed in your local Maven repository.
## Executing an example
Next, let's run the classic WordCount example. It's semantically identical to
-the example provided with Google Dataflow. Only this time, we chose the
+the example provided with Apache Beam. Only this time, we chose the
`FlinkPipelineRunner` to execute the WordCount on top of Flink.
Here's an excerpt from the WordCount class file:
@@ -113,15 +113,15 @@ Then let's run the included WordCount locally on your machine:
mvn exec:exec -Dinput=kinglear.txt -Doutput=wordcounts.txt
-Congratulations, you have run your first Google Dataflow program on top of Apache Flink!
+Congratulations, you have run your first Apache Beam program on top of Apache Flink!
-# Running Dataflow programs on a Flink cluster
+# Running Beam programs on a Flink cluster
-You can run your Dataflow program on an Apache Flink cluster. Please start off by creating a new
+You can run your Beam program on an Apache Flink cluster. Please start off by creating a new
Maven project.
- mvn archetype:generate -DgroupId=com.mycompany.dataflow -DartifactId=dataflow-test \
+ mvn archetype:generate -DgroupId=com.mycompany.beam -DartifactId=beam-test \
         -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
The contents of the root `pom.xml` should be slightly changed afterwards (explanation below):
@@ -133,14 +133,14 @@ The contents of the root `pom.xml` should be slightly changed aftewards (explana
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
- <groupId>com.mycompany.dataflow</groupId>
- <artifactId>dataflow-test</artifactId>
+ <groupId>com.mycompany.beam</groupId>
+ <artifactId>beam-test</artifactId>
<version>1.0</version>
<dependencies>
<dependency>
- <groupId>com.dataartisans</groupId>
- <artifactId>flink-dataflow</artifactId>
+ <groupId>org.apache.beam</groupId>
+ <artifactId>flink-runner</artifactId>
<version>0.2</version>
</dependency>
</dependencies>
@@ -182,13 +182,13 @@ The contents of the root `pom.xml` should be slightly changed aftewards (explana
The following changes have been made:
-1. The Flink Dataflow Runner was added as a dependency.
+1. The Flink Beam Runner was added as a dependency.
2. The Maven Shade plugin was added to build a fat jar.
-A fat jar is necessary if you want to submit your Dataflow code to a Flink cluster. The fat jar
-includes your program code but also Dataflow code which is necessary during runtime. Note that this
-step is necessary because the Dataflow Runner is not part of Flink.
+A fat jar is necessary if you want to submit your Beam code to a Flink cluster. The fat jar
+includes your program code but also Beam code which is necessary during runtime. Note that this
+step is necessary because the Beam Runner is not part of Flink.
You can then build the jar using `mvn clean package`. Please submit the fat jar in the `target`
folder to the Flink cluster using the command-line utility like so: