Posted to commits@flink.apache.org by al...@apache.org on 2015/10/09 12:32:05 UTC

[3/3] flink git commit: [FLINK-2779] Update documentation to reflect new Stream/Window API

[FLINK-2779] Update documentation to reflect new Stream/Window API


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/c9088a49
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/c9088a49
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/c9088a49

Branch: refs/heads/master
Commit: c9088a491a459e44be41b074054a11883abef5c8
Parents: 9a21ab1
Author: Kostas Tzoumas <ko...@gmail.com>
Authored: Thu Oct 1 19:54:27 2015 +0200
Committer: Aljoscha Krettek <al...@gmail.com>
Committed: Fri Oct 9 12:31:04 2015 +0200

----------------------------------------------------------------------
 docs/_includes/navbar.html     |   12 +-
 docs/apis/programming_guide.md |   42 +-
 docs/apis/streaming_guide.md   | 3745 ++++++++++++++++++++++++++---------
 docs/index.md                  |   24 +-
 docs/internals/general_arch.md |   46 +-
 5 files changed, 2893 insertions(+), 976 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/c9088a49/docs/_includes/navbar.html
----------------------------------------------------------------------
diff --git a/docs/_includes/navbar.html b/docs/_includes/navbar.html
index 1ea3916..d39b330 100644
--- a/docs/_includes/navbar.html
+++ b/docs/_includes/navbar.html
@@ -76,22 +76,22 @@ under the License.
             <li class="dropdown{% if page.url contains '/apis/' %} active{% endif %}">
               <a href="{{ apis }}" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Programming Guides <span class="caret"></span></a>
               <ul class="dropdown-menu" role="menu">
-                <li><a href="{{ apis }}/programming_guide.html"><strong>Batch: DataSet API</strong></a></li>
-                <li><a href="{{ apis }}/streaming_guide.html"><strong>Streaming: DataStream API</strong> <span class="badge">Beta</span></a></li>
+                <li><a href="{{ apis }}/programming_guide.html"><strong>DataSet API</strong></a></li>
+                <li><a href="{{ apis }}/streaming_guide.html"><strong>DataStream API</strong></a></li>
                 <li><a href="{{ apis }}/python.html">Python API <span class="badge">Beta</span></a></li>
 
                 <li class="divider"></li>
                 <li><a href="{{ apis }}/scala_shell.html">Interactive Scala Shell</a></li>
-                <li><a href="{{ apis }}/dataset_transformations.html">Dataset Transformations</a></li>
+                <li><a href="{{ apis }}/dataset_transformations.html">DataSet Transformations</a></li>
                 <li><a href="{{ apis }}/best_practices.html">Best Practices</a></li>
-                <li><a href="{{ apis }}/example_connectors.html">Connectors (Batch)</a></li>
-                <li><a href="{{ apis }}/kafka.html">Kafka Connector <span class="badge">Beta</span></a></li>
+                <li><a href="{{ apis }}/example_connectors.html">Connectors (DataSet API)</a></li>
+                <!--<li><a href="{{ apis }}/kafka.html">Kafka Connector <span class="badge">Beta</span></a></li>-->
                 <li><a href="{{ apis }}/examples.html">Examples</a></li>
                 <li><a href="{{ apis }}/local_execution.html">Local Execution</a></li>
                 <li><a href="{{ apis }}/cluster_execution.html">Cluster Execution</a></li>
                 <li><a href="{{ apis }}/cli.html">Command Line Interface</a></li>
                 <li><a href="{{ apis }}/web_client.html">Web Client</a></li>
-                <li><a href="{{ apis }}/iterations.html">Iterations</a></li>
+                <li><a href="{{ apis }}/iterations.html">Iterations (DataSet API)</a></li>
                 <li><a href="{{ apis }}/java8.html">Java 8</a></li>
                 <li><a href="{{ apis }}/hadoop_compatibility.html">Hadoop Compatibility <span class="badge">Beta</span></a></li>
                 <li><a href="{{ apis }}/storm_compatibility.html">Storm Compatibility <span class="badge">Beta</span></a></li>

http://git-wip-us.apache.org/repos/asf/flink/blob/c9088a49/docs/apis/programming_guide.md
----------------------------------------------------------------------
diff --git a/docs/apis/programming_guide.md b/docs/apis/programming_guide.md
index da141a9..ad29ac3 100644
--- a/docs/apis/programming_guide.md
+++ b/docs/apis/programming_guide.md
@@ -1,5 +1,5 @@
 ---
-title: "Flink Programming Guide"
+title: "Flink DataSet API Programming Guide"
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
@@ -22,14 +22,14 @@ under the License.
 
 <a href="#top"></a>
 
-Analysis programs in Flink are regular programs that implement transformations on data sets
+DataSet programs in Flink are regular programs that implement transformations on data sets
 (e.g., filtering, mapping, joining, grouping). The data sets are initially created from certain
 sources (e.g., by reading files, or from local collections). Results are returned via sinks, which may for
 example write the data to (distributed) files, or to standard output (for example the command line
 terminal). Flink programs run in a variety of contexts, standalone, or embedded in other programs.
 The execution can happen in a local JVM, or on clusters of many machines.
 
-In order to create your own Flink program, we encourage you to start with the
+In order to create your own Flink DataSet program, we encourage you to start with the
 [program skeleton](#program-skeleton) and gradually add your own
 [transformations](#transformations). The remaining sections act as references for additional
 operations and advanced features.
@@ -221,7 +221,7 @@ Program Skeleton
 <div class="codetabs" markdown="1">
 <div data-lang="java" markdown="1">
 
-As we already saw in the example, Flink programs look like regular Java
+As we already saw in the example, Flink DataSet programs look like regular Java
 programs with a `main()` method. Each program consists of the same basic parts:
 
 1. Obtain an `ExecutionEnvironment`,
@@ -233,7 +233,7 @@ programs with a `main()` method. Each program consists of the same basic parts:
 We will now give an overview of each of those steps, please refer to the respective sections for
 more details. Note that all core classes of the Java API are found in the package {% gh_link /flink-java/src/main/java/org/apache/flink/api/java "org.apache.flink.api.java" %}.
 
-The `ExecutionEnvironment` is the basis for all Flink programs. You can
+The `ExecutionEnvironment` is the basis for all Flink DataSet programs. You can
 obtain one using these static methods on class `ExecutionEnvironment`:
 
 {% highlight java %}
@@ -253,7 +253,7 @@ Typically, you only need to use `getExecutionEnvironment()`, since this
 will do the right thing depending on the context: if you are executing
 your program inside an IDE or as a regular Java program it will create
 a local environment that will execute your program on your local machine. If
-you created a JAR file from you program, and invoke it through the [command line](cli.html)
+you created a JAR file from your program, and invoke it through the [command line](cli.html)
 or the [web interface](web_client.html),
 the Flink cluster manager will execute your main method and `getExecutionEnvironment()` will return
 an execution environment for executing your program on a cluster.
@@ -276,7 +276,7 @@ more information on data sources and input formats, please refer to
 Once you have a DataSet you can apply transformations to create a new
 DataSet which you can then write to a file, transform again, or
 combine with other DataSets. You apply transformations by calling
-methods on DataSet with your own custom transformation function. For example,
+methods on DataSet with your own custom transformation functions. For example,
 a map transformation looks like this:
 
 {% highlight java %}
@@ -447,18 +447,14 @@ accessed from the `getLastJobExecutionResult()` method.
 DataSet abstraction
 ---------------
 
-The batch processing APIs of Flink are centered around the `DataSet` abstraction. A `DataSet` is only
-an abstract representation of a set of data that can contain duplicates.
-
-Also note that Flink is not always physically creating (materializing) each DataSet at runtime. This 
-depends on the used runtime, the configuration and optimizer decisions.
-
-The Flink runtime does not need to always materialize the DataSets because it is using a streaming runtime model.
-
-DataSets are only materialized to avoid distributed deadlocks (at points where the data flow graph branches out and joins again later) or if the execution mode has explicitly been set to a batched execution.
+A `DataSet` is an abstract representation of a finite, immutable collection of data of the same type that may contain duplicates.
 
-When using Flink on Tez, all DataSets are materialized.
+Note that Flink does not always physically create (materialize) each DataSet at runtime. This
+depends on the runtime used, the configuration, and optimizer decisions. DataSets may be "streamed through"
+operations during execution, since under the hood Flink uses a streaming data processing engine.
 
+Some DataSets are materialized automatically to avoid distributed deadlocks (at points where the data flow graph branches
+out and joins again later) or if the execution mode has explicitly been set to blocking execution.
 
 [Back to top](#top)
 
@@ -466,7 +462,7 @@ When using Flink on Tez, all DataSets are materialized.
 Lazy Evaluation
 ---------------
 
-All Flink programs are executed lazily: When the program's main method is executed, the data loading
+All Flink DataSet programs are executed lazily: When the program's main method is executed, the data loading
 and transformations do not happen directly. Rather, each operation is created and added to the
 program's plan. The operations are actually executed when the execution is explicitly triggered by 
 an `execute()` call on the ExecutionEnvironment object. Also, `collect()` and `print()` will trigger
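
The lazy-evaluation model described in this hunk (transformations only build up a plan; work happens when `execute()` is called) can be sketched with a stdlib-only toy. This is purely illustrative: `LazyPlan`, `planSize`, and the rest are hypothetical names for this sketch, not Flink's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model of lazy evaluation (NOT Flink's API): map() only records an
// operation in the plan; nothing is applied until execute() is called.
class LazyPlan<T> {
    private final List<T> source;
    private final List<Function<Object, Object>> plan = new ArrayList<>();

    LazyPlan(List<T> source) { this.source = source; }

    @SuppressWarnings("unchecked")
    <R> LazyPlan<R> map(Function<T, R> f) {
        plan.add((Function<Object, Object>) f);   // recorded, not applied
        return (LazyPlan<R>) this;                // toy shortcut: reuse this plan
    }

    int planSize() { return plan.size(); }

    List<Object> execute() {                      // execution is triggered here
        List<Object> out = new ArrayList<>(source);
        for (Function<Object, Object> f : plan) {
            List<Object> next = new ArrayList<>();
            for (Object o : out) next.add(f.apply(o));
            out = next;
        }
        return out;
    }
}

public class LazyDemo {
    public static void main(String[] args) {
        LazyPlan<Integer> doubled =
                new LazyPlan<>(List.of(1, 2, 3)).map(x -> x * 2);
        System.out.println(doubled.planSize());   // 1: one operation recorded, none run yet
        System.out.println(doubled.execute());    // [2, 4, 6]: plan runs on execute()
    }
}
```

In real Flink programs the same principle applies: building the dataflow is cheap, and `execute()`, `collect()`, or `print()` on the environment is what submits the plan for execution.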
@@ -1323,7 +1319,7 @@ data.map (new MyMapFunction());
 
 #### Anonymous classes
 
-You can pass a function as an anonmymous class:
+You can pass a function as an anonymous class:
 {% highlight java %}
 data.map(new MapFunction<String, Integer> () {
   public Integer map(String value) { return Integer.parseInt(value); }
@@ -1451,7 +1447,7 @@ for a complete example.
 Data Types
 ----------
 
-Flink places some restrictions on the type of elements that are used in DataSets and as results
+Flink places some restrictions on the type of elements that are used in DataSets and in results
 of transformations. The reason for this is that the system analyzes the types to determine
 efficient execution strategies.
 
@@ -1473,7 +1469,7 @@ Tuples are composite types that contain a fixed number of fields with various ty
 The Java API provides classes from `Tuple1` up to `Tuple25`. Every field of a tuple
 can be an arbitrary Flink type including further tuples, resulting in nested tuples. Fields of a
 tuple can be accessed directly using the field's name as `tuple.f4`, or using the generic getter method
-`tuple.getField(int position)`. The field indicies start at 0. Note that this stands in contrast
+`tuple.getField(int position)`. The field indices start at 0. Note that this stands in contrast
 to the Scala tuples, but it is more consistent with Java's general indexing.
 
 {% highlight java %}
@@ -2239,7 +2235,7 @@ result data. This section give some hints how to ease the development of Flink p
 ### Local Execution Environment
 
 A `LocalEnvironment` starts a Flink system within the same JVM process it was created in. If you
-start the LocalEnvironement from an IDE, you can set breakpoint in your code and easily debug your
+start the LocalEnvironment from an IDE, you can set breakpoints in your code and easily debug your
 program.
 
 A LocalEnvironment is created and used as follows:
@@ -2957,7 +2953,7 @@ public static final class Tokenizer extends RichFlatMapFunction<String, Tuple2<S
 
 [Back to top](#top)
 
-Program Packaging & Distributed Execution
+Program Packaging and Distributed Execution
 -----------------------------------------
 
 As described in the [program skeleton](#program-skeleton) section, Flink programs can be executed on