Posted to commits@flink.apache.org by gr...@apache.org on 2016/09/21 14:40:33 UTC

flink git commit: [FLINK-4654] [docs] Small improvements to the docs.

Repository: flink
Updated Branches:
  refs/heads/master 4cc38fd36 -> 721220203


[FLINK-4654] [docs] Small improvements to the docs.

This closes #2525


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/72122020
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/72122020
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/72122020

Branch: refs/heads/master
Commit: 7212202036235f41e376872dc268735ba9ef81e9
Parents: 4cc38fd
Author: David Anderson <da...@alpinegizmo.com>
Authored: Tue Sep 20 14:54:44 2016 +0200
Committer: Greg Hogan <co...@greghogan.com>
Committed: Wed Sep 21 10:38:13 2016 -0400

----------------------------------------------------------------------
 docs/dev/datastream_api.md                |  6 +++---
 docs/dev/libs/cep.md                      |  9 +++++----
 docs/dev/state.md                         | 17 ++++++++---------
 docs/dev/state_backends.md                |  6 +++---
 docs/dev/windows.md                       |  4 ++--
 docs/quickstart/run_example_quickstart.md | 23 ++++++++++++-----------
 6 files changed, 33 insertions(+), 32 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/72122020/docs/dev/datastream_api.md
----------------------------------------------------------------------
diff --git a/docs/dev/datastream_api.md b/docs/dev/datastream_api.md
index 2dd7842..425dd6a 100644
--- a/docs/dev/datastream_api.md
+++ b/docs/dev/datastream_api.md
@@ -357,7 +357,7 @@ windowedStream.reduce (new ReduceFunction<Tuple2<String,Integer>() {
                The example function, when applied on the sequence (1,2,3,4,5),
                folds the sequence into the string "start-1-2-3-4-5":</p>
     {% highlight java %}
-windowedStream.fold("start-", new FoldFunction<Integer, String>() {
+windowedStream.fold("start", new FoldFunction<Integer, String>() {
     public String fold(String current, Integer value) {
         return current + "-" + value;
     }
@@ -1324,7 +1324,7 @@ File-based:
 
     *IMPORTANT NOTES:*
 
-    1. If the `watchType` is set to `FileProcessingMode.PROCESS_CONTINUOUSLY`, when a file is modified, its contents are re-processed entirely. This can brake the "exactly-once" semantics, as appending data at the end of a file will lead to **all** its contents being re-processed.
+    1. If the `watchType` is set to `FileProcessingMode.PROCESS_CONTINUOUSLY`, when a file is modified, its contents are re-processed entirely. This can break the "exactly-once" semantics, as appending data at the end of a file will lead to **all** its contents being re-processed.
 
     2. If the `watchType` is set to `FileProcessingMode.PROCESS_ONCE`, the source scans the path **once** and exits, without waiting for the readers to finish reading the file contents. Of course the readers will continue reading until all file contents are read. Closing the source leads to no more checkpoints after that point. This may lead to slower recovery after a node failure, as the job will resume reading from the last checkpoint.
 
@@ -1382,7 +1382,7 @@ File-based:
 
     *IMPORTANT NOTES:*
 
-    1. If the `watchType` is set to `FileProcessingMode.PROCESS_CONTINUOUSLY`, when a file is modified, its contents are re-processed entirely. This can brake the "exactly-once" semantics, as appending data at the end of a file will lead to **all** its contents being re-processed.
+    1. If the `watchType` is set to `FileProcessingMode.PROCESS_CONTINUOUSLY`, when a file is modified, its contents are re-processed entirely. This can break the "exactly-once" semantics, as appending data at the end of a file will lead to **all** its contents being re-processed.
 
     2. If the `watchType` is set to `FileProcessingMode.PROCESS_ONCE`, the source scans the path **once** and exits, without waiting for the readers to finish reading the file contents. Of course the readers will continue reading until all file contents are read. Closing the source leads to no more checkpoints after that point. This may lead to slower recovery after a node failure, as the job will resume reading from the last checkpoint.
 

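The two notes above describe the behavior of the file-monitoring source. As a rough sketch (not taken from the patch itself), a continuously watched text source is typically wired up along the following lines; the path and the 100 ms scan interval are placeholders, and the exact readFile overload may differ between Flink versions. Imports are omitted, as in the docs themselves.

    // Sketch only: path, scan interval and the readFile overload are assumptions.
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    TextInputFormat format = new TextInputFormat(new Path("file:///tmp/input"));

    // PROCESS_CONTINUOUSLY re-scans the path every 100 ms; a modified file is
    // re-processed in full, which is why appends can break exactly-once.
    DataStream<String> lines = env.readFile(
            format, "file:///tmp/input",
            FileProcessingMode.PROCESS_CONTINUOUSLY, 100L);

    // PROCESS_ONCE would instead scan the path a single time and then exit.
    lines.print();
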
http://git-wip-us.apache.org/repos/asf/flink/blob/72122020/docs/dev/libs/cep.md
----------------------------------------------------------------------
diff --git a/docs/dev/libs/cep.md b/docs/dev/libs/cep.md
index 77266bc..d27cf9f 100644
--- a/docs/dev/libs/cep.md
+++ b/docs/dev/libs/cep.md
@@ -98,7 +98,7 @@ val result: DataStream[Alert] = patternStream.select(createAlert(_))
 </div>
 </div>
 
-Note that we use use Java 8 lambdas in our Java code examples to make them more succinct.
+Note that we use Java 8 lambdas in our Java code examples to make them more succinct.
 
 ## The Pattern API
 
@@ -521,10 +521,11 @@ def flatSelectFn(pattern : mutable.Map[String, IN], collector : Collector[OUT])
 
 ### Handling Timed Out Partial Patterns
 
-Whenever a pattern has a window length associated via the `within` key word, it is possible that partial event patterns will be discarded because they exceed the window length.
-In order to react to these timeout events the `select` and `flatSelect` API calls allow to specify a timeout handler.
+Whenever a pattern has a window length associated via the `within` keyword, it is possible that partial event patterns will be discarded because they exceed the window length.
+In order to react to these timeout events the `select` and `flatSelect` API calls allow a timeout handler to be specified.
 This timeout handler is called for each partial event pattern which has timed out.
-The timeout handler receives all so far matched events of the partial pattern and the timestamp when the timeout was detected.
+The timeout handler receives all the events that have been matched so far by the pattern, and the timestamp when the timeout was detected.
+
 
 <div class="codetabs" markdown="1">
 <div data-lang="java" markdown="1">

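To make the timeout-handler wording concrete: given a pattern stream over hypothetical Event, Alert and TimeoutAlert types, the two-argument select variant pairs a PatternTimeoutFunction with a PatternSelectFunction, roughly as below. This is a sketch, not part of the patch; the pattern state names "first" and "second", the types, and the surrounding imports are all assumed.

    // Sketch: Event, Alert, TimeoutAlert and the state names are hypothetical.
    DataStream<Either<TimeoutAlert, Alert>> result = patternStream.select(
        new PatternTimeoutFunction<Event, TimeoutAlert>() {
            @Override
            public TimeoutAlert timeout(Map<String, Event> partialPattern, long timeoutTimestamp) {
                // invoked for each partial match that exceeded the within() window
                return new TimeoutAlert(partialPattern.get("first"), timeoutTimestamp);
            }
        },
        new PatternSelectFunction<Event, Alert>() {
            @Override
            public Alert select(Map<String, Event> pattern) {
                // invoked for each complete match
                return new Alert(pattern.get("first"), pattern.get("second"));
            }
        });
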
http://git-wip-us.apache.org/repos/asf/flink/blob/72122020/docs/dev/state.md
----------------------------------------------------------------------
diff --git a/docs/dev/state.md b/docs/dev/state.md
index ec8c5eb..37de0a8 100644
--- a/docs/dev/state.md
+++ b/docs/dev/state.md
@@ -73,20 +73,19 @@ active key (i.e. the key of the input element).
 It is important to keep in mind that these state objects are only used for interfacing
 with state. The state is not necessarily stored inside but might reside on disk or somewhere else.
 The second thing to keep in mind is that the value you get from the state
-depend on the key of the input element. So the value you get in one invocation of your
-user function can be different from the one you get in another invocation if the key of
-the element is different.
+depends on the key of the input element. So the value you get in one invocation of your
+user function can differ from the value in another invocation if the keys involved are different.
 
-To get a state handle you have to create a `StateDescriptor` this holds the name of the state
+To get a state handle you have to create a `StateDescriptor`. This holds the name of the state
 (as we will later see you can create several states, and they have to have unique names so
-that you can reference them), the type of the values that the state holds and possibly
+that you can reference them), the type of the values that the state holds, and possibly
 a user-specified function, such as a `ReduceFunction`. Depending on what type of state you
-want to retrieve you create one of `ValueStateDescriptor`, `ListStateDescriptor` or
-`ReducingStateDescriptor`.
+want to retrieve, you create either a `ValueStateDescriptor`, a `ListStateDescriptor` or
+a `ReducingStateDescriptor`.
 
 State is accessed using the `RuntimeContext`, so it is only possible in *rich functions*.
 Please see [here]({{ site.baseurl }}/apis/common/#specifying-transformation-functions) for
-information about that but we will also see an example shortly. The `RuntimeContext` that
+information about that, but we will also see an example shortly. The `RuntimeContext` that
 is available in a `RichFunction` has these methods for accessing state:
 
 * `ValueState<T> getState(ValueStateDescriptor<T>)`
@@ -147,7 +146,7 @@ env.fromElements(Tuple2.of(1L, 3L), Tuple2.of(1L, 5L), Tuple2.of(1L, 7L), Tuple2
 
 This example implements a poor man's counting window. We key the tuples by the first field
 (in the example all have the same key `1`). The function stores the count and a running sum in
-a `ValueState`, once the count reaches 2 it will emit the average and clear the state so that
+a `ValueState`. Once the count reaches 2 it will emit the average and clear the state so that
 we start over from `0`. Note that this would keep a different state value for each different input
 key if we had tuples with different values in the first field.
 

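The "poor man's counting window" that the patched text refers to boils down to a rich function along these lines. This is a condensed sketch of the example in state.md, with imports omitted; the full listing in the docs may differ in detail.

    // Condensed sketch of the counting-window average described above.
    // State is registered via a ValueStateDescriptor in open() and accessed
    // through the RuntimeContext, so this has to be a rich function.
    public class CountWindowAverage extends RichFlatMapFunction<Tuple2<Long, Long>, Tuple2<Long, Long>> {

        private transient ValueState<Tuple2<Long, Long>> sum; // (count, running sum)

        @Override
        public void open(Configuration config) {
            ValueStateDescriptor<Tuple2<Long, Long>> descriptor =
                new ValueStateDescriptor<>(
                    "average",                                             // state name, must be unique
                    TypeInformation.of(new TypeHint<Tuple2<Long, Long>>() {}),
                    Tuple2.of(0L, 0L));                                    // default value
            sum = getRuntimeContext().getState(descriptor);
        }

        @Override
        public void flatMap(Tuple2<Long, Long> input, Collector<Tuple2<Long, Long>> out) throws Exception {
            Tuple2<Long, Long> currentSum = sum.value();   // value for the current key
            currentSum.f0 += 1;
            currentSum.f1 += input.f1;
            sum.update(currentSum);

            if (currentSum.f0 >= 2) {                      // emit the average, then start over
                out.collect(new Tuple2<>(input.f0, currentSum.f1 / currentSum.f0));
                sum.clear();
            }
        }
    }
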
http://git-wip-us.apache.org/repos/asf/flink/blob/72122020/docs/dev/state_backends.md
----------------------------------------------------------------------
diff --git a/docs/dev/state_backends.md b/docs/dev/state_backends.md
index e5b9c2a..70e472d 100644
--- a/docs/dev/state_backends.md
+++ b/docs/dev/state_backends.md
@@ -41,16 +41,16 @@ chosen **State Backend**.
 
 Out of the box, Flink bundles these state backends:
 
- - *MemoryStateBacked*
+ - *MemoryStateBackend*
  - *FsStateBackend*
  - *RocksDBStateBackend*
 
-If nothing else is configured, the system will use the MemoryStateBacked.
+If nothing else is configured, the system will use the MemoryStateBackend.
 
 
 ### The MemoryStateBackend
 
-The *MemoryStateBacked* holds data internally as objects on the Java heap. Key/value state and window operators hold hash tables
+The *MemoryStateBackend* holds data internally as objects on the Java heap. Key/value state and window operators hold hash tables
 that store the values, triggers, etc.
 
 Upon checkpoints, this state backend will snapshot the state and send it as part of the checkpoint acknowledgement messages to the

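For reference, a backend other than the default MemoryStateBackend is selected per job roughly like this; the HDFS URI is a placeholder, and backends can also be configured globally in flink-conf.yaml.

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Placeholder checkpoint URI; point this at a durable filesystem in practice.
    env.setStateBackend(new FsStateBackend("hdfs://namenode:40010/flink/checkpoints"));
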
http://git-wip-us.apache.org/repos/asf/flink/blob/72122020/docs/dev/windows.md
----------------------------------------------------------------------
diff --git a/docs/dev/windows.md b/docs/dev/windows.md
index 67280dd..2611870 100644
--- a/docs/dev/windows.md
+++ b/docs/dev/windows.md
@@ -111,7 +111,7 @@ which we could process the aggregated elements.
 ### Tumbling Windows
 
 A *tumbling windows* assigner assigns elements to fixed length, non-overlapping windows of a
-specified *window size*.. For example, if you specify a window size of 5 minutes, the window
+specified *window size*. For example, if you specify a window size of 5 minutes, the window
 function will get 5 minutes worth of elements in each invocation.
 
 <img src="{{ site.baseurl }}/fig/tumbling-windows.svg" class="center" style="width: 80%;" />
@@ -381,7 +381,7 @@ a concatenation of all the `Long` fields of the input.
 
 ### WindowFunction - The Generic Case
 
-Using a `WindowFunction` provides most flexibility, at the cost of performance. The reason for this
+Using a `WindowFunction` provides the most flexibility, at the cost of performance. The reason for this
 is that elements cannot be incrementally aggregated for a window and instead need to be buffered
 internally until the window is considered ready for processing. A `WindowFunction` gets an
 `Iterable` containing all the elements of the window being processed. The signature of

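A sketch of the two pieces being touched here, a fixed-size tumbling window plus the buffering WindowFunction, assuming keyedStream is a KeyedStream<Tuple2<String, Long>, String> on an event-time stream (not part of the patch; imports omitted):

    keyedStream
        .window(TumblingEventTimeWindows.of(Time.minutes(5)))
        .apply(new WindowFunction<Tuple2<String, Long>, String, String, TimeWindow>() {
            @Override
            public void apply(String key,
                              TimeWindow window,
                              Iterable<Tuple2<String, Long>> input,
                              Collector<String> out) {
                // All elements of the window are buffered and handed over at once;
                // nothing is pre-aggregated, which is the performance cost noted above.
                long sum = 0;
                for (Tuple2<String, Long> t : input) {
                    sum += t.f1;
                }
                out.collect(key + " @ " + window.getEnd() + ": " + sum);
            }
        });
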
http://git-wip-us.apache.org/repos/asf/flink/blob/72122020/docs/quickstart/run_example_quickstart.md
----------------------------------------------------------------------
diff --git a/docs/quickstart/run_example_quickstart.md b/docs/quickstart/run_example_quickstart.md
index 70f8756..e079280 100644
--- a/docs/quickstart/run_example_quickstart.md
+++ b/docs/quickstart/run_example_quickstart.md
@@ -26,12 +26,12 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-In this guide we will start from scratch and go from setting up a Flink project and running
+In this guide we will start from scratch and go from setting up a Flink project to running
 a streaming analysis program on a Flink cluster.
 
 Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to
 read this channel in Flink and count the number of bytes that each user edits within
-a given window of time. This is easy enough to implement in a few minutes using Flink but it will
+a given window of time. This is easy enough to implement in a few minutes using Flink, but it will
 give you a good foundation from which to start building more complex analysis programs on your own.
 
 ## Setting up a Maven Project
@@ -125,21 +125,21 @@ public class WikipediaAnalysis {
 }
 {% endhighlight %}
 
-I admit it's very bare bones now but we will fill it as we go. Note, that I'll not give
+The program is very basic now, but we will fill it in as we go. Note that I'll not give
 import statements here since IDEs can add them automatically. At the end of this section I'll show
 the complete code with import statements if you simply want to skip ahead and enter that in your
 editor.
 
 The first step in a Flink program is to create a `StreamExecutionEnvironment`
 (or `ExecutionEnvironment` if you are writing a batch job). This can be used to set execution
-parameters and create sources for reading from external systems. So let's go ahead, add
+parameters and create sources for reading from external systems. So let's go ahead and add
 this to the main method:
 
 {% highlight java %}
 StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
 {% endhighlight %}
 
-Next, we will create a source that reads from the Wikipedia IRC log:
+Next we will create a source that reads from the Wikipedia IRC log:
 
 {% highlight java %}
 DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());
@@ -149,7 +149,7 @@ This creates a `DataStream` of `WikipediaEditEvent` elements that we can further
 the purposes of this example we are interested in determining the number of added or removed
 bytes that each user causes in a certain time window, let's say five seconds. For this we first
 have to specify that we want to key the stream on the user name, that is to say that operations
-on this should take the key into account. In our case the summation of edited bytes in the windows
+on this stream should take the user name into account. In our case the summation of edited bytes in the windows
 should be per unique user. For keying a Stream we have to provide a `KeySelector`, like this:
 
 {% highlight java %}
@@ -165,8 +165,8 @@ KeyedStream<WikipediaEditEvent, String> keyedEdits = edits
 This gives us a Stream of `WikipediaEditEvent` that has a `String` key, the user name.
 We can now specify that we want to have windows imposed on this stream and compute a
 result based on elements in these windows. A window specifies a slice of a Stream
-on which to perform a computation. They are required when performing an aggregation
-computation on an infinite stream of elements. In our example we will say
+on which to perform a computation. Windows are required when computing aggregations
+on an infinite stream of elements. In our example we will say
 that we want to aggregate the sum of edited bytes for every five seconds:
 
 {% highlight java %}
@@ -276,9 +276,10 @@ similar to this:
 The number in front of each line tells you on which parallel instance of the print sink the output
 was produced.
 
-This should get you started with writing your own Flink programs. You can check out our guides
-about [basic concepts]{{{ site.baseurl }}/apis/common/index.html} and the
-[DataStream API]{{{ site.baseurl }}/apis/streaming/index.html} if you want to learn more. Stick
+This should get you started with writing your own Flink programs. To learn more 
+you can check out our guides
+about [basic concepts]({{ site.baseurl }}/apis/common/index.html) and the
+[DataStream API]({{ site.baseurl }}/apis/streaming/index.html). Stick
 around for the bonus exercise if you want to learn about setting up a Flink cluster on
 your own machine and writing results to [Kafka](http://kafka.apache.org).
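
For readers following the quickstart changes end to end: after the `edits` stream is created from the WikipediaEditsSource, it is keyed by user name, windowed into five-second windows, and folded into per-user byte counts, roughly as below. This is a condensed sketch in the spirit of the quickstart's own code (imports omitted, as the doc itself suggests letting the IDE add them); getUser() and getByteDiff() are the accessors the quickstart relies on.

    KeyedStream<WikipediaEditEvent, String> keyedEdits = edits
        .keyBy(new KeySelector<WikipediaEditEvent, String>() {
            @Override
            public String getKey(WikipediaEditEvent event) {
                return event.getUser();   // key the stream by user name
            }
        });

    DataStream<Tuple2<String, Long>> result = keyedEdits
        .timeWindow(Time.seconds(5))      // five-second tumbling windows
        .fold(new Tuple2<>("", 0L), new FoldFunction<WikipediaEditEvent, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> fold(Tuple2<String, Long> acc, WikipediaEditEvent event) {
                acc.f0 = event.getUser();
                acc.f1 += event.getByteDiff();   // sum of added/removed bytes per user
                return acc;
            }
        });

    result.print();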