You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@flink.apache.org by se...@apache.org on 2014/08/20 14:41:30 UTC

git commit: [FLINK-1053] Add documentation for mapPartition() function

Repository: incubator-flink
Updated Branches:
  refs/heads/master 6c48063ad -> 4717c0c7d


[FLINK-1053] Add documentation for mapPartition() function

[ci skip]


Project: http://git-wip-us.apache.org/repos/asf/incubator-flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-flink/commit/4717c0c7
Tree: http://git-wip-us.apache.org/repos/asf/incubator-flink/tree/4717c0c7
Diff: http://git-wip-us.apache.org/repos/asf/incubator-flink/diff/4717c0c7

Branch: refs/heads/master
Commit: 4717c0c7d6964553343003611dda810ede3a7e56
Parents: 6c48063
Author: Stephan Ewen <se...@apache.org>
Authored: Wed Aug 20 14:39:16 2014 +0200
Committer: Stephan Ewen <se...@apache.org>
Committed: Wed Aug 20 14:40:39 2014 +0200

----------------------------------------------------------------------
 docs/java_api_guide.md           | 24 ++++++++++++++++++++++--
 docs/java_api_transformations.md | 24 ++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/4717c0c7/docs/java_api_guide.md
----------------------------------------------------------------------
diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md
index 76dd332..4441d93 100644
--- a/docs/java_api_guide.md
+++ b/docs/java_api_guide.md
@@ -246,7 +246,7 @@ data.map(new MapFunction<String, Integer>() {
     <tr>
       <td><strong>FlatMap</strong></td>
       <td>
-        <p>Takes one element and produces zero, one, or more elements.</p>
+        <p>Takes one element and produces zero, one, or more elements. </p>
 {% highlight java %}
 data.flatMap(new FlatMapFunction<String, String>() {
   public void flatMap(String value, Collector<String> out) {
@@ -260,6 +260,26 @@ data.flatMap(new FlatMapFunction<String, String>() {
     </tr>
 
     <tr>
+      <td><strong>MapPartition</strong></td>
+      <td>
+        <p>Transforms a parallel partition in a single function call. The function get the partition as an `Iterable` stream and
+           can produce an arbitrary number of result values. The number of elements in each partition depends on the degree-of-parallelism
+           and previous operations.</p>
+{% highlight java %}
+data.mapPartition(new MapPartitionFunction<String, Long>() {
+  public void mapPartition(Iterable<String> values, Collector<Long> out) {
+    long c = 0;
+    for (String s : values) {
+      c++;
+    }
+    out.collect(c);
+  }
+});
+{% endhighlight %}
+      </td>
+    </tr>
+
+    <tr>
       <td><strong>Filter</strong></td>
       <td>
         <p>Evaluates a boolean function for each element and retains those for which the function returns true.</p>
@@ -403,7 +423,7 @@ DataSet<Tuple2<String, Integer>> out = in.project(2,0).types(String.class, Integ
 Defining Keys
 -------------
 
-One transformation (join, coGroup) require that a key is defined on
+Some transformations (join, coGroup) require that a key is defined on
 its argument DataSets, and other transformations (Reduce, GroupReduce,
 Aggregate) allow that the DataSet is grouped on a key before they are
 applied.

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/4717c0c7/docs/java_api_transformations.md
----------------------------------------------------------------------
diff --git a/docs/java_api_transformations.md b/docs/java_api_transformations.md
index e4980de..3e303c7 100644
--- a/docs/java_api_transformations.md
+++ b/docs/java_api_transformations.md
@@ -53,7 +53,31 @@ public class Tokenizer implements FlatMapFunction<String, String> {
 // [...]
 DataSet<String> textLines = // [...]
 DataSet<String> words = textLines.flatMap(new Tokenizer());
+```
+
+### MapPartition
+
+The MapPartition function transforms a parallel partition in a single function call. The function get the partition as an `Iterable` stream and
+can produce an arbitrary number of result values. The number of elements in each partition depends on the degree-of-parallelism
+and previous operations.
+
+The following code transforms a `DataSet` of text lines into a `DataSet` of counts per partition:
+
+```java
+public class PartitionCounter implements MapPartitionFunction<String, Long> {
 
+  public void mapPartition(Iterable<String> values, Collector<Long> out) {
+    long c = 0;
+    for (String s : values) {
+      c++;
+    }
+    out.collect(c);
+  }
+}
+
+// [...]
+DataSet<String> textLines = // [...]
+DataSet<Long> counts = textLines.mapPartition(new PartitionCounter());
 ```
 
 ### Filter