Posted to commits@flink.apache.org by fh...@apache.org on 2014/09/24 11:33:45 UTC

[3/3] git commit: Added documentation for rebalance() and partitionByHash()

Added documentation for rebalance() and partitionByHash()


Project: http://git-wip-us.apache.org/repos/asf/incubator-flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-flink/commit/583c527f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-flink/tree/583c527f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-flink/diff/583c527f

Branch: refs/heads/master
Commit: 583c527fc3fc693dd40b908d969f1e510ff7dfb3
Parents: a775c32
Author: Fabian Hueske <fh...@apache.org>
Authored: Wed Sep 24 10:41:07 2014 +0200
Committer: Fabian Hueske <fh...@apache.org>
Committed: Wed Sep 24 10:44:23 2014 +0200

----------------------------------------------------------------------
 docs/dataset_transformations.md | 34 ++++++++++++++++++++++++++++++++++
 docs/programming_guide.md       | 22 ++++++++++++++++++++++
 2 files changed, 56 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/583c527f/docs/dataset_transformations.md
----------------------------------------------------------------------
diff --git a/docs/dataset_transformations.md b/docs/dataset_transformations.md
index fec796b..a490a26 100644
--- a/docs/dataset_transformations.md
+++ b/docs/dataset_transformations.md
@@ -1103,3 +1103,37 @@ val unioned = vals1.union(vals2).union(vals3)
 </div>
 </div>
 
+### Rebalance (Java API Only)
+
+Evenly rebalances the parallel partitions of a DataSet to eliminate data skew.
+Only Map-like transformations may follow a rebalance transformation, i.e.,
+
+- Map
+- FlatMap
+- Filter
+- MapPartition
+
+~~~java
+DataSet<String> in = // [...]
+// rebalance DataSet and apply a Map transformation.
+DataSet<Tuple2<String, String>> out = in.rebalance()
+                                        .map(new Mapper());
+~~~
+
+### Hash-Partition (Java API Only)
+
+Hash-partitions a DataSet on a given key. 
+Keys can be specified as key-selector functions or field position keys (see [Reduce examples](#reduce-on-grouped-dataset) for how to specify keys).
+Only Map-like transformations may follow a hash-partition transformation, i.e.,
+
+- Map
+- FlatMap
+- Filter
+- MapPartition
+
+~~~java
+DataSet<Tuple2<String, Integer>> in = // [...]
+// hash-partition DataSet by String value and apply a MapPartition transformation.
+DataSet<Tuple2<String, String>> links = in.partitionByHash(0)
+                                          .mapPartition(new PartitionMapper());
+~~~
\ No newline at end of file
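
Note on the rebalance section above: the documentation says that rebalance() evens out the parallel partitions to remove skew. The following plain-Java sketch illustrates that idea without using Flink at all. It assumes a round-robin redistribution, which is an assumption of this sketch and not something the commit states; the class name RebalanceSketch and its rebalance helper are hypothetical and only model partitions as in-memory lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RebalanceSketch {

    // Redistributes all elements round-robin over `parallelism` output partitions.
    // This only models the effect described in the docs; it is not Flink code.
    static <T> List<List<T>> rebalance(List<List<T>> partitions, int parallelism) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            out.add(new ArrayList<>());
        }
        int next = 0;
        for (List<T> partition : partitions) {
            for (T element : partition) {
                out.get(next).add(element);
                next = (next + 1) % parallelism;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A skewed input: one partition holds five of the six elements.
        List<List<String>> skewed = new ArrayList<>();
        skewed.add(Arrays.asList("a", "b", "c", "d", "e"));
        skewed.add(Arrays.asList("f"));
        skewed.add(new ArrayList<>());

        // After rebalancing, every partition holds two elements.
        for (List<String> partition : rebalance(skewed, 3)) {
            System.out.println(partition.size() + " " + partition);
        }
    }
}
```

In the sketch, a Map-like function applied afterwards would see two elements per partition instead of five, one, and zero.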

http://git-wip-us.apache.org/repos/asf/incubator-flink/blob/583c527f/docs/programming_guide.md
----------------------------------------------------------------------
diff --git a/docs/programming_guide.md b/docs/programming_guide.md
index 3180692..99fc6d8 100644
--- a/docs/programming_guide.md
+++ b/docs/programming_guide.md
@@ -594,6 +594,28 @@ DataSet<String> result = data1.union(data2);
 {% endhighlight %}
       </td>
     </tr>
+    <tr>
+      <td><strong>Rebalance</strong></td>
+      <td>
+        <p>Evenly rebalances the parallel partitions of a data set to eliminate data skew. Only Map-like transformations may follow a rebalance transformation. (Java API Only)</p>
+{% highlight java %}
+DataSet<String> in = // [...]
+DataSet<String> result = in.rebalance()
+                           .map(new Mapper());
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>Hash-Partition</strong></td>
+      <td>
+        <p>Hash-partitions a data set on a given key. Keys can be specified as key-selector functions or field position keys. Only Map-like transformations may follow a hash-partition transformation. (Java API Only)</p>
+{% highlight java %}
+DataSet<Tuple2<String,Integer>> in = // [...]
+DataSet<Integer> result = in.partitionByHash(0)
+                            .mapPartition(new PartitionMapper());
+{% endhighlight %}
+      </td>
+    </tr>
   </tbody>
 </table>
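
Note on the hash-partition entry above: the documentation says keys can be given either as key-selector functions or as field positions. The plain-Java sketch below illustrates how both styles can lead to the same partition assignment. It does not use Flink; the class HashPartitionSketch, the helpers partitionByKey and partitionByField, the use of Object[] as a stand-in for tuples, and the choice of hashCode() modulo the parallelism are all assumptions made for illustration, not Flink's actual partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class HashPartitionSketch {

    // Key-selector style: the caller supplies a function that extracts the key.
    static <T, K> List<List<T>> partitionByKey(List<T> input,
                                               Function<T, K> keySelector,
                                               int parallelism) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            out.add(new ArrayList<>());
        }
        for (T element : input) {
            int hash = keySelector.apply(element).hashCode();
            // floorMod keeps the index non-negative even for negative hash codes.
            out.get(Math.floorMod(hash, parallelism)).add(element);
        }
        return out;
    }

    // Field-position style: the key is the tuple field at index `field`.
    static List<List<Object[]>> partitionByField(List<Object[]> input,
                                                 int field,
                                                 int parallelism) {
        return partitionByKey(input, tuple -> tuple[field], parallelism);
    }

    public static void main(String[] args) {
        List<Object[]> in = Arrays.asList(
                new Object[]{"a", 1},
                new Object[]{"b", 2},
                new Object[]{"a", 3},
                new Object[]{"c", 4});

        // Both calls key on the String in field 0, so they assign records identically.
        List<List<Object[]>> byField = partitionByField(in, 0, 4);
        List<List<Object[]>> bySelector = partitionByKey(in, tuple -> (String) tuple[0], 4);

        for (int i = 0; i < byField.size(); i++) {
            System.out.println("partition " + i + ": "
                    + byField.get(i).size() + " record(s) by field, "
                    + bySelector.get(i).size() + " record(s) by selector");
        }
    }
}
```

The property the sketch demonstrates is that records with equal keys always land in the same partition, which is what lets a following MapPartition function see all records of a key together.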