You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@flink.apache.org by al...@apache.org on 2017/02/06 12:43:58 UTC

flink git commit: [FLINK-5709] Add Max Parallelism to Parallel Execution Doc

Repository: flink
Updated Branches:
  refs/heads/master 215776b81 -> fffc8f055


[FLINK-5709] Add Max Parallelism to Parallel Execution Doc


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/fffc8f05
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/fffc8f05
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/fffc8f05

Branch: refs/heads/master
Commit: fffc8f0557153e9d0ba724f329e5ce09ad5a412b
Parents: 215776b
Author: Aljoscha Krettek <al...@gmail.com>
Authored: Fri Feb 3 15:15:10 2017 +0100
Committer: Aljoscha Krettek <al...@gmail.com>
Committed: Mon Feb 6 13:42:54 2017 +0100

----------------------------------------------------------------------
 docs/dev/parallel.md | 40 +++++++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/fffc8f05/docs/dev/parallel.md
----------------------------------------------------------------------
diff --git a/docs/dev/parallel.md b/docs/dev/parallel.md
index 8d38884..549481f 100644
--- a/docs/dev/parallel.md
+++ b/docs/dev/parallel.md
@@ -27,10 +27,21 @@ program consists of multiple tasks (transformations/operators, data sources, and
 several parallel instances for execution and each parallel instance processes a subset of the task's
 input data. The number of parallel instances of a task is called its *parallelism*.
 
+If you want to use [savepoints]({{ site.baseurl }}/setup/savepoints.html) you should also consider
+setting a maximum parallelism (or `max parallelism`). When restoring from a savepoint you can
+change the parallelism of specific operators or the whole program and this setting specifies
+an upper bound on the parallelism. This is required because Flink internally partitions state
+into key-groups and we cannot have `+Inf` number of key-groups because this would be detrimental
+to performance.
 
-The parallelism of a task can be specified in Flink on different levels.
+* toc
+{:toc}
 
-## Operator Level
+## Setting the Parallelism
+
+The parallelism of a task can be specified in Flink on different levels:
+
+### Operator Level
 
 The parallelism of an individual operator, data source, or data sink can be defined by calling its
 `setParallelism()` method.  For example, like this:
@@ -69,10 +80,10 @@ env.execute("Word Count Example")
 </div>
 </div>
 
-## Execution Environment Level
+### Execution Environment Level
 
-As mentioned [here](#anatomy-of-a-flink-program) Flink programs are executed in the context
-of an execution environment. An
+As mentioned [here]({{ site.baseurl }}/dev/api_concepts.html#anatomy-of-a-flink-program) Flink
+programs are executed in the context of an execution environment. An
 execution environment defines a default parallelism for all operators, data sources, and data sinks
 it executes. Execution environment parallelism can be overwritten by explicitly configuring the
 parallelism of an operator.
@@ -112,7 +123,7 @@ env.execute("Word Count Example")
 </div>
 </div>
 
-## Client Level
+### Client Level
 
 The parallelism can be set at the Client when submitting jobs to Flink. The
 Client can either be a Java or a Scala program. One example of such a Client is
@@ -166,10 +177,25 @@ try {
 </div>
 
 
-## System Level
+### System Level
 
 A system-wide default parallelism for all execution environments can be defined by setting the
 `parallelism.default` property in `./conf/flink-conf.yaml`. See the
 [Configuration]({{ site.baseurl }}/setup/config.html) documentation for details.
 
+## Setting the Maximum Parallelism
+
+The maximum parallelism can be set in places where you can also set a parallelism
+(except client level and system level). Instead of calling `setParallelism()` you call
+`setMaxParallelism()` to set the maximum parallelism.
+
+The default setting for the maximum parallelism is roughly `operatorParallelism + (operatorParallelism / 2)` with
+a lower bound of `127` and an upper bound of `32768`.
+
+<span class="label label-danger">Attention</span> Setting the maximum parallelism to a very large
+value can be detrimental to performance because some state backends have to keep internal data
+structures that scale with the number of key-groups (which are the internal implementation mechanism for
+rescalable state).
+
+
 {% top %}