Posted to commits@flink.apache.org by ja...@apache.org on 2019/07/10 14:38:44 UTC

[flink] branch master updated: [FLINK-13106][doc-zh] Translate "Parallel Execution" page into Chinese

This is an automated email from the ASF dual-hosted git repository.

jark pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git


The following commit(s) were added to refs/heads/master by this push:
     new f150665  [FLINK-13106][doc-zh] Translate "Parallel Execution" page into Chinese
f150665 is described below

commit f1506653cc18a95478884fba59ee929e83d4fcf6
Author: guanghui01.rong <gu...@vipshop.com>
AuthorDate: Fri Jul 5 15:44:20 2019 +0800

    [FLINK-13106][doc-zh] Translate "Parallel Execution" page into Chinese
    
    This closes #8995
---
 docs/dev/parallel.md    |  2 +-
 docs/dev/parallel.zh.md | 67 ++++++++++++++++---------------------------------
 2 files changed, 22 insertions(+), 47 deletions(-)

diff --git a/docs/dev/parallel.md b/docs/dev/parallel.md
index ae6f863..693f83f 100644
--- a/docs/dev/parallel.md
+++ b/docs/dev/parallel.md
@@ -190,7 +190,7 @@ The maximum parallelism can be set in places where you can also set a parallelis
 `setMaxParallelism()` to set the maximum parallelism.
 
 The default setting for the maximum parallelism is roughly `operatorParallelism + (operatorParallelism / 2)` with
-a lower bound of `127` and an upper bound of `32768`.
+a lower bound of `128` and an upper bound of `32768`.
 
 <span class="label label-danger">Attention</span> Setting the maximum parallelism to a very large
 value can be detrimental to performance because some state backends have to keep internal data
diff --git a/docs/dev/parallel.zh.md b/docs/dev/parallel.zh.md
index 9031256..c1b6038 100644
--- a/docs/dev/parallel.zh.md
+++ b/docs/dev/parallel.zh.md
@@ -1,5 +1,5 @@
 ---
-title: "并发执行"
+title: "并行执行"
 nav-parent_id: execution
 nav-pos: 30
 ---
@@ -22,29 +22,20 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-This section describes how the parallel execution of programs can be configured in Flink. A Flink
-program consists of multiple tasks (transformations/operators, data sources, and sinks). A task is split into
-several parallel instances for execution and each parallel instance processes a subset of the task's
-input data. The number of parallel instances of a task is called its *parallelism*.
+本节描述了在 Flink 中配置程序的并行执行。一个 Flink 程序由多个任务 task 组成(转换/算子、数据源和数据接收器)。一个 task 包括多个并行执行的实例,且每一个实例都处理 task 输入数据的一个子集。一个 task 的并行实例数被称为该 task 的 *并行度* (parallelism)。
 
-If you want to use [savepoints]({{ site.baseurl }}/ops/state/savepoints.html) you should also consider
-setting a maximum parallelism (or `max parallelism`). When restoring from a savepoint you can
-change the parallelism of specific operators or the whole program and this setting specifies
-an upper bound on the parallelism. This is required because Flink internally partitions state
-into key-groups and we cannot have `+Inf` number of key-groups because this would be detrimental
-to performance.
+使用 [savepoints]({{ site.baseurl }}/zh/ops/state/savepoints.html) 时,应该考虑设置最大并行度。当作业从一个 savepoint 恢复时,你可以改变特定算子或者整个程序的并行度,并且此设置会限定整个程序的并行度的上限。由于在 Flink 内部将状态划分为了 key-groups,且性能所限不能无限制地增加 key-groups,因此设定最大并行度是有必要的。
 
 * toc
 {:toc}
 
-## Setting the Parallelism
+## 设置并行度
 
-The parallelism of a task can be specified in Flink on different levels:
+一个 task 的并行度可以从多个层次指定:
 
-### Operator Level
+### 算子层次
 
-The parallelism of an individual operator, data source, or data sink can be defined by calling its
-`setParallelism()` method.  For example, like this:
+单个算子、数据源和数据接收器的并行度可以通过调用 `setParallelism()`方法来指定。如下所示:
 
 <div class="codetabs" markdown="1">
 <div data-lang="java" markdown="1">
@@ -80,17 +71,11 @@ env.execute("Word Count Example")
 </div>
 </div>
 
-### Execution Environment Level
+### 执行环境层次
 
-As mentioned [here]({{ site.baseurl }}/dev/api_concepts.html#anatomy-of-a-flink-program) Flink
-programs are executed in the context of an execution environment. An
-execution environment defines a default parallelism for all operators, data sources, and data sinks
-it executes. Execution environment parallelism can be overwritten by explicitly configuring the
-parallelism of an operator.
+如[此节]({{ site.baseurl }}/zh/dev/api_concepts.html#anatomy-of-a-flink-program)所描述,Flink 程序运行在执行环境的上下文中。执行环境为所有执行的算子、数据源、数据接收器 (data sink) 定义了一个默认的并行度。可以显式配置算子层次的并行度去覆盖执行环境的并行度。
 
-The default parallelism of an execution environment can be specified by calling the
-`setParallelism()` method. To execute all operators, data sources, and data sinks with a parallelism
-of `3`, set the default parallelism of the execution environment as follows:
+可以通过调用 `setParallelism()` 方法指定执行环境的默认并行度。如果想以并行度 `3` 来执行所有的算子、数据源和数据接收器,可以在执行环境上设置默认并行度,如下所示:
 
 <div class="codetabs" markdown="1">
 <div data-lang="java" markdown="1">
@@ -123,19 +108,16 @@ env.execute("Word Count Example")
 </div>
 </div>
 
-### Client Level
+### 客户端层次
 
-The parallelism can be set at the Client when submitting jobs to Flink. The
-Client can either be a Java or a Scala program. One example of such a Client is
-Flink's Command-line Interface (CLI).
+将作业提交到 Flink 时可在客户端设定其并行度。客户端可以是 Java 或 Scala 程序,Flink 的命令行接口(CLI)就是一种典型的客户端。
 
-For the CLI client, the parallelism parameter can be specified with `-p`. For
-example:
+在 CLI 客户端中,可以通过 `-p` 参数指定并行度,例如:
 
     ./bin/flink run -p 10 ../examples/*WordCount-java*.jar
 
 
-In a Java/Scala program, the parallelism is set as follows:
+在 Java/Scala 程序中,可以通过如下方式指定并行度:
 
 <div class="codetabs" markdown="1">
 <div data-lang="java" markdown="1">
@@ -177,25 +159,18 @@ try {
 </div>
 
 
-### System Level
+### 系统层次
 
-A system-wide default parallelism for all execution environments can be defined by setting the
-`parallelism.default` property in `./conf/flink-conf.yaml`. See the
-[Configuration]({{ site.baseurl }}/ops/config.html) documentation for details.
+可以通过设置 `./conf/flink-conf.yaml` 文件中的 `parallelism.default` 参数,在系统层次来指定所有执行环境的默认并行度。你可以通过查阅[配置文档]({{ site.baseurl }}/zh/ops/config.html)获取更多细节。
 
-## Setting the Maximum Parallelism
 
-The maximum parallelism can be set in places where you can also set a parallelism
-(except client level and system level). Instead of calling `setParallelism()` you call
-`setMaxParallelism()` to set the maximum parallelism.
+## 设置最大并行度
 
-The default setting for the maximum parallelism is roughly `operatorParallelism + (operatorParallelism / 2)` with
-a lower bound of `127` and an upper bound of `32768`.
+最大并行度可以在所有设置并行度的地方进行设定(客户端和系统层次除外)。与调用 `setParallelism()` 方法修改并行度相似,你可以通过调用 `setMaxParallelism()` 方法来设定最大并行度。
 
-<span class="label label-danger">Attention</span> Setting the maximum parallelism to a very large
-value can be detrimental to performance because some state backends have to keep internal data
-structures that scale with the number of key-groups (which are the internal implementation mechanism for
-rescalable state).
+默认的最大并行度等于将 `operatorParallelism + (operatorParallelism / 2)` 的值向上取整到不小于该值的最近的 `2` 的幂次方,注意默认最大并行度下限为 `128`,上限为 `32768`。
+
+<span class="label label-danger">注意</span> 为最大并行度设置一个非常大的值将会降低性能,因为一些 state backends 需要维持内部的数据结构,而这些数据结构将会随着 key-groups 的数目而扩张(key-group 是状态重新分配的最小单元)。
 
 
 {% top %}
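
For readers checking the corrected lower bound, the default described in the patched text (`operatorParallelism + (operatorParallelism / 2)`, rounded up to a power of two, then clamped to `128` and `32768`) can be sketched as below. This is only an illustration of the documented rule. The class and method names are hypothetical and are not Flink's API, and the accepted input range of 1 to 32768 is an assumption made to keep the arithmetic simple.

```java
// Illustrative sketch only: names are hypothetical, not Flink's API.
public class DefaultMaxParallelismSketch {

    // Bounds as stated in the patched documentation.
    static final int LOWER_BOUND = 128;
    static final int UPPER_BOUND = 32768;

    // Smallest power of two that is >= x, for x >= 1.
    static int roundUpToPowerOfTwo(int x) {
        int highest = Integer.highestOneBit(x);
        return highest == x ? x : highest << 1;
    }

    // Applies the documented rule: p + p/2, rounded up to a power of two,
    // then clamped to [LOWER_BOUND, UPPER_BOUND].
    static int computeDefaultMaxParallelism(int operatorParallelism) {
        // Assumed input range for this sketch; it also rules out int overflow.
        if (operatorParallelism < 1 || operatorParallelism > UPPER_BOUND) {
            throw new IllegalArgumentException(
                "operatorParallelism must be in [1, " + UPPER_BOUND + "]");
        }
        int candidate =
            roundUpToPowerOfTwo(operatorParallelism + operatorParallelism / 2);
        return Math.min(Math.max(candidate, LOWER_BOUND), UPPER_BOUND);
    }

    public static void main(String[] args) {
        for (int p : new int[] {1, 100, 200, 30000}) {
            System.out.println(p + " -> " + computeDefaultMaxParallelism(p));
        }
    }
}
```

With this rule, a parallelism of 1 yields 128 (the lower bound applies), 100 yields 256, 200 yields 512, and 30000 yields 32768 (the upper bound applies).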