Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/04/09 09:45:14 UTC

[GitHub] [flink] liuzhuang2017 opened a new pull request, #19413: [FLINK-16078] [docs-zh] Translate "Tuning Checkpoints and Large State…

liuzhuang2017 opened a new pull request, #19413:
URL: https://github.com/apache/flink/pull/19413

   …" page into Chinese
   
   <!--
   *Thank you very much for contributing to Apache Flink - we are happy that you want to help us improve Flink. To help the community review your contribution in the best possible way, please go through the checklist below, which will get the contribution into a shape in which it can be best reviewed.*
   
   *Please understand that we do not do this to make contributions to Flink a hassle. In order to uphold a high standard of quality for code contributions, while at the same time managing a large number of contributions, we need contributors to prepare the contributions well, and give reviewers enough contextual information for the review. Please also understand that contributions that do not follow this guide will take longer to review and thus typically be picked up with lower priority by the community.*
   
   ## Contribution Checklist
   
     - Make sure that the pull request corresponds to a [JIRA issue](https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are made for typos in JavaDoc or documentation files, which need no JIRA issue.
     
     - Name the pull request in the form "[FLINK-XXXX] [component] Title of the pull request", where *FLINK-XXXX* should be replaced by the actual issue number. Skip *component* if you are unsure about which is the best component.
     Typo fixes that have no associated JIRA issue should be named following this pattern: `[hotfix] [docs] Fix typo in event time introduction` or `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.
   
     - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review.
     
     - Make sure that the change passes the automated tests, i.e., `mvn clean verify` passes. You can set up Azure Pipelines CI to do that following [this guide](https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines#AzurePipelines-Tutorial:SettingupAzurePipelinesforaforkoftheFlinkrepository).
   
     - Each pull request should address only one issue, not mix up code from multiple issues.
     
     - Each commit in the pull request has a meaningful commit message (including the JIRA id)
   
     - Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below.
   
   
   **(The sections below can be removed for hotfixes of typos)**
   -->
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
     - *The TaskInfo is stored in the blob store on job creation time as a persistent artifact*
     - *Deployments RPC transmits only the blob storage reference*
     - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
Please make sure both new and modified tests in this PR follow the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
     - *Added integration tests for end-to-end deployment with large payloads (100MB)*
     - *Extended integration test for recovery after master (JobManager) failure*
     - *Added test that validates that TaskInfo is transferred only once across recoveries*
     - *Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / no)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no)
     - The serializers: (yes / no / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / no / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
     - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / no)
     - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] liuzhuang2017 commented on pull request #19413: [FLINK-16078] [docs-zh] Translate "Tuning Checkpoints and Large State"

Posted by GitBox <gi...@apache.org>.
liuzhuang2017 commented on PR #19413:
URL: https://github.com/apache/flink/pull/19413#issuecomment-1101131040

   @Myasuka, I'm really sorry, I closed this PR due to personal reasons and opened a new one (https://github.com/apache/flink/pull/19503). If you have free time, could you help me review the new PR?




[GitHub] [flink] flinkbot commented on pull request #19413: [FLINK-16078] [docs-zh] Translate "Tuning Checkpoints and Large State…

Posted by GitBox <gi...@apache.org>.
flinkbot commented on PR #19413:
URL: https://github.com/apache/flink/pull/19413#issuecomment-1093840338

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5313a08b9e63d286641dc58f95101cf407278623",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5313a08b9e63d286641dc58f95101cf407278623",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5313a08b9e63d286641dc58f95101cf407278623 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>




[GitHub] [flink] liuzhuang2017 commented on a diff in pull request #19413: [FLINK-16078] [docs-zh] Translate "Tuning Checkpoints and Large State…

Posted by GitBox <gi...@apache.org>.
liuzhuang2017 commented on code in PR #19413:
URL: https://github.com/apache/flink/pull/19413#discussion_r847083524


##########
docs/content.zh/docs/ops/state/large_state_tuning.md:
##########
@@ -166,149 +125,101 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 }
 ```
 
-## Capacity Planning
-
-This section discusses how to decide how many resources should be used for a Flink job to run reliably.
-The basic rules of thumb for capacity planning are:
-
-  - Normal operation should have enough capacity to not operate under constant *back pressure*.
-    See [back pressure monitoring]({{< ref "docs/ops/monitoring/back_pressure" >}}) for details on how to check whether the application runs under back pressure.
-
-  - Provision some extra resources on top of the resources needed to run the program back-pressure-free during failure-free time.
-    These resources are needed to "catch up" with the input data that accumulated during the time the application
-    was recovering.
-    How much that should be depends on how long recovery operations usually take (which depends on the size of the state
-    that needs to be loaded into the new TaskManagers on a failover) and how fast the scenario requires failures to recover.
-
-    *Important*: The base line should to be established with checkpointing activated, because checkpointing ties up
-    some amount of resources (such as network bandwidth).
-
-  - Temporary back pressure is usually okay, and an essential part of execution flow control during load spikes,
-    during catch-up phases, or when external systems (that are written to in a sink) exhibit temporary slowdown.
+## 容量规划
+本节讨论如何确定 Flink 作业应该使用多少资源才能可靠地运行。
+容量规划的基本经验法则是:
+  - 正常运行应该有足够的能力在恒定的*反压*下运行。
+    如何检查应用程序是否在反压下运行的详细信息,请参阅 [反压监控]({{< ref "docs/ops/monitoring/back_pressure" >}})。
+  - 在无故障时间内无反压程序运行所需资源之上提供一些额外的资源。
+    需要这些资源来“赶上”在应用程序恢复期间积累的输入数据。
+    这通常取决于恢复操作需要多长时间(这取决于需要在故障转移时加载到新 TaskManager 中的状态大小)以及故障恢复的速度。
+    *重要*:基线应该在开启 checkpointing 的情况下建立,因为 checkpointing 会占用一些资源(例如网络带宽)。
+  - 临时反压通常是可以的,在负载峰值、追赶阶段或外部系统(写入接收器中)出现临时减速时,这是执行流控制的重要部分。
 
-  - Certain operations (like large windows) result in a spiky load for their downstream operators: 
-    In the case of windows, the downstream operators may have little to do while the window is being built,
-    and have a load to do when the windows are emitted.
-    The planning for the downstream parallelism needs to take into account how much the windows emit and how
-    fast such a spike needs to be processed.
+  - 某些操作(如大窗口)会导致其下游算子的负载激增:
+    在窗口的情况下,下游算子可能在构建窗口时几乎无事可做,而在窗口发出时有负载要做。
+    下游并行度的设置需要考虑到窗口输出多少以及需要以多快的速度处理这种峰值。
 
-**Important:** In order to allow for adding resources later, make sure to set the *maximum parallelism* of the
-data stream program to a reasonable number. The maximum parallelism defines how high you can set the programs
-parallelism when re-scaling the program (via a savepoint).
+**重要**:为了方便以后添加资源,请务必将数据流程序的*最大并行度*设置为合理的数字。 最大并行度定义了在重新缩放程序时(通过 savepoint )可以设置程序并行度的高度。
 
-Flink's internal bookkeeping tracks parallel state in the granularity of max-parallelism-many *key groups*.
-Flink's design strives to make it efficient to have a very high value for the maximum parallelism, even if
-executing the program with a low parallelism.
+Flink 的内部以多个*键组(key groups)* 的最大并行度为粒度跟踪并行状态。
+Flink 的设计力求使最大并行度的值达到很高的效率,即使执行程序时并行度很低。
 
-## Compression
-
-Flink offers optional compression (default: off) for all checkpoints and savepoints. Currently, compression always uses 
-the [snappy compression algorithm (version 1.1.4)](https://github.com/xerial/snappy-java) but we are planning to support
-custom compression algorithms in the future. Compression works on the granularity of key-groups in keyed state, i.e.
-each key-group can be decompressed individually, which is important for rescaling. 
-
-Compression can be activated through the `ExecutionConfig`:
+## 压缩
+Flink 为所有 checkpoints 和 savepoints 提供可选的压缩(默认:关闭)。 目前,压缩总是使用 [snappy 压缩算法(版本 1.1.4)](https://github.com/xerial/snappy-java),
+但我们计划在未来支持自定义压缩算法。 压缩作用于 keyed state 下 key-groups 的粒度,即每个 key-groups 可以单独解压缩,这对于重新缩放很重要。
 
+可以通过 `ExecutionConfig` 开启压缩:
 ```java
 ExecutionConfig executionConfig = new ExecutionConfig();
 executionConfig.setUseSnapshotCompression(true);
 ```
 
-<span class="label label-info">Note</span> The compression option has no impact on incremental snapshots, because they are using RocksDB's internal
-format which is always using snappy compression out of the box.
-
-## Task-Local Recovery
+<span class="label label-info">注意</span> 压缩选项对增量快照没有影响,因为它们使用的是 RocksDB 的内部格式,该格式始终使用开箱即用的 snappy 压缩。
 
-### Motivation
+## Task 本地恢复
+### 问题引入
+在 Flink 的 checkpointing 中,每个 task 都会生成其状态快照,然后将其写入分布式存储。 每个 task 通过发送一个描述分布式存储中的位置状态的句柄,向 jobmanager 确认状态的成功写入。
+JobManager 反过来收集所有 tasks 的句柄并将它们捆绑到一个 checkpoint 对象中。
 
-In Flink's checkpointing, each task produces a snapshot of its state that is then written to a distributed store. Each task acknowledges
-a successful write of the state to the job manager by sending a handle that describes the location of the state in the distributed store.
-The job manager, in turn, collects the handles from all tasks and bundles them into a checkpoint object.
+在恢复的情况下,jobmanager 打开最新的 checkpoint 对象并将句柄发送回相应的 tasks,然后可以从分布式存储中恢复它们的状态。 使用分布式存储来存储状态有两个重要的优势。 
+首先,存储是容错的,其次,分布式存储中的所有状态都可以被所有节点访问,并且可以很容易地重新分配(例如,用于重新缩放)。
 
-In case of recovery, the job manager opens the latest checkpoint object and sends the handles back to the corresponding tasks, which can
-then restore their state from the distributed storage. Using a distributed storage to store state has two important advantages. First, the storage
-is fault tolerant and second, all state in the distributed store is accessible to all nodes and can be easily redistributed (e.g. for rescaling).
+但是,使用远程分布式存储也有一个很大的缺点:所有 tasks 都必须通过网络从远程位置读取它们的状态。
+在许多场景中,恢复可能会将失败的 tasks 重新调度到与前一次运行相同的 taskmanager 中(当然也有像机器故障这样的异常),但我们仍然必须读取远程状态。这可能导致*大状态的长时间恢复*,即使在一台机器上只有一个小故障。
 
-However, using a remote distributed store has also one big disadvantage: all tasks must read their state from a remote location, over the network.
-In many scenarios, recovery could reschedule failed tasks to the same task manager as in the previous run (of course there are exceptions like machine
-failures), but we still have to read remote state. This can result in *long recovery time for large states*, even if there was only a small failure on
-a single machine.
+### 解决办法
 
-### Approach
+Task 本地状态恢复正是针对这个恢复时间长的问题,其主要思想如下:对于每个 checkpoint ,每个 task 不仅将 task 状态写入分布式存储中,
+而且还在 task 本地存储(例如本地磁盘或内存)中保存状态快照的次要副本。请注意,快照的主存储仍然必须是分布式存储,因为本地存储不能确保节点故障下的持久性,也不能为其他节点提供重新分发状态的访问,所以这个功能仍然需要主副本。
 
-Task-local state recovery targets exactly this problem of long recovery time and the main idea is the following: for every checkpoint, each task
-does not only write task states to the distributed storage, but also keep *a secondary copy of the state snapshot in a storage that is local to
-the task* (e.g. on local disk or in memory). Notice that the primary store for snapshots must still be the distributed store, because local storage
-does not ensure durability under node failures and also does not provide access for other nodes to redistribute state, this functionality still
-requires the primary copy.
+然而,对于每个 task 可以重新调度到以前的位置进行恢复的 task ,我们可以从次要本地状态副本恢复,并避免远程读取状态的成本。考虑到*许多故障不是节点故障,节点故障通常一次只影响一个或非常少的节点*,
+在恢复过程中,大多数 task 很可能会返回到它们以前的位置,并发现它们的本地状态完好无损。
+这就是 task 本地恢复有效地减少恢复时间的原因。
 
-However, for each task that can be rescheduled to the previous location for recovery, we can restore state from the secondary, local
-copy and avoid the costs of reading the state remotely. Given that *many failures are not node failures and node failures typically only affect one
-or very few nodes at a time*, it is very likely that in a recovery most tasks can return to their previous location and find their local state intact.
-This is what makes local recovery effective in reducing recovery time.
-
-Please note that this can come at some additional costs per checkpoint for creating and storing the secondary local state copy, depending on the
-chosen state backend and checkpointing strategy. For example, in most cases the implementation will simply duplicate the writes to the distributed
-store to a local file.
+请注意,根据所选的 state backend 和 checkpointing 策略,在每个 checkpoint 创建和存储次要本地状态副本时,可能会有一些额外的成本。
+例如,在大多数情况下,实现只是简单地将对分布式存储的写操作复制到本地文件。
 
 {{< img src="/fig/local_recovery.png" class="center" width=50% alt="Illustration of checkpointing with task-local recovery." >}}
+### 主要(分布式存储)和次要(task 本地)状态快照的关系
+Task 本地状态始终被视为次要副本,checkpoint 状态是分布式存储中的主副本。 这对 checkpointing 和恢复期间的本地状态问题有影响:
 
-### Relationship of primary (distributed store) and secondary (task-local) state snapshots
-
-Task-local state is always considered a secondary copy, the ground truth of the checkpoint state is the primary copy in the distributed store. This
-has implications for problems with local state during checkpointing and recovery:
-
-- For checkpointing, the *primary copy must be successful* and a failure to produce the *secondary, local copy will not fail* the checkpoint. A checkpoint
-will fail if the primary copy could not be created, even if the secondary copy was successfully created.
-
-- Only the primary copy is acknowledged and managed by the job manager, secondary copies are owned by task managers and their life cycles can be
-independent from their primary copies. For example, it is possible to retain a history of the 3 latest checkpoints as primary copies and only keep
-the task-local state of the latest checkpoint.
-
-- For recovery, Flink will always *attempt to restore from task-local state first*, if a matching secondary copy is available. If any problem occurs during
-the recovery from the secondary copy, Flink will *transparently retry to recover the task from the primary copy*. Recovery only fails, if primary
-and the (optional) secondary copy failed. In this case, depending on the configuration Flink could still fall back to an older checkpoint.
-
-- It is possible that the task-local copy contains only parts of the full task state (e.g. exception while writing one local file). In this case,
-Flink will first try to recover local parts locally, non-local state is restored from the primary copy. Primary state must always be complete and is
-a *superset of the task-local state*.
+- 对于 checkpointing ,*主副本必须成功*,并且生成*次要本地副本的失败不会使* checkpoint 失败。 如果无法创建主副本,即使已成功创建次要副本,checkpoint 也会失败。
 
-- Task-local state can have a different format than the primary state, they are not required to be byte identical. For example, it could be even possible
-that the task-local state is an in-memory consisting of heap objects, and not stored in any files.
+- 只有主副本由 jobmanager 确认和管理,次要副本属于 taskmanager ,并且它们的生命周期可以独立于它们的主副本。 例如,可以保留 3 个最新 checkpoints 的历史记录作为主副本,并且只保留最新 checkpoint 的 task 本地状态。
 
-- If a task manager is lost, the local state from all its task is lost.
+- 对于恢复,如果匹配的次要副本可用,Flink 将始终*首先尝试从 task 本地状态恢复*。 如果在次要副本恢复过程中出现任何问题,Flink 将*透明地重试从主副本恢复 task*。 仅当主副本和(可选)次要副本失败时,恢复才会失败。 
+  在这种情况下,根据配置,Flink 仍可能回退到旧的 checkpoint 。
+- Task 本地副本可能仅包含完整 task 状态的一部分(例如,写入一个本地文件时出现异常)。 在这种情况下,Flink 会首先尝试在本地恢复本地部分,非本地状态从主副本恢复。 主状态必须始终是完整的,并且是*task 本地状态的超集*。
 
-### Configuring task-local recovery
+- Task 本地状态可以具有与主状态不同的格式,它们不需要相同字节。 例如,task 本地状态甚至可能是在堆对象组成的内存中,而不是存储在任何文件中。
 
-Task-local recovery is *deactivated by default* and can be activated through Flink's configuration with the key `state.backend.local-recovery` as specified
-in `CheckpointingOptions.LOCAL_RECOVERY`. The value for this setting can either be *true* to enable or *false* (default) to disable local recovery.
+- 如果 taskmanager 丢失,则其所有 task 的本地状态都会丢失。
+### 配置 task 本地恢复
 
-Note that [unaligned checkpoints]({{< ref "docs/ops/state/checkpoints" >}}#unaligned-checkpoints) currently do not support task-local recovery.
+Task 本地恢复*默认禁用*,可以通过 Flink 的 CheckpointingOptions.LOCAL_RECOVERY 配置中指定的键 state.backend.local-recovery 来启用。 此设置的值可以是 *true* 以启用或 *false*(默认)以禁用本地恢复。
 
-### Details on task-local recovery for different state backends
+请注意,[unaligned checkpoints]({{< ref "docs/ops/state/checkpoints" >}}#unaligned-checkpoints) 目前不支持 task 本地恢复。
 
-***Limitation**: Currently, task-local recovery only covers keyed state backends. Keyed state is typically by far the largest part of the state. In the near future, we will
-also cover operator state and timers.*
+### 不同 state backends 的 task 本地恢复的详细信息
 
-The following state backends can support task-local recovery.
+***限制**:目前,task 本地恢复仅涵盖 keyed state backends。 Keyed state 通常是该状态的最大部分。 在不久的将来,我们还将介绍算子状态和计时器(timers)。*

Review Comment:
   Maybe the extra asterisk should not be here?
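
   As a side note on the `state.backend.local-recovery` option covered in the hunk above, here is a minimal sketch of setting it programmatically; it assumes Flink's standard `Configuration` and `CheckpointingOptions` classes and is for illustration only, not part of this PR:

   ```java
   import org.apache.flink.configuration.CheckpointingOptions;
   import org.apache.flink.configuration.Configuration;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

   Configuration conf = new Configuration();
   // Same effect as putting `state.backend.local-recovery: true` into flink-conf.yaml.
   conf.set(CheckpointingOptions.LOCAL_RECOVERY, true);
   StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
   ```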





[GitHub] [flink] mddxhj commented on a diff in pull request #19413: [FLINK-16078] [docs-zh] Translate "Tuning Checkpoints and Large State"

Posted by GitBox <gi...@apache.org>.
mddxhj commented on code in PR #19413:
URL: https://github.com/apache/flink/pull/19413#discussion_r906644336


##########
docs/content.zh/docs/ops/state/large_state_tuning.md:
##########
@@ -26,122 +26,81 @@ under the License.
 
 # 大状态与 Checkpoint 调优
 
-This page gives a guide how to configure and tune applications that use large state.
+本文提供了如何配置和调整使用大状态的应用程序指南。
 
-## Overview
+## 概述
 
-For Flink applications to run reliably at large scale, two conditions must be fulfilled:
+Flink 应用要想在大规模场景下可靠地运行,必须要满足如下两个条件:
 
-  - The application needs to be able to take checkpoints reliably
+  - 应用程序需要能够可靠地创建 checkpoints
+  - 在应用故障后,需要有足够的资源追赶数据输入流
 
-  - The resources need to be sufficient catch up with the input data streams after a failure
+第一部分讨论如何大规模获得良好性能的 checkpoints 。
+后一部分解释了一些关于要规划使用多少资源的最佳实践。
 
-The first sections discuss how to get well performing checkpoints at scale.
-The last section explains some best practices concerning planning how many resources to use.
 
+## 监控状态和 Checkpoints
 
-## Monitoring State and Checkpoints
+监控 checkpoint 行为最简单的方法是通过 UI 的 checkpoint 部分。 [监控 Checkpoint]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}) 的文档说明了如何查看可用的 checkpoint 指标。
 
-The easiest way to monitor checkpoint behavior is via the UI's checkpoint section. The documentation
-for [checkpoint monitoring]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}) shows how to access the available checkpoint
-metrics.
+这两个指标(均通过 Task 级别 [Checkpointing 指标]({{< ref "docs/ops/metrics" >}}#checkpointing) 展示)
+以及在 [监控 Checkpoint]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}))中,当查看 checkpoint 详细信息时,特别有趣的是:
 
-The two numbers (both exposed via Task level [metrics]({{< ref "docs/ops/metrics" >}}#checkpointing)
-and in the [web interface]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}})) that are of particular interest when scaling
-up checkpoints are:
+  - 算子收到第一个 checkpoint barrier 的时间。当触发 checkpoint 的延迟时间一直很高时,这意味着 *checkpoint barrier* 需要很长时间才能从 source 到达 operators。 这通常表明系统处于反压下运行。
 
-  - The time until operators receive their first checkpoint barrier
-    When the time to trigger the checkpoint is constantly very high, it means that the *checkpoint barriers* need a long
-    time to travel from the source to the operators. That typically indicates that the system is operating under a
-    constant backpressure.
+  - Alignment Duration,为处理第一个和最后一个 checkpoint barrier 之间的时间。在 unaligned checkpoints 下,`exactly-once` 和 `at-least-once` checkpoints 的 subtasks 处理来自上游 subtasks 的所有数据,且没有任何中断。
+    然而,对于 aligned `exactly-once` checkpoints,已经收到 checkpoint barrier 的通道被阻止继续发送数据,直到所有剩余的通道都赶上并接收它们的 checkpoint barrier(对齐时间)。
 
-  - The alignment duration, which is defined as the time between receiving first and the last checkpoint barrier.
-    During unaligned `exactly-once` checkpoints and `at-least-once` checkpoints subtasks are processing all of the
-    data from the upstream subtasks without any interruptions. However with aligned `exactly-once` checkpoints,
-    the channels that have already received a checkpoint barrier are blocked from sending further data until
-    all of the remaining channels catch up and receive theirs checkpoint barriers (alignment time).
+理想情况下,这两个值都应该很低 - 较高的数值意味着由于存在反压(没有足够的资源来处理传入的记录),导致 checkpoint barriers 在作业中的移动速度较慢,这也可以通过处理记录的端到端延迟在增加来观察到。
+请注意,在出现瞬态反压、数据倾斜或网络问题时,这些数值偶尔会很高。
 
-Both of those values should ideally be low - higher amounts means that checkpoint barriers traveling through the job graph
-slowly, due to some back-pressure (not enough resources to process the incoming records). This can also be observed
-via increased end-to-end latency of processed records. Note that those numbers can be occasionally high in the presence of
-a transient backpressure, data skew, or network issues.
+[Unaligned checkpoints]({{< ref "docs/ops/state/checkpoints" >}}#unaligned-checkpoints) 可用于加快传播时间的 checkpoint barriers。 但是请注意,这并不能解决导致反压的根本问题(端到端记录延迟仍然很高)。

Review Comment:
   “可用于加快传播时间的 checkpoint barriers”  ->  "可用于加快checkpoint barriers的传播"
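
   For reference, a small sketch of enabling the unaligned checkpoints mentioned in this sentence (the 60-second interval is an arbitrary example value):

   ```java
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

   StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
   env.enableCheckpointing(60_000); // checkpoint every 60 seconds
   // Unaligned checkpoints let barriers overtake in-flight records, which speeds up barrier
   // propagation but does not remove the underlying back pressure.
   env.getCheckpointConfig().enableUnalignedCheckpoints();
   ```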



##########
docs/content.zh/docs/ops/state/large_state_tuning.md:
##########
@@ -166,149 +125,100 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 }
 ```
 
-## Capacity Planning
-
-This section discusses how to decide how many resources should be used for a Flink job to run reliably.
-The basic rules of thumb for capacity planning are:
-
-  - Normal operation should have enough capacity to not operate under constant *back pressure*.
-    See [back pressure monitoring]({{< ref "docs/ops/monitoring/back_pressure" >}}) for details on how to check whether the application runs under back pressure.
-
-  - Provision some extra resources on top of the resources needed to run the program back-pressure-free during failure-free time.
-    These resources are needed to "catch up" with the input data that accumulated during the time the application
-    was recovering.
-    How much that should be depends on how long recovery operations usually take (which depends on the size of the state
-    that needs to be loaded into the new TaskManagers on a failover) and how fast the scenario requires failures to recover.
-
-    *Important*: The base line should to be established with checkpointing activated, because checkpointing ties up
-    some amount of resources (such as network bandwidth).
-
-  - Temporary back pressure is usually okay, and an essential part of execution flow control during load spikes,
-    during catch-up phases, or when external systems (that are written to in a sink) exhibit temporary slowdown.
-
-  - Certain operations (like large windows) result in a spiky load for their downstream operators: 
-    In the case of windows, the downstream operators may have little to do while the window is being built,
-    and have a load to do when the windows are emitted.
-    The planning for the downstream parallelism needs to take into account how much the windows emit and how
-    fast such a spike needs to be processed.
-
-**Important:** In order to allow for adding resources later, make sure to set the *maximum parallelism* of the
-data stream program to a reasonable number. The maximum parallelism defines how high you can set the programs
-parallelism when re-scaling the program (via a savepoint).
-
-Flink's internal bookkeeping tracks parallel state in the granularity of max-parallelism-many *key groups*.
-Flink's design strives to make it efficient to have a very high value for the maximum parallelism, even if
-executing the program with a low parallelism.
-
-## Compression
-
-Flink offers optional compression (default: off) for all checkpoints and savepoints. Currently, compression always uses 
-the [snappy compression algorithm (version 1.1.4)](https://github.com/xerial/snappy-java) but we are planning to support
-custom compression algorithms in the future. Compression works on the granularity of key-groups in keyed state, i.e.
-each key-group can be decompressed individually, which is important for rescaling. 
-
-Compression can be activated through the `ExecutionConfig`:
-
+## 容量规划
+本节讨论如何确定 Flink 作业应该使用多少资源才能可靠地运行。
+容量规划的基本经验法则是:
+  - 应有足够的能力在恒定*反压*下正常运行。
+    如何检查应用程序是否在反压下运行,详细信息请参阅 [反压监控]({{< ref "docs/ops/monitoring/back_pressure" >}})。
+  - 在无故障时间内无反压运行程序所需的资源之上能够提供一些额外的资源。
+    需要这些资源来“追赶”在应用程序恢复期间积累的输入数据。
+    这通常取决于恢复操作需要多长时间(这取决于在故障转移时需要加载到新 TaskManager 中的状态大小)以及故障恢复的速度。
+    *重要提示*:基准点应该在开启 checkpointing 来建立,因为 checkpointing 会占用一些资源(例如网络带宽)。
+  - 临时反压通常是允许的,在负载峰值、追赶阶段或外部系统(sink 到外部系统)出现临时减速时,这是执行流控制的重要部分。
+  - 在某些操作下(如大窗口)会导致其下游算子的负载激增:
+    在有窗口的情况下,下游算子可能在构建窗口时几乎无事可做,而在触发窗口时有负载要做。
+    下游并行度的规划需要考虑窗口的输出量以及处理这种峰值的速度。
+
+**重要提示**:为了方便以后增加资源,请确保将流应用程序的*最大并行度*设置为一个合理的数字。最大并行度定义了当扩缩容程序时(通过 savepoint )可以设置程序并行度的高度。

Review Comment:
   “可以设置程序并行度的高度” -> “可以设置程序并行度的上限”
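
   A small illustration of the upper-bound semantics discussed here (the numbers are placeholders):

   ```java
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

   StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
   env.setMaxParallelism(128); // upper bound that later rescaling via a savepoint must not exceed
   env.setParallelism(4);      // current parallelism, anywhere up to the maximum
   ```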



##########
docs/content.zh/docs/ops/state/large_state_tuning.md:
##########
@@ -166,149 +125,100 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 }
 ```
 
-## Capacity Planning
-
-This section discusses how to decide how many resources should be used for a Flink job to run reliably.
-The basic rules of thumb for capacity planning are:
-
-  - Normal operation should have enough capacity to not operate under constant *back pressure*.
-    See [back pressure monitoring]({{< ref "docs/ops/monitoring/back_pressure" >}}) for details on how to check whether the application runs under back pressure.
-
-  - Provision some extra resources on top of the resources needed to run the program back-pressure-free during failure-free time.
-    These resources are needed to "catch up" with the input data that accumulated during the time the application
-    was recovering.
-    How much that should be depends on how long recovery operations usually take (which depends on the size of the state
-    that needs to be loaded into the new TaskManagers on a failover) and how fast the scenario requires failures to recover.
-
-    *Important*: The base line should to be established with checkpointing activated, because checkpointing ties up
-    some amount of resources (such as network bandwidth).
-
-  - Temporary back pressure is usually okay, and an essential part of execution flow control during load spikes,
-    during catch-up phases, or when external systems (that are written to in a sink) exhibit temporary slowdown.
-
-  - Certain operations (like large windows) result in a spiky load for their downstream operators: 
-    In the case of windows, the downstream operators may have little to do while the window is being built,
-    and have a load to do when the windows are emitted.
-    The planning for the downstream parallelism needs to take into account how much the windows emit and how
-    fast such a spike needs to be processed.
-
-**Important:** In order to allow for adding resources later, make sure to set the *maximum parallelism* of the
-data stream program to a reasonable number. The maximum parallelism defines how high you can set the programs
-parallelism when re-scaling the program (via a savepoint).
-
-Flink's internal bookkeeping tracks parallel state in the granularity of max-parallelism-many *key groups*.
-Flink's design strives to make it efficient to have a very high value for the maximum parallelism, even if
-executing the program with a low parallelism.
-
-## Compression
-
-Flink offers optional compression (default: off) for all checkpoints and savepoints. Currently, compression always uses 
-the [snappy compression algorithm (version 1.1.4)](https://github.com/xerial/snappy-java) but we are planning to support
-custom compression algorithms in the future. Compression works on the granularity of key-groups in keyed state, i.e.
-each key-group can be decompressed individually, which is important for rescaling. 
-
-Compression can be activated through the `ExecutionConfig`:
-
+## 容量规划
+本节讨论如何确定 Flink 作业应该使用多少资源才能可靠地运行。
+容量规划的基本经验法则是:
+  - 应有足够的能力在恒定*反压*下正常运行。
+    如何检查应用程序是否在反压下运行,详细信息请参阅 [反压监控]({{< ref "docs/ops/monitoring/back_pressure" >}})。
+  - 在无故障时间内无反压运行程序所需的资源之上能够提供一些额外的资源。
+    需要这些资源来“追赶”在应用程序恢复期间积累的输入数据。
+    这通常取决于恢复操作需要多长时间(这取决于在故障转移时需要加载到新 TaskManager 中的状态大小)以及故障恢复的速度。
+    *重要提示*:基准点应该在开启 checkpointing 来建立,因为 checkpointing 会占用一些资源(例如网络带宽)。
+  - 临时反压通常是允许的,在负载峰值、追赶阶段或外部系统(sink 到外部系统)出现临时减速时,这是执行流控制的重要部分。
+  - 在某些操作下(如大窗口)会导致其下游算子的负载激增:
+    在有窗口的情况下,下游算子可能在构建窗口时几乎无事可做,而在触发窗口时有负载要做。
+    下游并行度的规划需要考虑窗口的输出量以及处理这种峰值的速度。
+
+**重要提示**:为了方便以后增加资源,请确保将流应用程序的*最大并行度*设置为一个合理的数字。最大并行度定义了当扩缩容程序时(通过 savepoint )可以设置程序并行度的高度。
+
+Flink 的内部以*键组(key groups)* 的最大并行度为粒度跟踪分布式状态。
+Flink 的设计力求使最大并行度的值达到很高的效率,即使执行程序时并行度很低。
+
+## 压缩
+Flink 为所有 checkpoints 和 savepoints 提供可选的压缩(默认:关闭)。 目前,压缩总是使用 [snappy 压缩算法(版本 1.1.4)](https://github.com/xerial/snappy-java),
+但我们计划在未来支持自定义压缩算法。 压缩作用于 keyed state 下 key-groups 的粒度,即每个 key-groups 可以单独解压缩,这对于重新缩放很重要。
+
+可以通过 `ExecutionConfig` 开启压缩:
 ```java
 ExecutionConfig executionConfig = new ExecutionConfig();
 executionConfig.setUseSnapshotCompression(true);
 ```
 
-<span class="label label-info">Note</span> The compression option has no impact on incremental snapshots, because they are using RocksDB's internal
-format which is always using snappy compression out of the box.
-
-## Task-Local Recovery
-
-### Motivation
+<span class="label label-info">注意</span> 压缩选项对增量快照没有影响,因为它们使用的是 RocksDB 的内部格式,该格式始终使用开箱即用的 snappy 压缩。
 
-In Flink's checkpointing, each task produces a snapshot of its state that is then written to a distributed store. Each task acknowledges
-a successful write of the state to the job manager by sending a handle that describes the location of the state in the distributed store.
-The job manager, in turn, collects the handles from all tasks and bundles them into a checkpoint object.
+## Task 本地恢复
+### 问题引入
+在 Flink 的 checkpointing 中,每个 task 都会生成其状态快照,然后将其写入分布式存储。 每个 task 通过发送一个描述分布式存储中的位置状态的句柄,向 jobmanager 确认状态的成功写入。
+JobManager 反过来收集所有 tasks 的句柄并将它们捆绑到一个 checkpoint 对象中。
 
-In case of recovery, the job manager opens the latest checkpoint object and sends the handles back to the corresponding tasks, which can
-then restore their state from the distributed storage. Using a distributed storage to store state has two important advantages. First, the storage
-is fault tolerant and second, all state in the distributed store is accessible to all nodes and can be easily redistributed (e.g. for rescaling).
+在恢复的情况下,jobmanager 打开最新的 checkpoint 对象并将句柄发送回相应的 tasks,然后可以从分布式存储中恢复它们的状态。 使用分布式存储来存储状态有两个重要的优势。 
+首先,存储是容错的,其次,分布式存储中的所有状态都可以被所有节点访问,并且可以很容易地重新分配(例如,用于重新缩放)。
 
-However, using a remote distributed store has also one big disadvantage: all tasks must read their state from a remote location, over the network.
-In many scenarios, recovery could reschedule failed tasks to the same task manager as in the previous run (of course there are exceptions like machine
-failures), but we still have to read remote state. This can result in *long recovery time for large states*, even if there was only a small failure on
-a single machine.
+但是,使用远程分布式存储也有一个很大的缺点:所有 tasks 都必须通过网络从远程位置读取它们的状态。
+在许多场景中,恢复可能会将失败的 tasks 重新调度到与前一次运行相同的 taskmanager 中(当然也有像机器故障这样的异常),但我们仍然必须读取远程状态。这可能导致*大状态的长时间恢复*,即使在一台机器上只有一个小故障。
 
-### Approach
+### 解决办法
 
-Task-local state recovery targets exactly this problem of long recovery time and the main idea is the following: for every checkpoint, each task
-does not only write task states to the distributed storage, but also keep *a secondary copy of the state snapshot in a storage that is local to
-the task* (e.g. on local disk or in memory). Notice that the primary store for snapshots must still be the distributed store, because local storage
-does not ensure durability under node failures and also does not provide access for other nodes to redistribute state, this functionality still
-requires the primary copy.
+Task 本地状态恢复正是针对这个恢复时间长的问题,其主要思想如下:对于每个 checkpoint ,每个 task 不仅将 task 状态写入分布式存储中,
+而且还在 task 本地存储(例如本地磁盘或内存)中保存状态快照的次要副本。请注意,快照的主存储仍然必须是分布式存储,因为本地存储不能确保节点故障下的持久性,也不能为其他节点提供重新分发状态的访问,所以这个功能仍然需要主副本。
 
-However, for each task that can be rescheduled to the previous location for recovery, we can restore state from the secondary, local
-copy and avoid the costs of reading the state remotely. Given that *many failures are not node failures and node failures typically only affect one
-or very few nodes at a time*, it is very likely that in a recovery most tasks can return to their previous location and find their local state intact.
-This is what makes local recovery effective in reducing recovery time.
+然而,对于每个 task 可以重新调度到以前的位置进行恢复的 task ,我们可以从次要本地状态副本恢复,并避免远程读取状态的成本。考虑到*许多故障不是节点故障,即使节点故障通常一次只影响一个或非常少的节点*,
+在恢复过程中,大多数 task 很可能会重新部署到它们以前的位置,并发现它们的本地状态完好无损。
+这就是 task 本地恢复有效地减少恢复时间的原因。
 
-Please note that this can come at some additional costs per checkpoint for creating and storing the secondary local state copy, depending on the
-chosen state backend and checkpointing strategy. For example, in most cases the implementation will simply duplicate the writes to the distributed
-store to a local file.
+请注意,根据所选的 state backend 和 checkpointing 策略,在每个 checkpoint 创建和存储次要本地状态副本时,可能会有一些额外的成本。
+例如,在大多数情况下,实现只是简单地将对分布式存储的写操作复制到本地文件。
 
 {{< img src="/fig/local_recovery.png" class="center" width=50% alt="Illustration of checkpointing with task-local recovery." >}}
+### 主要(分布式存储)和次要(task 本地)状态快照的关系
+Task 本地状态始终被视为次要副本,checkpoint 状态是分布式存储中的主副本。 这对 checkpointing 和恢复期间的本地状态问题有影响:
 
-### Relationship of primary (distributed store) and secondary (task-local) state snapshots
-
-Task-local state is always considered a secondary copy, the ground truth of the checkpoint state is the primary copy in the distributed store. This
-has implications for problems with local state during checkpointing and recovery:
-
-- For checkpointing, the *primary copy must be successful* and a failure to produce the *secondary, local copy will not fail* the checkpoint. A checkpoint
-will fail if the primary copy could not be created, even if the secondary copy was successfully created.
-
-- Only the primary copy is acknowledged and managed by the job manager, secondary copies are owned by task managers and their life cycles can be
-independent from their primary copies. For example, it is possible to retain a history of the 3 latest checkpoints as primary copies and only keep
-the task-local state of the latest checkpoint.
-
-- For recovery, Flink will always *attempt to restore from task-local state first*, if a matching secondary copy is available. If any problem occurs during
-the recovery from the secondary copy, Flink will *transparently retry to recover the task from the primary copy*. Recovery only fails, if primary
-and the (optional) secondary copy failed. In this case, depending on the configuration Flink could still fall back to an older checkpoint.
-
-- It is possible that the task-local copy contains only parts of the full task state (e.g. exception while writing one local file). In this case,
-Flink will first try to recover local parts locally, non-local state is restored from the primary copy. Primary state must always be complete and is
-a *superset of the task-local state*.
+- 对于 checkpointing ,*主副本必须成功*,并且生成*次要本地副本的失败不会使* checkpoint 失败。 如果无法创建主副本,即使已成功创建次要副本,checkpoint 也会失败。
 
-- Task-local state can have a different format than the primary state, they are not required to be byte identical. For example, it could be even possible
-that the task-local state is an in-memory consisting of heap objects, and not stored in any files.
+- 只有主副本由 jobmanager 确认和管理,次要副本属于 taskmanager ,并且它们的生命周期可以独立于它们的主副本。 例如,可以保留 3 个最新 checkpoints 的历史记录作为主副本,并且只保留最新 checkpoint 的 task 本地状态。
 
-- If a task manager is lost, the local state from all its task is lost.
+- 对于恢复,如果匹配的次要副本可用,Flink 将始终*首先尝试从 task 本地状态恢复*。 如果在次要副本恢复过程中出现任何问题,Flink 将*透明地重试从主副本恢复 task*。 仅当主副本和(可选)次要副本失败时,恢复才会失败。 
+  在这种情况下,根据配置,Flink 仍可能回退到旧的 checkpoint 。
+- Task 本地副本可能仅包含完整 task 状态的一部分(例如,写入一个本地文件时出现异常)。 在这种情况下,Flink 会首先尝试在本地恢复本地部分,非本地状态从主副本恢复。 主状态必须始终是完整的,并且是*task 本地状态的超集*。
 
-### Configuring task-local recovery
+- Task 本地状态可以具有与主状态不同的格式,它们不需要相同字节。 例如,task 本地状态甚至可能是在堆对象组成的内存中,而不是存储在任何文件中。
 
-Task-local recovery is *deactivated by default* and can be activated through Flink's configuration with the key `state.backend.local-recovery` as specified
-in `CheckpointingOptions.LOCAL_RECOVERY`. The value for this setting can either be *true* to enable or *false* (default) to disable local recovery.
+- 如果 taskmanager 丢失,则其所有 task 的本地状态都会丢失。
+### 配置 task 本地恢复
 
-Note that [unaligned checkpoints]({{< ref "docs/ops/state/checkpoints" >}}#unaligned-checkpoints) currently do not support task-local recovery.
+Task 本地恢复*默认禁用*,可以通过 Flink 的 CheckpointingOptions.LOCAL_RECOVERY 配置中指定的键 state.backend.local-recovery 来启用。 此设置的值可以是 *true* 以启用或 *false*(默认)以禁用本地恢复。
 
-### Details on task-local recovery for different state backends
+请注意,[unaligned checkpoints]({{< ref "docs/ops/state/checkpoints" >}}#unaligned-checkpoints) 目前不支持 task 本地恢复。
 
-***Limitation**: Currently, task-local recovery only covers keyed state backends. Keyed state is typically by far the largest part of the state. In the near future, we will
-also cover operator state and timers.*
+### 不同 state backends 的 task 本地恢复的详细信息
 
-The following state backends can support task-local recovery.
+***限制**:目前,task 本地恢复仅涵盖 keyed state backends。 Keyed state 通常是该状态的最大部分。 在不久的将来,我们还将介绍算子状态和计时器(timers)。*
 
-- FsStateBackend: task-local recovery is supported for keyed state. The implementation will duplicate the state to a local file. This can introduce additional write costs
-and occupy local disk space. In the future, we might also offer an implementation that keeps task-local state in memory.
+以下 state backends 可以支持 task 本地恢复。
 
-- RocksDBStateBackend: task-local recovery is supported for keyed state. For *full checkpoints*, state is duplicated to a local file. This can introduce additional write costs
-and occupy local disk space. For *incremental snapshots*, the local state is based on RocksDB's native checkpointing mechanism. This mechanism is also used as the first step
-to create the primary copy, which means that in this case no additional cost is introduced for creating the secondary copy. We simply keep the native checkpoint directory around
-instead of deleting it after uploading to the distributed store. This local copy can share active files with the working directory of RocksDB (via hard links), so for active
-files also no additional disk space is consumed for task-local recovery with incremental snapshots. Using hard links also means that the RocksDB directories must be on
-the same physical device as all the configure local recovery directories that can be used to store local state, or else establishing hard links can fail (see FLINK-10954).
-Currently, this also prevents using local recovery when RocksDB directories are configured to be located on more than one physical device.
+- FsStateBackend:keyed state 支持 task 本地恢复。 该实现会将状态复制到本地文件。 这会引入额外的写入成本并占用本地磁盘空间。 将来,我们可能还会提供一种将 task 本地状态保存在内存中的实现。
 
-### Allocation-preserving scheduling
+- RocksDBStateBackend:支持 keyed state 的 task 本地恢复。对于*全量 checkpoints*,状态被复制到本地文件。这会引入额外的写入成本并占用本地磁盘空间。对于*增量快照*,本地状态基于 RocksDB 的原生 checkpointing 机制。
+  这种机制也被用作创建主副本的第一步,这意味着在这种情况下,创建次要副本不会引入额外的成本。我们只是保留本地 checkpoint 目录,
+  而不是在上传到分布式存储后将其删除。这个本地副本可以与 RocksDB 的工作目录共享现有文件(通过硬链接),因此对于现有文件,增量快照的 task 本地恢复也不会消耗额外的磁盘空间。
+  使用硬链接还意味着 RocksDB 目录必须与所有可用于存储本地状态和本地恢复目录位于同一节点上,否则建立硬链接可能会失败(参见 FLINK-10954)。
+  目前,当 RocksDB 目录配置为多个节点上时,这也会阻止使用本地恢复。

Review Comment:
   “当 RocksDB 目录配置为多个节点上时” -> "当 RocksDB 目录配置在多个物理设备上时"
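
   A sketch of the directory layout this hard-link constraint implies, assuming the `state.backend.rocksdb.localdir` and `taskmanager.state.local.root-dirs` keys; the `/mnt/disk1/...` paths are placeholders:

   ```java
   import org.apache.flink.configuration.Configuration;

   Configuration conf = new Configuration();
   // Keep the RocksDB working directories and the local-recovery directories on the
   // same physical device so that hard links between them can be created.
   conf.setString("state.backend.rocksdb.localdir", "/mnt/disk1/flink/rocksdb");
   conf.setString("taskmanager.state.local.root-dirs", "/mnt/disk1/flink/local-recovery");
   ```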



##########
docs/content.zh/docs/ops/state/large_state_tuning.md:
##########
@@ -26,122 +26,81 @@ under the License.
 
 # 大状态与 Checkpoint 调优
 
-This page gives a guide how to configure and tune applications that use large state.
+本文提供了如何配置和调整使用大状态的应用程序指南。
 
-## Overview
+## 概述
 
-For Flink applications to run reliably at large scale, two conditions must be fulfilled:
+Flink 应用要想在大规模场景下可靠地运行,必须要满足如下两个条件:
 
-  - The application needs to be able to take checkpoints reliably
+  - 应用程序需要能够可靠地创建 checkpoints
+  - 在应用故障后,需要有足够的资源追赶数据输入流
 
-  - The resources need to be sufficient catch up with the input data streams after a failure
+第一部分讨论如何大规模获得良好性能的 checkpoints 。
+后一部分解释了一些关于要规划使用多少资源的最佳实践。
 
-The first sections discuss how to get well performing checkpoints at scale.
-The last section explains some best practices concerning planning how many resources to use.
 
+## 监控状态和 Checkpoints
 
-## Monitoring State and Checkpoints
+监控 checkpoint 行为最简单的方法是通过 UI 的 checkpoint 部分。 [监控 Checkpoint]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}) 的文档说明了如何查看可用的 checkpoint 指标。
 
-The easiest way to monitor checkpoint behavior is via the UI's checkpoint section. The documentation
-for [checkpoint monitoring]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}) shows how to access the available checkpoint
-metrics.
+这两个指标(均通过 Task 级别 [Checkpointing 指标]({{< ref "docs/ops/metrics" >}}#checkpointing) 展示)
+以及在 [监控 Checkpoint]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}))中,当查看 checkpoint 详细信息时,特别有趣的是:
 
-The two numbers (both exposed via Task level [metrics]({{< ref "docs/ops/metrics" >}}#checkpointing)
-and in the [web interface]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}})) that are of particular interest when scaling
-up checkpoints are:
+  - 算子收到第一个 checkpoint barrier 的时间。当触发 checkpoint 的延迟时间一直很高时,这意味着 *checkpoint barrier* 需要很长时间才能从 source 到达 operators。 这通常表明系统处于反压下运行。

Review Comment:
   Would “花费时间” (time spent) be more accurate here than “延迟时间” (delay time)?



##########
docs/content.zh/docs/ops/state/large_state_tuning.md:
##########
@@ -166,149 +125,100 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 }
 ```
 
-## Capacity Planning
-
-This section discusses how to decide how many resources should be used for a Flink job to run reliably.
-The basic rules of thumb for capacity planning are:
-
-  - Normal operation should have enough capacity to not operate under constant *back pressure*.
-    See [back pressure monitoring]({{< ref "docs/ops/monitoring/back_pressure" >}}) for details on how to check whether the application runs under back pressure.
-
-  - Provision some extra resources on top of the resources needed to run the program back-pressure-free during failure-free time.
-    These resources are needed to "catch up" with the input data that accumulated during the time the application
-    was recovering.
-    How much that should be depends on how long recovery operations usually take (which depends on the size of the state
-    that needs to be loaded into the new TaskManagers on a failover) and how fast the scenario requires failures to recover.
-
-    *Important*: The base line should to be established with checkpointing activated, because checkpointing ties up
-    some amount of resources (such as network bandwidth).
-
-  - Temporary back pressure is usually okay, and an essential part of execution flow control during load spikes,
-    during catch-up phases, or when external systems (that are written to in a sink) exhibit temporary slowdown.
-
-  - Certain operations (like large windows) result in a spiky load for their downstream operators: 
-    In the case of windows, the downstream operators may have little to do while the window is being built,
-    and have a load to do when the windows are emitted.
-    The planning for the downstream parallelism needs to take into account how much the windows emit and how
-    fast such a spike needs to be processed.
-
-**Important:** In order to allow for adding resources later, make sure to set the *maximum parallelism* of the
-data stream program to a reasonable number. The maximum parallelism defines how high you can set the programs
-parallelism when re-scaling the program (via a savepoint).
-
-Flink's internal bookkeeping tracks parallel state in the granularity of max-parallelism-many *key groups*.
-Flink's design strives to make it efficient to have a very high value for the maximum parallelism, even if
-executing the program with a low parallelism.
-
-## Compression
-
-Flink offers optional compression (default: off) for all checkpoints and savepoints. Currently, compression always uses 
-the [snappy compression algorithm (version 1.1.4)](https://github.com/xerial/snappy-java) but we are planning to support
-custom compression algorithms in the future. Compression works on the granularity of key-groups in keyed state, i.e.
-each key-group can be decompressed individually, which is important for rescaling. 
-
-Compression can be activated through the `ExecutionConfig`:
-
+## 容量规划
+本节讨论如何确定 Flink 作业应该使用多少资源才能可靠地运行。
+容量规划的基本经验法则是:
+  - 应有足够的能力在恒定*反压*下正常运行。

Review Comment:
   应该有足够的资源保障正常运行时不出现反压



##########
docs/content.zh/docs/ops/state/large_state_tuning.md:
##########
@@ -166,149 +125,100 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 }
 ```
 
-## Capacity Planning
-
-This section discusses how to decide how many resources should be used for a Flink job to run reliably.
-The basic rules of thumb for capacity planning are:
-
-  - Normal operation should have enough capacity to not operate under constant *back pressure*.
-    See [back pressure monitoring]({{< ref "docs/ops/monitoring/back_pressure" >}}) for details on how to check whether the application runs under back pressure.
-
-  - Provision some extra resources on top of the resources needed to run the program back-pressure-free during failure-free time.
-    These resources are needed to "catch up" with the input data that accumulated during the time the application
-    was recovering.
-    How much that should be depends on how long recovery operations usually take (which depends on the size of the state
-    that needs to be loaded into the new TaskManagers on a failover) and how fast the scenario requires failures to recover.
-
-    *Important*: The base line should to be established with checkpointing activated, because checkpointing ties up
-    some amount of resources (such as network bandwidth).
-
-  - Temporary back pressure is usually okay, and an essential part of execution flow control during load spikes,
-    during catch-up phases, or when external systems (that are written to in a sink) exhibit temporary slowdown.
-
-  - Certain operations (like large windows) result in a spiky load for their downstream operators: 
-    In the case of windows, the downstream operators may have little to do while the window is being built,
-    and have a load to do when the windows are emitted.
-    The planning for the downstream parallelism needs to take into account how much the windows emit and how
-    fast such a spike needs to be processed.
-
-**Important:** In order to allow for adding resources later, make sure to set the *maximum parallelism* of the
-data stream program to a reasonable number. The maximum parallelism defines how high you can set the programs
-parallelism when re-scaling the program (via a savepoint).
-
-Flink's internal bookkeeping tracks parallel state in the granularity of max-parallelism-many *key groups*.
-Flink's design strives to make it efficient to have a very high value for the maximum parallelism, even if
-executing the program with a low parallelism.
-
-## Compression
-
-Flink offers optional compression (default: off) for all checkpoints and savepoints. Currently, compression always uses 
-the [snappy compression algorithm (version 1.1.4)](https://github.com/xerial/snappy-java) but we are planning to support
-custom compression algorithms in the future. Compression works on the granularity of key-groups in keyed state, i.e.
-each key-group can be decompressed individually, which is important for rescaling. 
-
-Compression can be activated through the `ExecutionConfig`:
-
+## 容量规划
+本节讨论如何确定 Flink 作业应该使用多少资源才能可靠地运行。
+容量规划的基本经验法则是:
+  - 应有足够的能力在恒定*反压*下正常运行。
+    如何检查应用程序是否在反压下运行,详细信息请参阅 [反压监控]({{< ref "docs/ops/monitoring/back_pressure" >}})。
+  - 在无故障时间内无反压运行程序所需的资源之上能够提供一些额外的资源。
+    需要这些资源来“追赶”在应用程序恢复期间积累的输入数据。
+    这通常取决于恢复操作需要多长时间(这取决于在故障转移时需要加载到新 TaskManager 中的状态大小)以及故障恢复的速度。
+    *重要提示*:基准点应该在开启 checkpointing 来建立,因为 checkpointing 会占用一些资源(例如网络带宽)。
+  - 临时反压通常是允许的,在负载峰值、追赶阶段或外部系统(sink 到外部系统)出现临时减速时,这是执行流控制的重要部分。
+  - 在某些操作下(如大窗口)会导致其下游算子的负载激增:
+    在有窗口的情况下,下游算子可能在构建窗口时几乎无事可做,而在触发窗口时有负载要做。
+    下游并行度的规划需要考虑窗口的输出量以及处理这种峰值的速度。
+
+**重要提示**:为了方便以后增加资源,请确保将流应用程序的*最大并行度*设置为一个合理的数字。最大并行度定义了当扩缩容程序时(通过 savepoint )可以设置程序并行度的高度。
+
+Flink 的内部以*键组(key groups)* 的最大并行度为粒度跟踪分布式状态。
+Flink 的设计力求使最大并行度的值达到很高的效率,即使执行程序时并行度很低。
+
+## 压缩
+Flink 为所有 checkpoints 和 savepoints 提供可选的压缩(默认:关闭)。 目前,压缩总是使用 [snappy 压缩算法(版本 1.1.4)](https://github.com/xerial/snappy-java),
+但我们计划在未来支持自定义压缩算法。 压缩作用于 keyed state 下 key-groups 的粒度,即每个 key-groups 可以单独解压缩,这对于重新缩放很重要。
+
+可以通过 `ExecutionConfig` 开启压缩:
 ```java
 ExecutionConfig executionConfig = new ExecutionConfig();
 executionConfig.setUseSnapshotCompression(true);
 ```
 
-<span class="label label-info">Note</span> The compression option has no impact on incremental snapshots, because they are using RocksDB's internal
-format which is always using snappy compression out of the box.
-
-## Task-Local Recovery
-
-### Motivation
+<span class="label label-info">注意</span> 压缩选项对增量快照没有影响,因为它们使用的是 RocksDB 的内部格式,该格式始终使用开箱即用的 snappy 压缩。
 
-In Flink's checkpointing, each task produces a snapshot of its state that is then written to a distributed store. Each task acknowledges
-a successful write of the state to the job manager by sending a handle that describes the location of the state in the distributed store.
-The job manager, in turn, collects the handles from all tasks and bundles them into a checkpoint object.
+## Task 本地恢复
+### 问题引入
+在 Flink 的 checkpointing 中,每个 task 都会生成其状态快照,然后将其写入分布式存储。 每个 task 通过发送一个描述分布式存储中的位置状态的句柄,向 jobmanager 确认状态的成功写入。
+JobManager 反过来收集所有 tasks 的句柄并将它们捆绑到一个 checkpoint 对象中。
 
-In case of recovery, the job manager opens the latest checkpoint object and sends the handles back to the corresponding tasks, which can
-then restore their state from the distributed storage. Using a distributed storage to store state has two important advantages. First, the storage
-is fault tolerant and second, all state in the distributed store is accessible to all nodes and can be easily redistributed (e.g. for rescaling).
+在恢复的情况下,jobmanager 打开最新的 checkpoint 对象并将句柄发送回相应的 tasks,然后可以从分布式存储中恢复它们的状态。 使用分布式存储来存储状态有两个重要的优势。 
+首先,存储是容错的,其次,分布式存储中的所有状态都可以被所有节点访问,并且可以很容易地重新分配(例如,用于重新缩放)。
 
-However, using a remote distributed store has also one big disadvantage: all tasks must read their state from a remote location, over the network.
-In many scenarios, recovery could reschedule failed tasks to the same task manager as in the previous run (of course there are exceptions like machine
-failures), but we still have to read remote state. This can result in *long recovery time for large states*, even if there was only a small failure on
-a single machine.
+但是,使用远程分布式存储也有一个很大的缺点:所有 tasks 都必须通过网络从远程位置读取它们的状态。
+在许多场景中,恢复可能会将失败的 tasks 重新调度到与前一次运行相同的 taskmanager 中(当然也有像机器故障这样的异常),但我们仍然必须读取远程状态。这可能导致*大状态的长时间恢复*,即使在一台机器上只有一个小故障。
 
-### Approach
+### 解决办法
 
-Task-local state recovery targets exactly this problem of long recovery time and the main idea is the following: for every checkpoint, each task
-does not only write task states to the distributed storage, but also keep *a secondary copy of the state snapshot in a storage that is local to
-the task* (e.g. on local disk or in memory). Notice that the primary store for snapshots must still be the distributed store, because local storage
-does not ensure durability under node failures and also does not provide access for other nodes to redistribute state, this functionality still
-requires the primary copy.
+Task 本地状态恢复正是针对这个恢复时间长的问题,其主要思想如下:对于每个 checkpoint ,每个 task 不仅将 task 状态写入分布式存储中,
+而且还在 task 本地存储(例如本地磁盘或内存)中保存状态快照的次要副本。请注意,快照的主存储仍然必须是分布式存储,因为本地存储不能确保节点故障下的持久性,也不能为其他节点提供重新分发状态的访问,所以这个功能仍然需要主副本。
 
-However, for each task that can be rescheduled to the previous location for recovery, we can restore state from the secondary, local
-copy and avoid the costs of reading the state remotely. Given that *many failures are not node failures and node failures typically only affect one
-or very few nodes at a time*, it is very likely that in a recovery most tasks can return to their previous location and find their local state intact.
-This is what makes local recovery effective in reducing recovery time.
+然而,对于每个 task 可以重新调度到以前的位置进行恢复的 task ,我们可以从次要本地状态副本恢复,并避免远程读取状态的成本。考虑到*许多故障不是节点故障,即使节点故障通常一次只影响一个或非常少的节点*,
+在恢复过程中,大多数 task 很可能会重新部署到它们以前的位置,并发现它们的本地状态完好无损。
+这就是 task 本地恢复有效地减少恢复时间的原因。
 
-Please note that this can come at some additional costs per checkpoint for creating and storing the secondary local state copy, depending on the
-chosen state backend and checkpointing strategy. For example, in most cases the implementation will simply duplicate the writes to the distributed
-store to a local file.
+请注意,根据所选的 state backend 和 checkpointing 策略,在每个 checkpoint 创建和存储次要本地状态副本时,可能会有一些额外的成本。
+例如,在大多数情况下,实现只是简单地将对分布式存储的写操作复制到本地文件。
 
 {{< img src="/fig/local_recovery.png" class="center" width=50% alt="Illustration of checkpointing with task-local recovery." >}}
+### 主要(分布式存储)和次要(task 本地)状态快照的关系
+Task 本地状态始终被视为次要副本,checkpoint 状态是分布式存储中的主副本。 这对 checkpointing 和恢复期间的本地状态问题有影响:

Review Comment:
   Task 本地状态始终被视为次要副本,checkpoint 状态始终以分布式存储中的副本为主



##########
docs/content.zh/docs/ops/state/large_state_tuning.md:
##########
@@ -166,149 +125,100 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 }
 ```
 
-## Capacity Planning
-
-This section discusses how to decide how many resources should be used for a Flink job to run reliably.
-The basic rules of thumb for capacity planning are:
-
-  - Normal operation should have enough capacity to not operate under constant *back pressure*.
-    See [back pressure monitoring]({{< ref "docs/ops/monitoring/back_pressure" >}}) for details on how to check whether the application runs under back pressure.
-
-  - Provision some extra resources on top of the resources needed to run the program back-pressure-free during failure-free time.
-    These resources are needed to "catch up" with the input data that accumulated during the time the application
-    was recovering.
-    How much that should be depends on how long recovery operations usually take (which depends on the size of the state
-    that needs to be loaded into the new TaskManagers on a failover) and how fast the scenario requires failures to recover.
-
-    *Important*: The base line should to be established with checkpointing activated, because checkpointing ties up
-    some amount of resources (such as network bandwidth).
-
-  - Temporary back pressure is usually okay, and an essential part of execution flow control during load spikes,
-    during catch-up phases, or when external systems (that are written to in a sink) exhibit temporary slowdown.
-
-  - Certain operations (like large windows) result in a spiky load for their downstream operators: 
-    In the case of windows, the downstream operators may have little to do while the window is being built,
-    and have a load to do when the windows are emitted.
-    The planning for the downstream parallelism needs to take into account how much the windows emit and how
-    fast such a spike needs to be processed.
-
-**Important:** In order to allow for adding resources later, make sure to set the *maximum parallelism* of the
-data stream program to a reasonable number. The maximum parallelism defines how high you can set the programs
-parallelism when re-scaling the program (via a savepoint).
-
-Flink's internal bookkeeping tracks parallel state in the granularity of max-parallelism-many *key groups*.
-Flink's design strives to make it efficient to have a very high value for the maximum parallelism, even if
-executing the program with a low parallelism.
-
-## Compression
-
-Flink offers optional compression (default: off) for all checkpoints and savepoints. Currently, compression always uses 
-the [snappy compression algorithm (version 1.1.4)](https://github.com/xerial/snappy-java) but we are planning to support
-custom compression algorithms in the future. Compression works on the granularity of key-groups in keyed state, i.e.
-each key-group can be decompressed individually, which is important for rescaling. 
-
-Compression can be activated through the `ExecutionConfig`:
-
+## 容量规划
+本节讨论如何确定 Flink 作业应该使用多少资源才能可靠地运行。
+容量规划的基本经验法则是:
+  - 应有足够的能力在恒定*反压*下正常运行。
+    如何检查应用程序是否在反压下运行,详细信息请参阅 [反压监控]({{< ref "docs/ops/monitoring/back_pressure" >}})。
+  - 在无故障时间内无反压运行程序所需的资源之上能够提供一些额外的资源。
+    需要这些资源来“追赶”在应用程序恢复期间积累的输入数据。
+    这通常取决于恢复操作需要多长时间(这取决于在故障转移时需要加载到新 TaskManager 中的状态大小)以及故障恢复的速度。

Review Comment:
   “故障转移” -> "故障恢复"



##########
docs/content.zh/docs/ops/state/large_state_tuning.md:
##########
@@ -26,122 +26,81 @@ under the License.
 
 # 大状态与 Checkpoint 调优
 
-This page gives a guide how to configure and tune applications that use large state.
+本文提供了如何配置和调整使用大状态的应用程序指南。
 
-## Overview
+## 概述
 
-For Flink applications to run reliably at large scale, two conditions must be fulfilled:
+Flink 应用要想在大规模场景下可靠地运行,必须要满足如下两个条件:
 
-  - The application needs to be able to take checkpoints reliably
+  - 应用程序需要能够可靠地创建 checkpoints
+  - 在应用故障后,需要有足够的资源追赶数据输入流
 
-  - The resources need to be sufficient catch up with the input data streams after a failure
+第一部分讨论如何大规模获得良好性能的 checkpoints 。
+后一部分解释了一些关于要规划使用多少资源的最佳实践。
 
-The first sections discuss how to get well performing checkpoints at scale.
-The last section explains some best practices concerning planning how many resources to use.
 
+## 监控状态和 Checkpoints
 
-## Monitoring State and Checkpoints
+监控 checkpoint 行为最简单的方法是通过 UI 的 checkpoint 部分。 [监控 Checkpoint]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}) 的文档说明了如何查看可用的 checkpoint 指标。
 
-The easiest way to monitor checkpoint behavior is via the UI's checkpoint section. The documentation
-for [checkpoint monitoring]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}) shows how to access the available checkpoint
-metrics.
+这两个指标(均通过 Task 级别 [Checkpointing 指标]({{< ref "docs/ops/metrics" >}}#checkpointing) 展示)
+以及在 [监控 Checkpoint]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}))中,当查看 checkpoint 详细信息时,特别有趣的是:
 
-The two numbers (both exposed via Task level [metrics]({{< ref "docs/ops/metrics" >}}#checkpointing)
-and in the [web interface]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}})) that are of particular interest when scaling
-up checkpoints are:
+  - 算子收到第一个 checkpoint barrier 的时间。当触发 checkpoint 的延迟时间一直很高时,这意味着 *checkpoint barrier* 需要很长时间才能从 source 到达 operators。 这通常表明系统处于反压下运行。
 
-  - The time until operators receive their first checkpoint barrier
-    When the time to trigger the checkpoint is constantly very high, it means that the *checkpoint barriers* need a long
-    time to travel from the source to the operators. That typically indicates that the system is operating under a
-    constant backpressure.
+  - Alignment Duration,为处理第一个和最后一个 checkpoint barrier 之间的时间。在 unaligned checkpoints 下,`exactly-once` 和 `at-least-once` checkpoints 的 subtasks 处理来自上游 subtasks 的所有数据,且没有任何中断。
+    然而,对于 aligned `exactly-once` checkpoints,已经收到 checkpoint barrier 的通道被阻止继续发送数据,直到所有剩余的通道都赶上并接收它们的 checkpoint barrier(对齐时间)。
 
-  - The alignment duration, which is defined as the time between receiving first and the last checkpoint barrier.
-    During unaligned `exactly-once` checkpoints and `at-least-once` checkpoints subtasks are processing all of the
-    data from the upstream subtasks without any interruptions. However with aligned `exactly-once` checkpoints,
-    the channels that have already received a checkpoint barrier are blocked from sending further data until
-    all of the remaining channels catch up and receive theirs checkpoint barriers (alignment time).
+理想情况下,这两个值都应该很低 - 较高的数值意味着由于存在反压(没有足够的资源来处理传入的记录),导致 checkpoint barriers 在作业中的移动速度较慢,这也可以通过处理记录的端到端延迟在增加来观察到。
+请注意,在出现瞬态反压、数据倾斜或网络问题时,这些数值偶尔会很高。
 
-Both of those values should ideally be low - higher amounts means that checkpoint barriers traveling through the job graph
-slowly, due to some back-pressure (not enough resources to process the incoming records). This can also be observed
-via increased end-to-end latency of processed records. Note that those numbers can be occasionally high in the presence of
-a transient backpressure, data skew, or network issues.
+[Unaligned checkpoints]({{< ref "docs/ops/state/checkpoints" >}}#unaligned-checkpoints) 可用于加快传播时间的 checkpoint barriers。 但是请注意,这并不能解决导致反压的根本问题(端到端记录延迟仍然很高)。
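A minimal sketch of enabling unaligned checkpoints on the `CheckpointConfig` (the checkpoint interval below is an illustrative value, not a recommendation):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000); // illustrative interval

// allow checkpoint barriers to overtake buffered in-flight records; this shortens
// barrier propagation under backpressure but does not remove the backpressure itself
env.getCheckpointConfig().enableUnalignedCheckpoints();
```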
 
-[Unaligned checkpoints]({{< ref "docs/ops/state/checkpoints" >}}#unaligned-checkpoints) can be used to speed up the propagation time
-of the checkpoint barriers. However please note, that this does not solve the underlying problem that's causing the backpressure
-in the first place (and end-to-end records latency will remain high).
+## Checkpoint 调优
+应用程序可以配置定期触发 checkpoints。 当 checkpoint 完成时间超过 checkpoint 间隔时,在正在进行的 checkpoint 完成之前,不会触发下一个 checkpoint 。默认情况下,一旦正在进行的 checkpoint 完成,将立即触发下一个 checkpoint 。
 
-## Tuning Checkpointing
+当 checkpoints 完成的时间经常超过 checkpoints 基本间隔时(例如,因为状态比计划的更大,或者 checkpoints 所在的存储系统访问暂时变慢),
+系统不断地进行 checkpoints(一旦完成,新的 checkpoints 就会立即启动)。这可能意味着过多的资源被不断地束缚在 checkpointing 中,并且 checkpoint 算子进行得缓慢。 此行为对使用 checkpointed 状态的流式应用程序的影响较小,但仍可能对整体应用程序性能产生影响。
 
-Checkpoints are triggered at regular intervals that applications can configure. When a checkpoint takes longer
-to complete than the checkpoint interval, the next checkpoint is not triggered before the in-progress checkpoint
-completes. By default the next checkpoint will then be triggered immediately once the ongoing checkpoint completes.
-
-When checkpoints end up frequently taking longer than the base interval (for example because state
-grew larger than planned, or the storage where checkpoints are stored is temporarily slow),
-the system is constantly taking checkpoints (new ones are started immediately once ongoing once finish).
-That can mean that too many resources are constantly tied up in checkpointing and that the operators make too
-little progress. This behavior has less impact on streaming applications that use asynchronously checkpointed state,
-but may still have an impact on overall application performance.
-
-To prevent such a situation, applications can define a *minimum duration between checkpoints*:
+为了防止这种情况,应用程序可以定义 checkpoints 之间的*最小间隔时间*:
 
 `StreamExecutionEnvironment.getCheckpointConfig().setMinPauseBetweenCheckpoints(milliseconds)`
 
-This duration is the minimum time interval that must pass between the end of the latest checkpoint and the beginning
-of the next. The figure below illustrates how this impacts checkpointing.
+此间隔时间是指从最近一个 checkpoint 结束到下一个 checkpoint 开始之间必须经过的最小时间间隔。下图说明了这如何影响 checkpointing 。
 
 {{< img src="/fig/checkpoint_tuning.svg" class="center" width="80%" alt="Illustration how the minimum-time-between-checkpoints parameter affects checkpointing behavior." >}}
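A minimal sketch of how the checkpoint interval and the minimum pause are usually combined (the 60s/30s values are illustrative only):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// trigger a checkpoint roughly every 60 seconds
env.enableCheckpointing(60_000);

// require at least 30 seconds between the end of one checkpoint and the start
// of the next, so regular processing can make progress in between
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000);
```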
 
-*Note:* Applications can be configured (via the `CheckpointConfig`) to allow multiple checkpoints to be in progress at
-the same time. For applications with large state in Flink, this often ties up too many resources into the checkpointing.
-When a savepoint is manually triggered, it may be in process concurrently with an ongoing checkpoint.
-
-
-## Tuning RocksDB
-
-The state storage workhorse of many large scale Flink streaming applications is the *RocksDB State Backend*.
-The backend scales well beyond main memory and reliably stores large [keyed state]({{< ref "docs/dev/datastream/fault-tolerance/state" >}}).
-
-RocksDB's performance can vary with configuration, this section outlines some best-practices for tuning jobs that use the RocksDB State Backend.
-
-### Incremental Checkpoints
-
-When it comes to reducing the time that checkpoints take, activating incremental checkpoints should be one of the first considerations.
-Incremental checkpoints can dramatically reduce the checkpointing time in comparison to full checkpoints, because incremental checkpoints only record the changes compared to the previous completed checkpoint, instead of producing a full, self-contained backup of the state backend.
-
-See [Incremental Checkpoints in RocksDB]({{< ref "docs/ops/state/state_backends" >}}#incremental-checkpoints) for more background information.
-
-### Timers in RocksDB or on JVM Heap
-
-Timers are stored in RocksDB by default, which is the more robust and scalable choice.
-
-When performance-tuning jobs that have few timers only (no windows, not using timers in ProcessFunction), putting those timers on the heap can increase performance.
-Use this feature carefully, as heap-based timers may increase checkpointing times and naturally cannot scale beyond memory.
-
-See [this section]({{< ref "docs/ops/state/state_backends" >}}#timers-heap-vs-rocksdb) for details on how to configure heap-based timers.
-
-### Tuning RocksDB Memory
-
-The performance of the RocksDB State Backend much depends on the amount of memory that it has available. To increase performance, adding memory can help a lot, or adjusting to which functions memory goes.
-
-By default, the RocksDB State Backend uses Flink's managed memory budget for RocksDBs buffers and caches (`state.backend.rocksdb.memory.managed: true`). Please refer to the [RocksDB Memory Management]({{< ref "docs/ops/state/state_backends" >}}#memory-management) for background on how that mechanism works.
-
-To tune memory-related performance issues, the following steps may be helpful:
-
-  - The first step to try and increase performance should be to increase the amount of managed memory. This usually improves the situation a lot, without opening up the complexity of tuning low-level RocksDB options.
-    
-    Especially with large container/process sizes, much of the total memory can typically go to RocksDB, unless the application logic requires a lot of JVM heap itself. The default managed memory fraction *(0.4)* is conservative and can often be increased when using TaskManagers with multi-GB process sizes.
-
-  - The number of write buffers in RocksDB depends on the number of states you have in your application (states across all operators in the pipeline). Each state corresponds to one ColumnFamily, which needs its own write buffers. Hence, applications with many states typically need more memory for the same performance.
-
-  - You can try and compare the performance of RocksDB with managed memory to RocksDB with per-column-family memory by setting `state.backend.rocksdb.memory.managed: false`. Especially to test against a baseline (assuming no- or gracious container memory limits) or to test for regressions compared to earlier versions of Flink, this can be useful.
-  
-    Compared to the managed memory setup (constant memory pool), not using managed memory means that RocksDB allocates memory proportional to the number of states in the application (memory footprint changes with application changes). As a rule of thumb, the non-managed mode has (unless ColumnFamily options are applied) an upper bound of roughly "140MB * num-states-across-all-tasks * num-slots". Timers count as state as well!
-
-  - If your application has many states and you see frequent MemTable flushes (write-side bottleneck), but you cannot give more memory you can increase the ratio of memory going to the write buffers (`state.backend.rocksdb.memory.write-buffer-ratio`). See [RocksDB Memory Management]({{< ref "docs/ops/state/state_backends" >}}#memory-management) for details.
-
-  - An advanced option (*expert mode*) to reduce the number of MemTable flushes in setups with many states, is to tune RocksDB's ColumnFamily options (arena block size, max background flush threads, etc.) via a `RocksDBOptionsFactory`:
-
+*注意:* 可以配置应用程序(通过`CheckpointConfig`)允许同时进行多个 checkpoints 。 对于 Flink 中状态较大的应用程序,这通常会将过多的资源使用在 checkpointing 上。
+当手动触发 savepoint 时,它可能与正在进行的 checkpoint 同时进行。
+
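For reference, the concurrency limit mentioned in the note above lives on the same `CheckpointConfig`; a small sketch (1 is the default value, shown only for illustration):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// keep at most one checkpoint in flight at a time (the default); raising this
// allows overlapping checkpoints, which for large state often ties up too many resources
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
```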
+## RocksDB 调优
+许多大型 Flink 流应用程序的状态存储主要是 *RocksDB State Backend*。
+后端的扩展能力远远超出了主内存,并且可靠地存储了大的 [keyed state]({{< ref "docs/dev/datastream/fault-tolerance/state" >}})。

Review Comment:
   该backend在主内存之上提供了很好的拓展能力,并且可靠地存储了大的 [keyed state]({{< ref "docs/dev/datastream/fault-tolerance/state" >}})。



##########
docs/content.zh/docs/ops/state/large_state_tuning.md:
##########
@@ -26,122 +26,81 @@ under the License.
 
 # 大状态与 Checkpoint 调优
 
-This page gives a guide how to configure and tune applications that use large state.
+本文提供了如何配置和调整使用大状态的应用程序指南。
 
-## Overview
+## 概述
 
-For Flink applications to run reliably at large scale, two conditions must be fulfilled:
+Flink 应用要想在大规模场景下可靠地运行,必须要满足如下两个条件:
 
-  - The application needs to be able to take checkpoints reliably
+  - 应用程序需要能够可靠地创建 checkpoints
+  - 在应用故障后,需要有足够的资源追赶数据输入流
 
-  - The resources need to be sufficient catch up with the input data streams after a failure
+第一部分讨论如何大规模获得良好性能的 checkpoints 。
+后一部分解释了一些关于要规划使用多少资源的最佳实践。
 
-The first sections discuss how to get well performing checkpoints at scale.
-The last section explains some best practices concerning planning how many resources to use.
 
+## 监控状态和 Checkpoints
 
-## Monitoring State and Checkpoints
+监控 checkpoint 行为最简单的方法是通过 UI 的 checkpoint 部分。 [监控 Checkpoint]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}) 的文档说明了如何查看可用的 checkpoint 指标。
 
-The easiest way to monitor checkpoint behavior is via the UI's checkpoint section. The documentation
-for [checkpoint monitoring]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}) shows how to access the available checkpoint
-metrics.
+这两个指标(均通过 Task 级别 [Checkpointing 指标]({{< ref "docs/ops/metrics" >}}#checkpointing) 展示)
+以及在 [监控 Checkpoint]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}}))中,当查看 checkpoint 详细信息时,特别有趣的是:
 
-The two numbers (both exposed via Task level [metrics]({{< ref "docs/ops/metrics" >}}#checkpointing)
-and in the [web interface]({{< ref "docs/ops/monitoring/checkpoint_monitoring" >}})) that are of particular interest when scaling
-up checkpoints are:
+  - 算子收到第一个 checkpoint barrier 的时间。当触发 checkpoint 的延迟时间一直很高时,这意味着 *checkpoint barrier* 需要很长时间才能从 source 到达 operators。 这通常表明系统处于反压下运行。
 
-  - The time until operators receive their first checkpoint barrier
-    When the time to trigger the checkpoint is constantly very high, it means that the *checkpoint barriers* need a long
-    time to travel from the source to the operators. That typically indicates that the system is operating under a
-    constant backpressure.
+  - Alignment Duration,为处理第一个和最后一个 checkpoint barrier 之间的时间。在 unaligned checkpoints 下,`exactly-once` 和 `at-least-once` checkpoints 的 subtasks 处理来自上游 subtasks 的所有数据,且没有任何中断。
+    然而,对于 aligned `exactly-once` checkpoints,已经收到 checkpoint barrier 的通道被阻止继续发送数据,直到所有剩余的通道都赶上并接收它们的 checkpoint barrier(对齐时间)。
 
-  - The alignment duration, which is defined as the time between receiving first and the last checkpoint barrier.
-    During unaligned `exactly-once` checkpoints and `at-least-once` checkpoints subtasks are processing all of the
-    data from the upstream subtasks without any interruptions. However with aligned `exactly-once` checkpoints,
-    the channels that have already received a checkpoint barrier are blocked from sending further data until
-    all of the remaining channels catch up and receive theirs checkpoint barriers (alignment time).
+理想情况下,这两个值都应该很低 - 较高的数值意味着由于存在反压(没有足够的资源来处理传入的记录),导致 checkpoint barriers 在作业中的移动速度较慢,这也可以通过处理记录的端到端延迟在增加来观察到。
+请注意,在出现瞬态反压、数据倾斜或网络问题时,这些数值偶尔会很高。
 
-Both of those values should ideally be low - higher amounts means that checkpoint barriers traveling through the job graph
-slowly, due to some back-pressure (not enough resources to process the incoming records). This can also be observed
-via increased end-to-end latency of processed records. Note that those numbers can be occasionally high in the presence of
-a transient backpressure, data skew, or network issues.
+[Unaligned checkpoints]({{< ref "docs/ops/state/checkpoints" >}}#unaligned-checkpoints) 可用于加快传播时间的 checkpoint barriers。 但是请注意,这并不能解决导致反压的根本问题(端到端记录延迟仍然很高)。
 
-[Unaligned checkpoints]({{< ref "docs/ops/state/checkpoints" >}}#unaligned-checkpoints) can be used to speed up the propagation time
-of the checkpoint barriers. However please note, that this does not solve the underlying problem that's causing the backpressure
-in the first place (and end-to-end records latency will remain high).
+## Checkpoint 调优
+应用程序可以配置定期触发 checkpoints。 当 checkpoint 完成时间超过 checkpoint 间隔时,在正在进行的 checkpoint 完成之前,不会触发下一个 checkpoint 。默认情况下,一旦正在进行的 checkpoint 完成,将立即触发下一个 checkpoint 。
 
-## Tuning Checkpointing
+当 checkpoints 完成的时间经常超过 checkpoints 基本间隔时(例如,因为状态比计划的更大,或者 checkpoints 所在的存储系统访问暂时变慢),
+系统不断地进行 checkpoints(一旦完成,新的 checkpoints 就会立即启动)。这可能意味着过多的资源被不断地束缚在 checkpointing 中,并且 checkpoint 算子进行得缓慢。 此行为对使用 checkpointed 状态的流式应用程序的影响较小,但仍可能对整体应用程序性能产生影响。
 
-Checkpoints are triggered at regular intervals that applications can configure. When a checkpoint takes longer
-to complete than the checkpoint interval, the next checkpoint is not triggered before the in-progress checkpoint
-completes. By default the next checkpoint will then be triggered immediately once the ongoing checkpoint completes.
-
-When checkpoints end up frequently taking longer than the base interval (for example because state
-grew larger than planned, or the storage where checkpoints are stored is temporarily slow),
-the system is constantly taking checkpoints (new ones are started immediately once ongoing once finish).
-That can mean that too many resources are constantly tied up in checkpointing and that the operators make too
-little progress. This behavior has less impact on streaming applications that use asynchronously checkpointed state,
-but may still have an impact on overall application performance.
-
-To prevent such a situation, applications can define a *minimum duration between checkpoints*:
+为了防止这种情况,应用程序可以定义 checkpoints 之间的*最小间隔时间*:
 
 `StreamExecutionEnvironment.getCheckpointConfig().setMinPauseBetweenCheckpoints(milliseconds)`
 
-This duration is the minimum time interval that must pass between the end of the latest checkpoint and the beginning
-of the next. The figure below illustrates how this impacts checkpointing.
+此间隔时间是指从最近一个 checkpoint 结束到下一个 checkpoint 开始之间必须经过的最小时间间隔。下图说明了这如何影响 checkpointing 。
 
 {{< img src="/fig/checkpoint_tuning.svg" class="center" width="80%" alt="Illustration how the minimum-time-between-checkpoints parameter affects checkpointing behavior." >}}
 
-*Note:* Applications can be configured (via the `CheckpointConfig`) to allow multiple checkpoints to be in progress at
-the same time. For applications with large state in Flink, this often ties up too many resources into the checkpointing.
-When a savepoint is manually triggered, it may be in process concurrently with an ongoing checkpoint.
-
-
-## Tuning RocksDB
-
-The state storage workhorse of many large scale Flink streaming applications is the *RocksDB State Backend*.
-The backend scales well beyond main memory and reliably stores large [keyed state]({{< ref "docs/dev/datastream/fault-tolerance/state" >}}).
-
-RocksDB's performance can vary with configuration, this section outlines some best-practices for tuning jobs that use the RocksDB State Backend.
-
-### Incremental Checkpoints
-
-When it comes to reducing the time that checkpoints take, activating incremental checkpoints should be one of the first considerations.
-Incremental checkpoints can dramatically reduce the checkpointing time in comparison to full checkpoints, because incremental checkpoints only record the changes compared to the previous completed checkpoint, instead of producing a full, self-contained backup of the state backend.
-
-See [Incremental Checkpoints in RocksDB]({{< ref "docs/ops/state/state_backends" >}}#incremental-checkpoints) for more background information.
-
-### Timers in RocksDB or on JVM Heap
-
-Timers are stored in RocksDB by default, which is the more robust and scalable choice.
-
-When performance-tuning jobs that have few timers only (no windows, not using timers in ProcessFunction), putting those timers on the heap can increase performance.
-Use this feature carefully, as heap-based timers may increase checkpointing times and naturally cannot scale beyond memory.
-
-See [this section]({{< ref "docs/ops/state/state_backends" >}}#timers-heap-vs-rocksdb) for details on how to configure heap-based timers.
-
-### Tuning RocksDB Memory
-
-The performance of the RocksDB State Backend much depends on the amount of memory that it has available. To increase performance, adding memory can help a lot, or adjusting to which functions memory goes.
-
-By default, the RocksDB State Backend uses Flink's managed memory budget for RocksDBs buffers and caches (`state.backend.rocksdb.memory.managed: true`). Please refer to the [RocksDB Memory Management]({{< ref "docs/ops/state/state_backends" >}}#memory-management) for background on how that mechanism works.
-
-To tune memory-related performance issues, the following steps may be helpful:
-
-  - The first step to try and increase performance should be to increase the amount of managed memory. This usually improves the situation a lot, without opening up the complexity of tuning low-level RocksDB options.
-    
-    Especially with large container/process sizes, much of the total memory can typically go to RocksDB, unless the application logic requires a lot of JVM heap itself. The default managed memory fraction *(0.4)* is conservative and can often be increased when using TaskManagers with multi-GB process sizes.
-
-  - The number of write buffers in RocksDB depends on the number of states you have in your application (states across all operators in the pipeline). Each state corresponds to one ColumnFamily, which needs its own write buffers. Hence, applications with many states typically need more memory for the same performance.
-
-  - You can try and compare the performance of RocksDB with managed memory to RocksDB with per-column-family memory by setting `state.backend.rocksdb.memory.managed: false`. Especially to test against a baseline (assuming no- or gracious container memory limits) or to test for regressions compared to earlier versions of Flink, this can be useful.
-  
-    Compared to the managed memory setup (constant memory pool), not using managed memory means that RocksDB allocates memory proportional to the number of states in the application (memory footprint changes with application changes). As a rule of thumb, the non-managed mode has (unless ColumnFamily options are applied) an upper bound of roughly "140MB * num-states-across-all-tasks * num-slots". Timers count as state as well!
-
-  - If your application has many states and you see frequent MemTable flushes (write-side bottleneck), but you cannot give more memory you can increase the ratio of memory going to the write buffers (`state.backend.rocksdb.memory.write-buffer-ratio`). See [RocksDB Memory Management]({{< ref "docs/ops/state/state_backends" >}}#memory-management) for details.
-
-  - An advanced option (*expert mode*) to reduce the number of MemTable flushes in setups with many states, is to tune RocksDB's ColumnFamily options (arena block size, max background flush threads, etc.) via a `RocksDBOptionsFactory`:
-
+*注意:* 可以配置应用程序(通过`CheckpointConfig`)允许同时进行多个 checkpoints 。 对于 Flink 中状态较大的应用程序,这通常会将过多的资源使用在 checkpointing 上。
+当手动触发 savepoint 时,它可能与正在进行的 checkpoint 同时进行。
+
+## RocksDB 调优
+许多大型 Flink 流应用程序的状态存储主要是 *RocksDB State Backend*。
+后端的扩展能力远远超出了主内存,并且可靠地存储了大的 [keyed state]({{< ref "docs/dev/datastream/fault-tolerance/state" >}})。
+
+RocksDB 的性能可能因配置而异,本节讲述了一些使用 RocksDB State Backend 调优作业的最佳实践。
+
+### 增量 Checkpoint
+在减少 checkpoints 花费的时间方面,开启增量 checkpoints 应该是首要考虑因素。
+与完整 checkpoints 相比,增量 checkpoints 可以显着减少 checkpointing 时间,因为增量 checkpoints 仅存储与先前完成的 checkpoint 不同的增量文件,而不是存储全量数据备份。
+有关更多背景信息,请参阅 [RocksDB 中的增量 Checkpoints]({{< ref "docs/ops/state/state_backends" >}}#incremental-checkpoints)。
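A short sketch of enabling incremental checkpoints for the RocksDB state backend (assumes the `flink-statebackend-rocksdb` dependency is on the classpath):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// the constructor flag turns on incremental checkpoints for this backend;
// the equivalent configuration key is 'state.backend.incremental: true'
env.setStateBackend(new EmbeddedRocksDBStateBackend(true));
```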
+### RocksDB 或 JVM 堆中的计时器
+
+计时器(Timer) 默认存储在 RocksDB 中,这是更健壮和可扩展的选择。
+当性能调优作业只有少量计时器(没有窗口,且在 ProcessFunction 中不使用计时器)时,将这些计时器放在堆中可以提高性能。
+请谨慎使用此功能,因为基于堆的计时器可能会增加 checkpointing 时间,并且自然无法扩展到内存之外。
+有关如何配置基于堆的计时器的详细信息,请参阅 [计时器(内存 vs. RocksDB)]({{< ref "docs/ops/state/state_backends" >}}#timers-heap-vs-rocksdb)。
+
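The switch to heap timers is a configuration option rather than an API call; a sketch of setting it programmatically, though in practice the key is more commonly placed in `flink-conf.yaml`:

```java
Configuration conf = new Configuration();

// store timers on the JVM heap instead of in RocksDB; use with care, as heap
// timers can increase checkpointing time and cannot scale beyond available memory
conf.setString("state.backend.rocksdb.timer-service.factory", "HEAP");

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
```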
+### RocksDB 内存调优
+RocksDB State Backend 的性能在很大程度上取决于它可用的内存量。为了提高性能,增加内存会有很大的帮助,或者调整内存的功能。
+默认情况下,RocksDB State Backend 将 Flink 的托管内存用于 RocksDB 的缓冲区和缓存(`State.Backend.RocksDB.memory.managed:true`)。请参考 [RocksDB 内存管理]({{< ref "docs/ops/state/state_backends" >}}#memory-management) 了解该机制的工作原理。
+关于 RocksDB 内存调优相关的性能问题,如下步骤可能会有所帮助:
+  - 尝试提高性能的第一步应该是增加托管内存的大小。这通常会大大改善这种情况,而不会增加调整 RocksDB 低级选项的复杂性。

Review Comment:
   而不是通过调整 RocksDB 底层参数引入复杂性
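For concreteness, the memory-related knobs referenced in the quoted section are plain configuration options; a sketch with purely illustrative values (normally set in `flink-conf.yaml` rather than in code):

```java
Configuration conf = new Configuration();

// keep RocksDB buffers and caches inside Flink's managed memory (default: true)
conf.setString("state.backend.rocksdb.memory.managed", "true");

// give RocksDB a larger share of the TaskManager process memory (default: 0.4)
conf.setString("taskmanager.memory.managed.fraction", "0.6");

// shift more of that budget to write buffers if MemTable flushes are the bottleneck (default: 0.5)
conf.setString("state.backend.rocksdb.memory.write-buffer-ratio", "0.6");
```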



##########
docs/content.zh/docs/ops/state/large_state_tuning.md:
##########
@@ -166,149 +125,100 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 }
 ```
 
-## Capacity Planning
-
-This section discusses how to decide how many resources should be used for a Flink job to run reliably.
-The basic rules of thumb for capacity planning are:
-
-  - Normal operation should have enough capacity to not operate under constant *back pressure*.
-    See [back pressure monitoring]({{< ref "docs/ops/monitoring/back_pressure" >}}) for details on how to check whether the application runs under back pressure.
-
-  - Provision some extra resources on top of the resources needed to run the program back-pressure-free during failure-free time.
-    These resources are needed to "catch up" with the input data that accumulated during the time the application
-    was recovering.
-    How much that should be depends on how long recovery operations usually take (which depends on the size of the state
-    that needs to be loaded into the new TaskManagers on a failover) and how fast the scenario requires failures to recover.
-
-    *Important*: The base line should to be established with checkpointing activated, because checkpointing ties up
-    some amount of resources (such as network bandwidth).
-
-  - Temporary back pressure is usually okay, and an essential part of execution flow control during load spikes,
-    during catch-up phases, or when external systems (that are written to in a sink) exhibit temporary slowdown.
-
-  - Certain operations (like large windows) result in a spiky load for their downstream operators: 
-    In the case of windows, the downstream operators may have little to do while the window is being built,
-    and have a load to do when the windows are emitted.
-    The planning for the downstream parallelism needs to take into account how much the windows emit and how
-    fast such a spike needs to be processed.
-
-**Important:** In order to allow for adding resources later, make sure to set the *maximum parallelism* of the
-data stream program to a reasonable number. The maximum parallelism defines how high you can set the programs
-parallelism when re-scaling the program (via a savepoint).
-
-Flink's internal bookkeeping tracks parallel state in the granularity of max-parallelism-many *key groups*.
-Flink's design strives to make it efficient to have a very high value for the maximum parallelism, even if
-executing the program with a low parallelism.
-
-## Compression
-
-Flink offers optional compression (default: off) for all checkpoints and savepoints. Currently, compression always uses 
-the [snappy compression algorithm (version 1.1.4)](https://github.com/xerial/snappy-java) but we are planning to support
-custom compression algorithms in the future. Compression works on the granularity of key-groups in keyed state, i.e.
-each key-group can be decompressed individually, which is important for rescaling. 
-
-Compression can be activated through the `ExecutionConfig`:
-
+## 容量规划
+本节讨论如何确定 Flink 作业应该使用多少资源才能可靠地运行。
+容量规划的基本经验法则是:
+  - 应有足够的能力在恒定*反压*下正常运行。
+    如何检查应用程序是否在反压下运行,详细信息请参阅 [反压监控]({{< ref "docs/ops/monitoring/back_pressure" >}})。
+  - 在无故障时间内无反压运行程序所需的资源之上能够提供一些额外的资源。
+    需要这些资源来“追赶”在应用程序恢复期间积累的输入数据。
+    这通常取决于恢复操作需要多长时间(这取决于在故障转移时需要加载到新 TaskManager 中的状态大小)以及故障恢复的速度。
+    *重要提示*:基准点应该在开启 checkpointing 来建立,因为 checkpointing 会占用一些资源(例如网络带宽)。
+  - 临时反压通常是允许的,在负载峰值、追赶阶段或外部系统(sink 到外部系统)出现临时减速时,这是执行流控制的重要部分。
+  - 在某些操作下(如大窗口)会导致其下游算子的负载激增:
+    在有窗口的情况下,下游算子可能在构建窗口时几乎无事可做,而在触发窗口时有负载要做。
+    下游并行度的规划需要考虑窗口的输出量以及处理这种峰值的速度。
+
+**重要提示**:为了方便以后增加资源,请确保将流应用程序的*最大并行度*设置为一个合理的数字。最大并行度定义了当扩缩容程序时(通过 savepoint )可以设置程序并行度的高度。
+
+Flink 的内部以*键组(key groups)* 的最大并行度为粒度跟踪分布式状态。
+Flink 的设计力求使最大并行度的值达到很高的效率,即使执行程序时并行度很低。
+
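A small sketch of fixing the maximum parallelism up front (128 and 8 are illustrative values):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// upper bound for any later rescaling via a savepoint; keyed state is partitioned
// into this many key groups, so pick the bound generously up front
env.setMaxParallelism(128);

// the current parallelism can stay well below that bound
env.setParallelism(8);
```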
+## 压缩
+Flink 为所有 checkpoints 和 savepoints 提供可选的压缩(默认:关闭)。 目前,压缩总是使用 [snappy 压缩算法(版本 1.1.4)](https://github.com/xerial/snappy-java),
+但我们计划在未来支持自定义压缩算法。 压缩作用于 keyed state 下 key-groups 的粒度,即每个 key-groups 可以单独解压缩,这对于重新缩放很重要。
+
+可以通过 `ExecutionConfig` 开启压缩:
 ```java
 ExecutionConfig executionConfig = new ExecutionConfig();
 executionConfig.setUseSnapshotCompression(true);
 ```
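When the job is built through a `StreamExecutionEnvironment`, the same flag can be set on the environment's own `ExecutionConfig`; a one-line sketch:

```java
// equivalent to the snippet above, applied to an existing environment
env.getConfig().setUseSnapshotCompression(true);
```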
 
-<span class="label label-info">Note</span> The compression option has no impact on incremental snapshots, because they are using RocksDB's internal
-format which is always using snappy compression out of the box.
-
-## Task-Local Recovery
-
-### Motivation
+<span class="label label-info">注意</span> 压缩选项对增量快照没有影响,因为它们使用的是 RocksDB 的内部格式,该格式始终使用开箱即用的 snappy 压缩。
 
-In Flink's checkpointing, each task produces a snapshot of its state that is then written to a distributed store. Each task acknowledges
-a successful write of the state to the job manager by sending a handle that describes the location of the state in the distributed store.
-The job manager, in turn, collects the handles from all tasks and bundles them into a checkpoint object.
+## Task 本地恢复
+### 问题引入
+在 Flink 的 checkpointing 中,每个 task 都会生成其状态快照,然后将其写入分布式存储。 每个 task 通过发送一个描述分布式存储中的位置状态的句柄,向 jobmanager 确认状态的成功写入。
+JobManager 反过来收集所有 tasks 的句柄并将它们捆绑到一个 checkpoint 对象中。
 
-In case of recovery, the job manager opens the latest checkpoint object and sends the handles back to the corresponding tasks, which can
-then restore their state from the distributed storage. Using a distributed storage to store state has two important advantages. First, the storage
-is fault tolerant and second, all state in the distributed store is accessible to all nodes and can be easily redistributed (e.g. for rescaling).
+在恢复的情况下,jobmanager 打开最新的 checkpoint 对象并将句柄发送回相应的 tasks,然后可以从分布式存储中恢复它们的状态。 使用分布式存储来存储状态有两个重要的优势。 
+首先,存储是容错的,其次,分布式存储中的所有状态都可以被所有节点访问,并且可以很容易地重新分配(例如,用于重新缩放)。
 
-However, using a remote distributed store has also one big disadvantage: all tasks must read their state from a remote location, over the network.
-In many scenarios, recovery could reschedule failed tasks to the same task manager as in the previous run (of course there are exceptions like machine
-failures), but we still have to read remote state. This can result in *long recovery time for large states*, even if there was only a small failure on
-a single machine.
+但是,使用远程分布式存储也有一个很大的缺点:所有 tasks 都必须通过网络从远程位置读取它们的状态。
+在许多场景中,恢复可能会将失败的 tasks 重新调度到与前一次运行相同的 taskmanager 中(当然也有像机器故障这样的异常),但我们仍然必须读取远程状态。这可能导致*大状态的长时间恢复*,即使在一台机器上只有一个小故障。
 
-### Approach
+### 解决办法
 
-Task-local state recovery targets exactly this problem of long recovery time and the main idea is the following: for every checkpoint, each task
-does not only write task states to the distributed storage, but also keep *a secondary copy of the state snapshot in a storage that is local to
-the task* (e.g. on local disk or in memory). Notice that the primary store for snapshots must still be the distributed store, because local storage
-does not ensure durability under node failures and also does not provide access for other nodes to redistribute state, this functionality still
-requires the primary copy.
+Task 本地状态恢复正是针对这个恢复时间长的问题,其主要思想如下:对于每个 checkpoint ,每个 task 不仅将 task 状态写入分布式存储中,
+而且还在 task 本地存储(例如本地磁盘或内存)中保存状态快照的次要副本。请注意,快照的主存储仍然必须是分布式存储,因为本地存储不能确保节点故障下的持久性,也不能为其他节点提供重新分发状态的访问,所以这个功能仍然需要主副本。
 
-However, for each task that can be rescheduled to the previous location for recovery, we can restore state from the secondary, local
-copy and avoid the costs of reading the state remotely. Given that *many failures are not node failures and node failures typically only affect one
-or very few nodes at a time*, it is very likely that in a recovery most tasks can return to their previous location and find their local state intact.
-This is what makes local recovery effective in reducing recovery time.
+然而,对于每个 task 可以重新调度到以前的位置进行恢复的 task ,我们可以从次要本地状态副本恢复,并避免远程读取状态的成本。考虑到*许多故障不是节点故障,即使节点故障通常一次只影响一个或非常少的节点*,
+在恢复过程中,大多数 task 很可能会重新部署到它们以前的位置,并发现它们的本地状态完好无损。
+这就是 task 本地恢复有效地减少恢复时间的原因。
 
-Please note that this can come at some additional costs per checkpoint for creating and storing the secondary local state copy, depending on the
-chosen state backend and checkpointing strategy. For example, in most cases the implementation will simply duplicate the writes to the distributed
-store to a local file.
+请注意,根据所选的 state backend 和 checkpointing 策略,在每个 checkpoint 创建和存储次要本地状态副本时,可能会有一些额外的成本。
+例如,在大多数情况下,实现只是简单地将对分布式存储的写操作复制到本地文件。
 
 {{< img src="/fig/local_recovery.png" class="center" width=50% alt="Illustration of checkpointing with task-local recovery." >}}
+### 主要(分布式存储)和次要(task 本地)状态快照的关系
+Task 本地状态始终被视为次要副本,checkpoint 状态是分布式存储中的主副本。 这对 checkpointing 和恢复期间的本地状态问题有影响:
 
-### Relationship of primary (distributed store) and secondary (task-local) state snapshots
-
-Task-local state is always considered a secondary copy, the ground truth of the checkpoint state is the primary copy in the distributed store. This
-has implications for problems with local state during checkpointing and recovery:
-
-- For checkpointing, the *primary copy must be successful* and a failure to produce the *secondary, local copy will not fail* the checkpoint. A checkpoint
-will fail if the primary copy could not be created, even if the secondary copy was successfully created.
-
-- Only the primary copy is acknowledged and managed by the job manager, secondary copies are owned by task managers and their life cycles can be
-independent from their primary copies. For example, it is possible to retain a history of the 3 latest checkpoints as primary copies and only keep
-the task-local state of the latest checkpoint.
-
-- For recovery, Flink will always *attempt to restore from task-local state first*, if a matching secondary copy is available. If any problem occurs during
-the recovery from the secondary copy, Flink will *transparently retry to recover the task from the primary copy*. Recovery only fails, if primary
-and the (optional) secondary copy failed. In this case, depending on the configuration Flink could still fall back to an older checkpoint.
-
-- It is possible that the task-local copy contains only parts of the full task state (e.g. exception while writing one local file). In this case,
-Flink will first try to recover local parts locally, non-local state is restored from the primary copy. Primary state must always be complete and is
-a *superset of the task-local state*.
+- 对于 checkpointing ,*主副本必须成功*,并且生成*次要本地副本的失败不会使* checkpoint 失败。 如果无法创建主副本,即使已成功创建次要副本,checkpoint 也会失败。
 
-- Task-local state can have a different format than the primary state, they are not required to be byte identical. For example, it could be even possible
-that the task-local state is an in-memory consisting of heap objects, and not stored in any files.
+- 只有主副本由 jobmanager 确认和管理,次要副本属于 taskmanager ,并且它们的生命周期可以独立于它们的主副本。 例如,可以保留 3 个最新 checkpoints 的历史记录作为主副本,并且只保留最新 checkpoint 的 task 本地状态。
 
-- If a task manager is lost, the local state from all its task is lost.
+- 对于恢复,如果匹配的次要副本可用,Flink 将始终*首先尝试从 task 本地状态恢复*。 如果在次要副本恢复过程中出现任何问题,Flink 将*透明地重试从主副本恢复 task*。 仅当主副本和(可选)次要副本失败时,恢复才会失败。 
+  在这种情况下,根据配置,Flink 仍可能回退到旧的 checkpoint 。
+- Task 本地副本可能仅包含完整 task 状态的一部分(例如,写入一个本地文件时出现异常)。 在这种情况下,Flink 会首先尝试在本地恢复本地部分,非本地状态从主副本恢复。 主状态必须始终是完整的,并且是*task 本地状态的超集*。
 
-### Configuring task-local recovery
+- Task 本地状态可以具有与主状态不同的格式,它们不需要相同字节。 例如,task 本地状态甚至可能是在堆对象组成的内存中,而不是存储在任何文件中。
 
-Task-local recovery is *deactivated by default* and can be activated through Flink's configuration with the key `state.backend.local-recovery` as specified
-in `CheckpointingOptions.LOCAL_RECOVERY`. The value for this setting can either be *true* to enable or *false* (default) to disable local recovery.
+- 如果 taskmanager 丢失,则其所有 task 的本地状态都会丢失。
+### 配置 task 本地恢复
 
-Note that [unaligned checkpoints]({{< ref "docs/ops/state/checkpoints" >}}#unaligned-checkpoints) currently do not support task-local recovery.
+Task 本地恢复*默认禁用*,可以通过 Flink 的 CheckpointingOptions.LOCAL_RECOVERY 配置中指定的键 state.backend.local-recovery 来启用。 此设置的值可以是 *true* 以启用或 *false*(默认)以禁用本地恢复。
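A sketch of enabling it programmatically; the option is usually set in `flink-conf.yaml` instead:

```java
Configuration conf = new Configuration();

// same as 'state.backend.local-recovery: true' in flink-conf.yaml
conf.set(CheckpointingOptions.LOCAL_RECOVERY, true);

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
```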
 
-### Details on task-local recovery for different state backends
+请注意,[unaligned checkpoints]({{< ref "docs/ops/state/checkpoints" >}}#unaligned-checkpoints) 目前不支持 task 本地恢复。
 
-***Limitation**: Currently, task-local recovery only covers keyed state backends. Keyed state is typically by far the largest part of the state. In the near future, we will
-also cover operator state and timers.*
+### 不同 state backends 的 task 本地恢复的详细信息
 
-The following state backends can support task-local recovery.
+***限制**:目前,task 本地恢复仅涵盖 keyed state backends。 Keyed state 通常是该状态的最大部分。 在不久的将来,我们还将介绍算子状态和计时器(timers)。*

Review Comment:
   “我们还将介绍算子状态和计时器”  -> “我们还将支持算子状态和计时器”





[GitHub] [flink] liuzhuang2017 commented on pull request #19413: [FLINK-16078] [docs-zh] Translate "Tuning Checkpoints and Large State"

Posted by GitBox <gi...@apache.org>.
liuzhuang2017 commented on PR #19413:
URL: https://github.com/apache/flink/pull/19413#issuecomment-1099175434

   @Myasuka, hi, I am modifying this PR according to your suggestions. Can you help me review it if you have free time? Thank you very much.




[GitHub] [flink] liuzhuang2017 commented on pull request #19413: [FLINK-16078] [docs-zh] Translate "Tuning Checkpoints and Large State…

Posted by GitBox <gi...@apache.org>.
liuzhuang2017 commented on PR #19413:
URL: https://github.com/apache/flink/pull/19413#issuecomment-1094655528

   @Myasuka, thank you for your review. I will update this PR soon.




[GitHub] [flink] liuzhuang2017 commented on a diff in pull request #19413: [FLINK-16078] [docs-zh] Translate "Tuning Checkpoints and Large State…

Posted by GitBox <gi...@apache.org>.
liuzhuang2017 commented on code in PR #19413:
URL: https://github.com/apache/flink/pull/19413#discussion_r847981768


##########
docs/content.zh/docs/ops/state/large_state_tuning.md:
##########
@@ -166,149 +125,101 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 }
 ```
 
-## Capacity Planning
-
-This section discusses how to decide how many resources should be used for a Flink job to run reliably.
-The basic rules of thumb for capacity planning are:
-
-  - Normal operation should have enough capacity to not operate under constant *back pressure*.
-    See [back pressure monitoring]({{< ref "docs/ops/monitoring/back_pressure" >}}) for details on how to check whether the application runs under back pressure.
-
-  - Provision some extra resources on top of the resources needed to run the program back-pressure-free during failure-free time.
-    These resources are needed to "catch up" with the input data that accumulated during the time the application
-    was recovering.
-    How much that should be depends on how long recovery operations usually take (which depends on the size of the state
-    that needs to be loaded into the new TaskManagers on a failover) and how fast the scenario requires failures to recover.
-
-    *Important*: The base line should to be established with checkpointing activated, because checkpointing ties up
-    some amount of resources (such as network bandwidth).
-
-  - Temporary back pressure is usually okay, and an essential part of execution flow control during load spikes,
-    during catch-up phases, or when external systems (that are written to in a sink) exhibit temporary slowdown.
+## 容量规划
+本节讨论如何确定 Flink 作业应该使用多少资源才能可靠地运行。
+容量规划的基本经验法则是:
+  - 正常运行应该有足够的能力在恒定的*反压*下运行。

Review Comment:
   I have opened a hotfix PR for this at https://github.com/apache/flink/pull/19429, please take a look. Thanks.





[GitHub] [flink] Myasuka commented on a diff in pull request #19413: [FLINK-16078] [docs-zh] Translate "Tuning Checkpoints and Large State…

Posted by GitBox <gi...@apache.org>.
Myasuka commented on code in PR #19413:
URL: https://github.com/apache/flink/pull/19413#discussion_r847908442


##########
docs/content.zh/docs/ops/state/large_state_tuning.md:
##########
@@ -166,149 +125,101 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 }
 ```
 
-## Capacity Planning
-
-This section discusses how to decide how many resources should be used for a Flink job to run reliably.
-The basic rules of thumb for capacity planning are:
-
-  - Normal operation should have enough capacity to not operate under constant *back pressure*.
-    See [back pressure monitoring]({{< ref "docs/ops/monitoring/back_pressure" >}}) for details on how to check whether the application runs under back pressure.
-
-  - Provision some extra resources on top of the resources needed to run the program back-pressure-free during failure-free time.
-    These resources are needed to "catch up" with the input data that accumulated during the time the application
-    was recovering.
-    How much that should be depends on how long recovery operations usually take (which depends on the size of the state
-    that needs to be loaded into the new TaskManagers on a failover) and how fast the scenario requires failures to recover.
-
-    *Important*: The base line should to be established with checkpointing activated, because checkpointing ties up
-    some amount of resources (such as network bandwidth).
-
-  - Temporary back pressure is usually okay, and an essential part of execution flow control during load spikes,
-    during catch-up phases, or when external systems (that are written to in a sink) exhibit temporary slowdown.
+## 容量规划
+本节讨论如何确定 Flink 作业应该使用多少资源才能可靠地运行。
+容量规划的基本经验法则是:
+  - 正常运行应该有足够的能力在恒定的*反压*下运行。
+    如何检查应用程序是否在反压下运行的详细信息,请参阅 [反压监控]({{< ref "docs/ops/monitoring/back_pressure" >}})。
+  - 在无故障时间内无反压程序运行所需资源之上提供一些额外的资源。
+    需要这些资源来“赶上”在应用程序恢复期间积累的输入数据。
+    这通常取决于恢复操作需要多长时间(这取决于需要在故障转移时加载到新 TaskManager 中的状态大小)以及故障恢复的速度。
+    *重要*:基线应该在开启 checkpointing 的情况下建立,因为 checkpointing 会占用一些资源(例如网络带宽)。
+  - 临时反压通常是可以的,在负载峰值、追赶阶段或外部系统(写入接收器中)出现临时减速时,这是执行流控制的重要部分。
 
-  - Certain operations (like large windows) result in a spiky load for their downstream operators: 
-    In the case of windows, the downstream operators may have little to do while the window is being built,
-    and have a load to do when the windows are emitted.
-    The planning for the downstream parallelism needs to take into account how much the windows emit and how
-    fast such a spike needs to be processed.
+  - 某些操作(如大窗口)会导致其下游算子的负载激增:
+    在窗口的情况下,下游算子可能在构建窗口时几乎无事可做,而在窗口发出时有负载要做。
+    下游并行度的设置需要考虑到窗口输出多少以及需要以多快的速度处理这种峰值。
 
-**Important:** In order to allow for adding resources later, make sure to set the *maximum parallelism* of the
-data stream program to a reasonable number. The maximum parallelism defines how high you can set the programs
-parallelism when re-scaling the program (via a savepoint).
+**重要**:为了方便以后添加资源,请务必将数据流程序的*最大并行度*设置为合理的数字。 最大并行度定义了在重新缩放程序时(通过 savepoint )可以设置程序并行度的高度。
 
-Flink's internal bookkeeping tracks parallel state in the granularity of max-parallelism-many *key groups*.
-Flink's design strives to make it efficient to have a very high value for the maximum parallelism, even if
-executing the program with a low parallelism.
+Flink 的内部以多个*键组(key groups)* 的最大并行度为粒度跟踪并行状态。
+Flink 的设计力求使最大并行度的值达到很高的效率,即使执行程序时并行度很低。
 
-## Compression
-
-Flink offers optional compression (default: off) for all checkpoints and savepoints. Currently, compression always uses 
-the [snappy compression algorithm (version 1.1.4)](https://github.com/xerial/snappy-java) but we are planning to support
-custom compression algorithms in the future. Compression works on the granularity of key-groups in keyed state, i.e.
-each key-group can be decompressed individually, which is important for rescaling. 
-
-Compression can be activated through the `ExecutionConfig`:
+## 压缩
+Flink 为所有 checkpoints 和 savepoints 提供可选的压缩(默认:关闭)。 目前,压缩总是使用 [snappy 压缩算法(版本 1.1.4)](https://github.com/xerial/snappy-java),
+但我们计划在未来支持自定义压缩算法。 压缩作用于 keyed state 下 key-groups 的粒度,即每个 key-groups 可以单独解压缩,这对于重新缩放很重要。
 
+可以通过 `ExecutionConfig` 开启压缩:
 ```java
 ExecutionConfig executionConfig = new ExecutionConfig();
 executionConfig.setUseSnapshotCompression(true);
 ```
 
-<span class="label label-info">Note</span> The compression option has no impact on incremental snapshots, because they are using RocksDB's internal
-format which is always using snappy compression out of the box.
-
-## Task-Local Recovery
+<span class="label label-info">注意</span> 压缩选项对增量快照没有影响,因为它们使用的是 RocksDB 的内部格式,该格式始终使用开箱即用的 snappy 压缩。
 
-### Motivation
+## Task 本地恢复
+### 问题引入
+在 Flink 的 checkpointing 中,每个 task 都会生成其状态快照,然后将其写入分布式存储。 每个 task 通过发送一个描述分布式存储中的位置状态的句柄,向 jobmanager 确认状态的成功写入。
+JobManager 反过来收集所有 tasks 的句柄并将它们捆绑到一个 checkpoint 对象中。
 
-In Flink's checkpointing, each task produces a snapshot of its state that is then written to a distributed store. Each task acknowledges
-a successful write of the state to the job manager by sending a handle that describes the location of the state in the distributed store.
-The job manager, in turn, collects the handles from all tasks and bundles them into a checkpoint object.
+在恢复的情况下,jobmanager 打开最新的 checkpoint 对象并将句柄发送回相应的 tasks,然后可以从分布式存储中恢复它们的状态。 使用分布式存储来存储状态有两个重要的优势。 
+首先,存储是容错的,其次,分布式存储中的所有状态都可以被所有节点访问,并且可以很容易地重新分配(例如,用于重新缩放)。
 
-In case of recovery, the job manager opens the latest checkpoint object and sends the handles back to the corresponding tasks, which can
-then restore their state from the distributed storage. Using a distributed storage to store state has two important advantages. First, the storage
-is fault tolerant and second, all state in the distributed store is accessible to all nodes and can be easily redistributed (e.g. for rescaling).
+但是,使用远程分布式存储也有一个很大的缺点:所有 tasks 都必须通过网络从远程位置读取它们的状态。
+在许多场景中,恢复可能会将失败的 tasks 重新调度到与前一次运行相同的 taskmanager 中(当然也有像机器故障这样的异常),但我们仍然必须读取远程状态。这可能导致*大状态的长时间恢复*,即使在一台机器上只有一个小故障。
 
-However, using a remote distributed store has also one big disadvantage: all tasks must read their state from a remote location, over the network.
-In many scenarios, recovery could reschedule failed tasks to the same task manager as in the previous run (of course there are exceptions like machine
-failures), but we still have to read remote state. This can result in *long recovery time for large states*, even if there was only a small failure on
-a single machine.
+### 解决办法
 
-### Approach
+Task 本地状态恢复正是针对这个恢复时间长的问题,其主要思想如下:对于每个 checkpoint ,每个 task 不仅将 task 状态写入分布式存储中,
+而且还在 task 本地存储(例如本地磁盘或内存)中保存状态快照的次要副本。请注意,快照的主存储仍然必须是分布式存储,因为本地存储不能确保节点故障下的持久性,也不能为其他节点提供重新分发状态的访问,所以这个功能仍然需要主副本。
 
-Task-local state recovery targets exactly this problem of long recovery time and the main idea is the following: for every checkpoint, each task
-does not only write task states to the distributed storage, but also keep *a secondary copy of the state snapshot in a storage that is local to
-the task* (e.g. on local disk or in memory). Notice that the primary store for snapshots must still be the distributed store, because local storage
-does not ensure durability under node failures and also does not provide access for other nodes to redistribute state, this functionality still
-requires the primary copy.
+然而,对于每个 task 可以重新调度到以前的位置进行恢复的 task ,我们可以从次要本地状态副本恢复,并避免远程读取状态的成本。考虑到*许多故障不是节点故障,节点故障通常一次只影响一个或非常少的节点*,
+在恢复过程中,大多数 task 很可能会返回到它们以前的位置,并发现它们的本地状态完好无损。
+这就是 task 本地恢复有效地减少恢复时间的原因。
 
-However, for each task that can be rescheduled to the previous location for recovery, we can restore state from the secondary, local
-copy and avoid the costs of reading the state remotely. Given that *many failures are not node failures and node failures typically only affect one
-or very few nodes at a time*, it is very likely that in a recovery most tasks can return to their previous location and find their local state intact.
-This is what makes local recovery effective in reducing recovery time.
-
-Please note that this can come at some additional costs per checkpoint for creating and storing the secondary local state copy, depending on the
-chosen state backend and checkpointing strategy. For example, in most cases the implementation will simply duplicate the writes to the distributed
-store to a local file.
+请注意,根据所选的 state backend 和 checkpointing 策略,在每个 checkpoint 创建和存储次要本地状态副本时,可能会有一些额外的成本。
+例如,在大多数情况下,实现只是简单地将对分布式存储的写操作复制到本地文件。
 
 {{< img src="/fig/local_recovery.png" class="center" width=50% alt="Illustration of checkpointing with task-local recovery." >}}
+### 主要(分布式存储)和次要(task 本地)状态快照的关系
+Task 本地状态始终被视为次要副本,checkpoint 状态是分布式存储中的主副本。 这对 checkpointing 和恢复期间的本地状态问题有影响:
 
-### Relationship of primary (distributed store) and secondary (task-local) state snapshots
-
-Task-local state is always considered a secondary copy, the ground truth of the checkpoint state is the primary copy in the distributed store. This
-has implications for problems with local state during checkpointing and recovery:
-
-- For checkpointing, the *primary copy must be successful* and a failure to produce the *secondary, local copy will not fail* the checkpoint. A checkpoint
-will fail if the primary copy could not be created, even if the secondary copy was successfully created.
-
-- Only the primary copy is acknowledged and managed by the job manager, secondary copies are owned by task managers and their life cycles can be
-independent from their primary copies. For example, it is possible to retain a history of the 3 latest checkpoints as primary copies and only keep
-the task-local state of the latest checkpoint.
-
-- For recovery, Flink will always *attempt to restore from task-local state first*, if a matching secondary copy is available. If any problem occurs during
-the recovery from the secondary copy, Flink will *transparently retry to recover the task from the primary copy*. Recovery only fails, if primary
-and the (optional) secondary copy failed. In this case, depending on the configuration Flink could still fall back to an older checkpoint.
-
-- It is possible that the task-local copy contains only parts of the full task state (e.g. exception while writing one local file). In this case,
-Flink will first try to recover local parts locally, non-local state is restored from the primary copy. Primary state must always be complete and is
-a *superset of the task-local state*.
+- 对于 checkpointing ,*主副本必须成功*,并且生成*次要本地副本的失败不会使* checkpoint 失败。 如果无法创建主副本,即使已成功创建次要副本,checkpoint 也会失败。
 
-- Task-local state can have a different format than the primary state, they are not required to be byte identical. For example, it could be even possible
-that the task-local state is an in-memory consisting of heap objects, and not stored in any files.
+- 只有主副本由 jobmanager 确认和管理,次要副本属于 taskmanager ,并且它们的生命周期可以独立于它们的主副本。 例如,可以保留 3 个最新 checkpoints 的历史记录作为主副本,并且只保留最新 checkpoint 的 task 本地状态。
 
-- If a task manager is lost, the local state from all its task is lost.
+- 对于恢复,如果匹配的次要副本可用,Flink 将始终*首先尝试从 task 本地状态恢复*。 如果在次要副本恢复过程中出现任何问题,Flink 将*透明地重试从主副本恢复 task*。 仅当主副本和(可选)次要副本失败时,恢复才会失败。 
+  在这种情况下,根据配置,Flink 仍可能回退到旧的 checkpoint 。
+- Task 本地副本可能仅包含完整 task 状态的一部分(例如,写入一个本地文件时出现异常)。 在这种情况下,Flink 会首先尝试在本地恢复本地部分,非本地状态从主副本恢复。 主状态必须始终是完整的,并且是*task 本地状态的超集*。
 
-### Configuring task-local recovery
+- Task 本地状态可以具有与主状态不同的格式,它们不需要相同字节。 例如,task 本地状态甚至可能是在堆对象组成的内存中,而不是存储在任何文件中。
 
-Task-local recovery is *deactivated by default* and can be activated through Flink's configuration with the key `state.backend.local-recovery` as specified
-in `CheckpointingOptions.LOCAL_RECOVERY`. The value for this setting can either be *true* to enable or *false* (default) to disable local recovery.
+- 如果 taskmanager 丢失,则其所有 task 的本地状态都会丢失。
+### 配置 task 本地恢复
 
-Note that [unaligned checkpoints]({{< ref "docs/ops/state/checkpoints" >}}#unaligned-checkpoints) currently do not support task-local recovery.
+Task 本地恢复*默认禁用*,可以通过 Flink 的 CheckpointingOptions.LOCAL_RECOVERY 配置中指定的键 state.backend.local-recovery 来启用。 此设置的值可以是 *true* 以启用或 *false*(默认)以禁用本地恢复。
 
-### Details on task-local recovery for different state backends
+请注意,[unaligned checkpoints]({{< ref "docs/ops/state/checkpoints" >}}#unaligned-checkpoints) 目前不支持 task 本地恢复。
 
-***Limitation**: Currently, task-local recovery only covers keyed state backends. Keyed state is typically by far the largest part of the state. In the near future, we will
-also cover operator state and timers.*
+### 不同 state backends 的 task 本地恢复的详细信息
 
-The following state backends can support task-local recovery.
+***限制**:目前,task 本地恢复仅涵盖 keyed state backends。 Keyed state 通常是该状态的最大部分。 在不久的将来,我们还将介绍算子状态和计时器(timers)。*

Review Comment:
   Hmm, this part is rendered in italics here; viewed that way, there is indeed no extra-asterisk problem.


