You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@flink.apache.org by ta...@apache.org on 2022/03/24 11:53:02 UTC

[flink] branch release-1.15 updated (588ac8b -> 6ea644d)

This is an automated email from the ASF dual-hosted git repository.

tangyun pushed a change to branch release-1.15
in repository https://gitbox.apache.org/repos/asf/flink.git.


    from 588ac8b  [hotfix][runtime] Adds missing configuration parameter to cleanup error log message
     new 6e5240b  [FLINK-26700][docs] Document restore mode in chinese
     new 6ea644d  [hotfix][docs] correct checkpoint directory in restore mode document

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 docs/content.zh/docs/ops/state/savepoints.md | 47 +++++++++++++++++++++++++++-
 docs/content/docs/ops/state/savepoints.md    |  4 +--
 2 files changed, 48 insertions(+), 3 deletions(-)

[flink] 01/02: [FLINK-26700][docs] Document restore mode in chinese

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tangyun pushed a commit to branch release-1.15
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 6e5240bd7811cac48e0ce66b792f76c97de1b57c
Author: wangfeifan <zo...@163.com>
AuthorDate: Wed Mar 23 00:57:47 2022 +0800

    [FLINK-26700][docs] Document restore mode in chinese
    
    Co-authored-by: Yun Tang <my...@live.com>
---
 docs/content.zh/docs/ops/state/savepoints.md | 47 +++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/docs/content.zh/docs/ops/state/savepoints.md b/docs/content.zh/docs/ops/state/savepoints.md
index 51785b6..d445060 100644
--- a/docs/content.zh/docs/ops/state/savepoints.md
+++ b/docs/content.zh/docs/ops/state/savepoints.md
@@ -157,10 +157,55 @@ $ bin/flink run -s :savepointPath [:runArgs]
 
 默认情况下,resume 操作将尝试将 Savepoint 的所有状态映射回你要还原的程序。 如果删除了运算符,则可以通过 `--allowNonRestoredState`(short:`-n`)选项跳过无法映射到新程序的状态:
 
+#### Restore 模式
+
+`Restore 模式` 决定了在 restore 之后谁拥有Savepoint 或者 [externalized checkpoint]({{< ref "docs/ops/state/checkpoints" >}}/#resuming-from-a-retained-checkpoint)的文件的所有权。在这种语境下 Savepoint 和 externalized checkpoint 的行为相似。
+这里我们将它们都称为“快照”,除非另有明确说明。
+
+如前所述,restore 模式决定了谁来接管我们从中恢复的快照文件的所有权。快照可被用户或者 Flink 自身拥有。如果快照归用户所有,Flink 不会删除其中的文件,而且 Flink 不能依赖该快照中文件的存在,因为它可能在 Flink 的控制之外被删除。
+
+每种 restore 模式都有特定的用途。尽管如此,我们仍然认为默认的 *NO_CLAIM* 模式在大多数情况下是一个很好的折中方案,因为它在提供明确的所有权归属的同时只给恢复后第一个 checkpoint 带来较小的代价。
+
+你可以通过如下方式指定 restore 模式:
 ```shell
-$ bin/flink run -s :savepointPath -n [:runArgs]
+$ bin/flink run -s :savepointPath -restoreMode :mode -n [:runArgs]
 ```
 
+**NO_CLAIM (默认的)**
+
+在 *NO_CLAIM* 模式下,Flink 不会接管快照的所有权。它会将快照的文件置于用户的控制之中,并且永远不会删除其中的任何文件。该模式下可以从同一个快照上启动多个作业。
+
+为保证 Flink 不会依赖于该快照的任何文件,它会强制第一个(成功的) checkpoint 为全量 checkpoint 而不是增量的。这仅对`state.backend: rocksdb` 有影响,因为其他 backend 总是创建全量 checkpoint。
+
+一旦第一个全量的 checkpoint 完成后,所有后续的 checkpoint 会照常创建。所以,一旦一个 checkpoint 成功制作,就可以删除原快照。在此之前不能删除原快照,因为没有任何完成的 checkpoint,Flink 会在故障时尝试从初始的快照恢复。
+
+<div style="text-align: center">
+  {{< img src="/fig/restore-mode-no_claim.svg" alt="NO_CLAIM restore mode" width="70%" >}}
+</div>
+
+**CLAIM**
+
+另一个可选的模式是 *CLAIM* 模式。该模式下 Flink 将声称拥有快照的所有权,并且本质上将其作为 checkpoint 对待:控制其生命周期并且可能会在其永远不会被用于恢复的时候删除它。因此,手动删除快照和从同一个快照上启动两个作业都是不安全的。Flink 会保持[配置数量]({{< ref "docs/dev/datastream/fault-tolerance/checkpointing" >}}/#state-checkpoints-num-retained)的 checkpoint。
+
+<div style="text-align: center">
+  {{< img src="/fig/restore-mode-claim.svg" alt="CLAIM restore mode" width="70%" >}}
+</div>
+
+{{< hint info >}}
+**注意:**
+1. Retained checkpoints 被存储在 `<checkpoint_dir>/<job_id>/chk_<x>` 这样的目录中。Flink 不会接管 `<checkpoint_dir>/<job_id>` 目录的所有权,而只会接管 `chk_<x>` 的所有权。Flink 不会删除旧作业的目录。
+
+2. [Native](#savepoint-format) 格式支持增量的 RocksDB savepoints。对于这些 savepoints,Flink 将所有 SST 存储在 savepoints 目录中。这意味着这些 savepoints 是自包含和目录可移动的。然而,在 CLAIM 模式下恢复时,后续的 checkpoints 可能会复用一些 SST 文件,这反过来会阻止在 savepoints 被清理时删除 savepoints 目录。 Flink 之后运行期间可能会删除复用的SST 文件,但不会删除 savepoints 目录。因此,如果在 CLAIM 模式下恢复,Flink 可能会留下一个空的 savepoints 目录。
+{{< /hint >}}
+
+**LEGACY**
+
+Legacy 模式是 Flink 在 1.15 之前的工作方式。该模式下 Flink 永远不会删除初始恢复的 checkpoint。同时,用户也不清楚是否可以删除它。导致该的问题原因是, Flink 会在用来恢复的 checkpoint 之上创建增量的 checkpoint,因此后续的 checkpoint 都有可能会依赖于用于恢复的那个 checkpoint。总而言之,恢复的 checkpoint 的所有权没有明确的界定。
+
+<div style="text-align: center">
+  {{< img src="/fig/restore-mode-legacy.svg" alt="LEGACY restore mode" width="70%" >}}
+</div>
+
 ### 删除 Savepoint
 
 ```shell

[flink] 02/02: [hotfix][docs] correct checkpoint directory in restore mode document

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tangyun pushed a commit to branch release-1.15
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 6ea644d463531e5309122dbcbc1dd681e24fd4ce
Author: wangfeifan <zo...@163.com>
AuthorDate: Wed Mar 23 23:43:16 2022 +0800

    [hotfix][docs] correct checkpoint directory in restore mode document
---
 docs/content.zh/docs/ops/state/savepoints.md | 2 +-
 docs/content/docs/ops/state/savepoints.md    | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/content.zh/docs/ops/state/savepoints.md b/docs/content.zh/docs/ops/state/savepoints.md
index d445060..87a2890 100644
--- a/docs/content.zh/docs/ops/state/savepoints.md
+++ b/docs/content.zh/docs/ops/state/savepoints.md
@@ -193,7 +193,7 @@ $ bin/flink run -s :savepointPath -restoreMode :mode -n [:runArgs]
 
 {{< hint info >}}
 **注意:**
-1. Retained checkpoints 被存储在 `<checkpoint_dir>/<job_id>/chk_<x>` 这样的目录中。Flink 不会接管 `<checkpoint_dir>/<job_id>` 目录的所有权,而只会接管 `chk_<x>` 的所有权。Flink 不会删除旧作业的目录。
+1. Retained checkpoints 被存储在 `<checkpoint_dir>/<job_id>/chk-<x>` 这样的目录中。Flink 不会接管 `<checkpoint_dir>/<job_id>` 目录的所有权,而只会接管 `chk-<x>` 的所有权。Flink 不会删除旧作业的目录。
 
 2. [Native](#savepoint-format) 格式支持增量的 RocksDB savepoints。对于这些 savepoints,Flink 将所有 SST 存储在 savepoints 目录中。这意味着这些 savepoints 是自包含和目录可移动的。然而,在 CLAIM 模式下恢复时,后续的 checkpoints 可能会复用一些 SST 文件,这反过来会阻止在 savepoints 被清理时删除 savepoints 目录。 Flink 之后运行期间可能会删除复用的SST 文件,但不会删除 savepoints 目录。因此,如果在 CLAIM 模式下恢复,Flink 可能会留下一个空的 savepoints 目录。
 {{< /hint >}}
diff --git a/docs/content/docs/ops/state/savepoints.md b/docs/content/docs/ops/state/savepoints.md
index 3058b83..a24c4db 100644
--- a/docs/content/docs/ops/state/savepoints.md
+++ b/docs/content/docs/ops/state/savepoints.md
@@ -256,8 +256,8 @@ of checkpoints.
 
 {{< hint info >}}
 **Attention:**
-1. Retained checkpoints are stored in a path like `<checkpoint_dir>/<job_id>/chk_<x>`. Flink does not
-take ownership of the `<checkpoint_dir>/<job_id>` directory, but only the `chk_<x>`. The directory
+1. Retained checkpoints are stored in a path like `<checkpoint_dir>/<job_id>/chk-<x>`. Flink does not
+take ownership of the `<checkpoint_dir>/<job_id>` directory, but only the `chk-<x>`. The directory
 of the old job will not be deleted by Flink
 
 2. [Native](#savepoint-format) format supports incremental RocksDB savepoints. For those savepoints Flink puts all