Posted to commits@flink.apache.org by ch...@apache.org on 2017/06/25 06:44:46 UTC

[03/21] flink git commit: [FLINK-6784][docs] update externalized checkpoints documentation

[FLINK-6784][docs] update externalized checkpoints documentation

This closes #4033.


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/e1269ed8
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/e1269ed8
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/e1269ed8

Branch: refs/heads/master
Commit: e1269ed8fa2d66978e2816697d6e3fe3dfabf7f6
Parents: d713ed1
Author: Nico Kruber <ni...@data-artisans.com>
Authored: Wed May 31 16:22:22 2017 +0200
Committer: zentol <ch...@apache.org>
Committed: Fri Jun 23 14:14:29 2017 +0200

----------------------------------------------------------------------
 docs/dev/stream/checkpointing.md | 21 ++++++++++++++
 docs/setup/checkpoints.md        | 54 +++++++++++++++++++++++++++++++----
 2 files changed, 70 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/e1269ed8/docs/dev/stream/checkpointing.md
----------------------------------------------------------------------
diff --git a/docs/dev/stream/checkpointing.md b/docs/dev/stream/checkpointing.md
index 3a0a1ae..4fe06c1 100644
--- a/docs/dev/stream/checkpointing.md
+++ b/docs/dev/stream/checkpointing.md
@@ -72,6 +72,8 @@ Other parameters for checkpointing include:
 
     This option cannot be used when a minimum time between checkpoints is defined.
 
+  - *externalized checkpoints*: You can configure periodic checkpoints to be persisted externally. Externalized checkpoints write their meta data out to persistent storage and are *not* automatically cleaned up when the job fails. This way, you will have a checkpoint around to resume from if your job fails. There are more details in the [deployment notes on externalized checkpoints](../../setup/checkpoints.html#externalized-checkpoints).
+
 <div class="codetabs" markdown="1">
 <div data-lang="java" markdown="1">
 {% highlight java %}
@@ -93,6 +95,9 @@ env.getCheckpointConfig().setCheckpointTimeout(60000);
 
 // allow only one checkpoint to be in progress at the same time
 env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
+
+// enable externalized checkpoints which are retained after job cancellation
+env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
 {% endhighlight %}
 </div>
 <div data-lang="scala" markdown="1">
@@ -119,6 +124,22 @@ env.getCheckpointConfig.setMaxConcurrentCheckpoints(1)
 </div>
 </div>
 
+### Related Config Options
+
+Further parameters and defaults may be set via `conf/flink-conf.yaml` (see [configuration](config.html) for a full guide); an example configuration snippet follows the list below:
+
+- `state.backend`: The backend that will be used to store operator state checkpoints if checkpointing is enabled. Supported backends:
+   -  `jobmanager`: In-memory state, backed up to the JobManager's/ZooKeeper's memory. Should only be used for minimal state (e.g. Kafka offsets) or for testing and local debugging.
+   -  `filesystem`: State is held in memory on the TaskManagers, and state snapshots are stored in a file system. All file systems supported by Flink can be used, for example HDFS, S3, ...
+
+- `state.backend.fs.checkpointdir`: Directory for storing checkpoints in a Flink-supported file system. Note: the state backend must be accessible from the JobManager; use `file://` only for local setups.
+
+- `state.backend.rocksdb.checkpointdir`: The local directory for storing RocksDB files, or a list of directories separated by the system's directory delimiter (for example ':' (colon) on Linux/Unix). (DEFAULT value is `taskmanager.tmp.dirs`)
+
+- `state.checkpoints.dir`: The target directory for meta data of [externalized checkpoints](../../setup/checkpoints.html#externalized-checkpoints).
+
+- `state.checkpoints.num-retained`: The number of completed checkpoint instances to retain. Having more than one allows recovery to fall back to an earlier checkpoint if the latest checkpoint is corrupt. (Default: 1)
+
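+For example, a file-system-based setup with externalized checkpoint meta data and three retained checkpoints could look as follows in `conf/flink-conf.yaml` (the paths below are placeholders, not defaults; adjust them to your environment):
+
+{% highlight yaml %}
+state.backend: filesystem
+state.backend.fs.checkpointdir: hdfs:///flink/checkpoints
+state.checkpoints.dir: hdfs:///flink/checkpoints-metadata
+state.checkpoints.num-retained: 3
+{% endhighlight %}
+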
 {% top %}
 
 

http://git-wip-us.apache.org/repos/asf/flink/blob/e1269ed8/docs/setup/checkpoints.md
----------------------------------------------------------------------
diff --git a/docs/setup/checkpoints.md b/docs/setup/checkpoints.md
index 65d14b7..0070cb1 100644
--- a/docs/setup/checkpoints.md
+++ b/docs/setup/checkpoints.md
@@ -28,12 +28,22 @@ under the License.
 
 ## Overview
 
-TBD
+Checkpoints make state in Flink fault tolerant by allowing state and the
+corresponding stream positions to be recovered, thereby giving the application
+the same semantics as a failure-free execution.
 
+See [Checkpointing](../dev/stream/checkpointing.html) for how to enable and
+configure checkpoints for your program.
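+
+As a minimal sketch (assuming a `StreamExecutionEnvironment` named `env`),
+checkpointing is enabled by specifying a checkpoint interval:
+
+```java
+// trigger a checkpoint every 60 seconds (interval given in milliseconds)
+env.enableCheckpointing(60000);
+```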
 
 ## Externalized Checkpoints
 
-You can configure periodic checkpoints to be persisted externally. Externalized checkpoints write their meta data out to persistent storage and are *not* automatically cleaned up when the job fails. This way, you will have a checkpoint around to resume from if your job fails.
+Checkpoints are by default not persisted externally and are only used to
+resume a job from failures. They are deleted when a program is cancelled.
+You can, however, configure periodic checkpoints to be persisted externally
+similarly to [savepoints](savepoints.html). These *externalized checkpoints*
+write their meta data out to persistent storage and are *not* automatically
+cleaned up when the job fails. This way, you will have a checkpoint around
+to resume from if your job fails.
 
 ```java
 CheckpointConfig config = env.getCheckpointConfig();
@@ -46,12 +56,46 @@ The `ExternalizedCheckpointCleanup` mode configures what happens with externaliz
 
 - **`ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION`**: Delete the externalized checkpoint when the job is cancelled. The checkpoint state will only be available if the job fails.
 
-The **target directory** for the checkpoint is determined from the default checkpoint directory configuration. This is configured via the configuration key `state.checkpoints.dir`, which should point to the desired target directory:
+### Directory Structure
+
+Similarly to [savepoints](savepoints.html), an externalized checkpoint consists
+of a meta data file and, depending on the state back-end, some additional data
+files. The **target directory** for the externalized checkpoint's meta data is
+determined from the configuration key `state.checkpoints.dir` which, currently,
+can only be set via the configuration files.
 
 ```
 state.checkpoints.dir: hdfs:///checkpoints/
 ```
 
-This directory will then contain the checkpoint meta data required to restore the checkpoint. The actual checkpoint files will still be available in their configured directory. You currently can only set this via the configuration files.
+This directory will then contain the checkpoint meta data required to restore
+the checkpoint. For the `MemoryStateBackend`, this meta data file will be
+self-contained and no further files are needed.
+
+`FsStateBackend` and `RocksDBStateBackend` write separate data files
+and only write the paths to these files into the meta data file. These data
+files are stored at the path given to the state back-end during construction.
+
+```java
+env.setStateBackend(new RocksDBStateBackend("hdfs:///checkpoints-data/"));
+```
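+
+Analogously, a sketch of an `FsStateBackend` pointed at a checkpoint data directory (the path is a placeholder) would be:
+
+```java
+// data files go to the given directory; the meta data file goes to state.checkpoints.dir
+env.setStateBackend(new FsStateBackend("hdfs:///checkpoints-data/"));
+```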
+
+### Differences from Savepoints
+
+Externalized checkpoints have a few differences from [savepoints](savepoints.html). They
+
+- use a state-backend-specific (low-level) data format,
+- may be incremental,
+- do not support Flink-specific features like rescaling.
+
+### Resuming from an Externalized Checkpoint
 
-Follow the [savepoint guide]({{ site.baseurl }}/setup/cli.html#savepoints) when you want to resume from a specific checkpoint.
+A job may be resumed from an externalized checkpoint just as from a savepoint
+by using the checkpoint's meta data file instead (see the
+[savepoint restore guide](cli.html#restore-a-savepoint)). Note that if the
+meta data file is not self-contained, the JobManager needs to have access to
+the data files it refers to (see [Directory Structure](#directory-structure)
+above).
+
+```sh
+$ bin/flink run -s :checkpointMetaDataPath [:runArgs]
+```