You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@flink.apache.org by se...@apache.org on 2017/01/16 10:55:27 UTC

[04/10] flink git commit: [hotfix] [docs] Move section about internal snapshot implementation from 'state_backends.md' to 'stream_checkpointing.md'

[hotfix] [docs] Move section about internal snapshot implementation from 'state_backends.md' to 'stream_checkpointing.md'


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/ac193d6a
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/ac193d6a
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/ac193d6a

Branch: refs/heads/release-1.2
Commit: ac193d6a94ddb7e0fb0b86879ae25d979be11496
Parents: a562e3d
Author: Stephan Ewen <se...@apache.org>
Authored: Tue Jan 10 09:55:25 2017 +0100
Committer: Stephan Ewen <se...@apache.org>
Committed: Mon Jan 16 11:53:14 2017 +0100

----------------------------------------------------------------------
 docs/internals/state_backends.md       | 12 ------------
 docs/internals/stream_checkpointing.md | 14 +++++++++++++-
 2 files changed, 13 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/ac193d6a/docs/internals/state_backends.md
----------------------------------------------------------------------
diff --git a/docs/internals/state_backends.md b/docs/internals/state_backends.md
index f6a4cc7..11d46ed 100644
--- a/docs/internals/state_backends.md
+++ b/docs/internals/state_backends.md
@@ -69,15 +69,3 @@ Examples are "ValueState", "ListState", etc. Flink's runtime encodes the states
 *Raw State* is state that users and operators keep in their own data structures. When checkpointed, they only write a sequence of bytes into
 the checkpoint. Flink knows nothing about the state's data structures and sees only the raw bytes.
 
-
-## Checkpointing Procedure
-
-When operator snapshots are taken, there are two parts: the **synchronous** and the **asynchronous** parts.
-
-Operators and state backends provide their snapshots as a Java `FutureTask`. That task contains the state where the *synchronous* part
-is completed and the *asynchronous* part is pending. The asynchronous part is then executed by a background thread for that checkpoint.
-
-Operators that checkpoint purely synchronously return an already completed `FutureTask`.
-If an asynchronous operation needs to be performed, it is executed in the `run()` method of that `FutureTask`.
-
-The tasks are cancelable, in order to release streams and other resource consuming handles.

http://git-wip-us.apache.org/repos/asf/flink/blob/ac193d6a/docs/internals/stream_checkpointing.md
----------------------------------------------------------------------
diff --git a/docs/internals/stream_checkpointing.md b/docs/internals/stream_checkpointing.md
index 75493ca..e8b3e46 100644
--- a/docs/internals/stream_checkpointing.md
+++ b/docs/internals/stream_checkpointing.md
@@ -138,7 +138,7 @@ in *at least once* mode.
 
 Note that the above described mechanism implies that operators stop processing input records while they are storing a snapshot of their state in the *state backend*. This *synchronous* state snapshot introduces a delay every time a snapshot is taken.
 
-It is possible to let an operator continue processing while it stores its state snapshot, effectively letting the state snapshots happen *asynchronously* in the background. To do that, the operator must be able to produce a state object that should be stored in a way such that further modifications to the operator state do not affect that state object.
+It is possible to let an operator continue processing while it stores its state snapshot, effectively letting the state snapshots happen *asynchronously* in the background. To do that, the operator must be able to produce a state object that should be stored in a way such that further modifications to the operator state do not affect that state object. An example for that are *copy-on-write* style data structures, such as used for example in RocksDB.
 
 After receiving the checkpoint barriers on its inputs, the operator starts the asynchronous snapshot copying of its state. It immediately emits the barrier to its outputs and continues with the regular stream processing. Once the background copy process has completed, it acknowledges the checkpoint to the checkpoint coordinator (the JobManager). The checkpoint is now only complete after all sinks received the barriers and all stateful operators acknowledged their completed backup (which may be later than the barriers reaching the sinks).
 
@@ -152,3 +152,15 @@ entire distributed dataflow, and gives each operator the state that was snapshot
 stream from position <i>S<sub>k</sub></i>. For example in Apache Kafka, that means telling the consumer to start fetching from offset <i>S<sub>k</sub></i>.
 
 If state was snapshotted incrementally, the operators start with the state of the latest full snapshot and then apply a series of incremental snapshot updates to that state.
+
+## Operator Snapshot Implementation
+
+When operator snapshots are taken, there are two parts: the **synchronous** and the **asynchronous** parts.
+
+Operators and state backends provide their snapshots as a Java `FutureTask`. That task contains the state where the *synchronous* part
+is completed and the *asynchronous* part is pending. The asynchronous part is then executed by a background thread for that checkpoint.
+
+Operators that checkpoint purely synchronously return an already completed `FutureTask`.
+If an asynchronous operation needs to be performed, it is executed in the `run()` method of that `FutureTask`.
+
+The tasks are cancelable, in order to release streams and other resource consuming handles.