Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/01/26 08:14:41 UTC

[GitHub] [flink] infoverload commented on a change in pull request #18431: [FLINK-25024][docs] Add Changelog backend docs

infoverload commented on a change in pull request #18431:
URL: https://github.com/apache/flink/pull/18431#discussion_r792382240



##########
File path: docs/content/docs/ops/metrics.md
##########
@@ -1203,6 +1203,59 @@ Note that for failed checkpoints, metrics are updated on a best efforts basis an
 ### RocksDB
 Certain RocksDB native metrics are available but disabled by default, you can find full documentation [here]({{< ref "docs/deployment/config" >}}#rocksdb-native-metrics)
 
+### State changelog

Review comment:
       ```suggestion
   ### State Changelog
   ```

##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,129 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 
 {{< top >}}
 
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}
+
+{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
+
+### Introduction
+
+Changelog is a feature that aims to decrease checkpointing time, and therefore end-to-end latency in exactly-once mode.
+
+Most commonly, checkpoint duration is affected by:
+
+1. Barrier travel time and alignment, addressed by
+   [Unaligned checkpoints]({{< ref "docs/ops/state/checkpointing_under_backpressure#unaligned-checkpoints" >}})
+   and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
+2. Snapshot creation time (so-called synchronous phase), addressed by Asynchronous snapshots
+3. Snapshot upload time (asynchronous phase)
+
+The last one (upload time) can be decreased by [Incremental checkpoints]({{< ref "#incremental-checkpoints" >}}).
+However, most Incremental State Backends perform some form of compaction periodically, which results in re-uploading the
+old state in addition to the new changes. In large deployments, the probability of at least one task uploading lots of
+data tend to be very high in every checkpoint.

Review comment:
       ```suggestion
   Upload time can be decreased by [incremental checkpoints]({{< ref "#incremental-checkpoints" >}}).
   However, most incremental state backends perform some form of compaction periodically, which results in re-uploading the
   old state in addition to the new changes. In large deployments, the probability of at least one task uploading lots of
   data tends to be very high in every checkpoint.
   ```
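
A side note for readers of this thread: the incremental checkpoints referenced above are normally switched on with a single configuration key. A minimal sketch, assuming the RocksDB state backend and the standard `state.backend.incremental` option (not part of this diff; values are illustrative):

```yaml
# Sketch only: incremental snapshots for the RocksDB backend, so each
# checkpoint uploads newly created SST files rather than the full state.
state.backend: rocksdb
state.backend.incremental: true
```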

##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,129 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 
 {{< top >}}
 
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}
+
+{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
+
+### Introduction
+
+Changelog is a feature that aims to decrease checkpointing time, and therefore end-to-end latency in exactly-once mode.
+
+Most commonly, checkpoint duration is affected by:
+
+1. Barrier travel time and alignment, addressed by
+   [Unaligned checkpoints]({{< ref "docs/ops/state/checkpointing_under_backpressure#unaligned-checkpoints" >}})
+   and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
+2. Snapshot creation time (so-called synchronous phase), addressed by Asynchronous snapshots
+3. Snapshot upload time (asynchronous phase)
+
+The last one (upload time) can be decreased by [Incremental checkpoints]({{< ref "#incremental-checkpoints" >}}).
+However, most Incremental State Backends perform some form of compaction periodically, which results in re-uploading the
+old state in addition to the new changes. In large deployments, the probability of at least one task uploading lots of
+data tend to be very high in every checkpoint.
+
+With Changelog enabled, Flink uploads state changes continuously, forming a changelog. On checkpoint, only the relevant
+part of this changelog needs to be uploaded. Independently, configured state backend is snapshotted in the
+background periodically. Upon successful upload, changelog is truncated.
+
+As a result, asynchronous phase duration is reduced, as well as synchronous phase - because no data needs to be flushed
+to disk. In particular, long-tail latency is improved.
+
+On the flip side, resource usage is higher:
+
+- more files are created on DFS
+- more IO bandwidth is used to upload state changes
+- more CPU used to serialize state changes
+- more memory used by Task Managers to buffer state changes
+- todo: more details after testing, maybe link to blogpost
+
+Recovery time is another thing to consider. Depending on the `state.backend.changelog.periodic-materialize.interval`,
+changelog can become lengthy and replaying it may take more time. However, recovery time combined with checkpoint
+duration will likely be still lower than in non-changelog setup, providing lower end-to-end latency even in failover
+case.
+
+For more details, see [FLIP-158](https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints).
+
+### Installation
+
+Changelog jars are included into the standard Flink distribution.
+
+Please make sure to [add]({{< ref "docs/deployment/filesystems/overview" >}}) the necessary filesystem plugins.
+
+### Configuration
+
+An example configuration in yaml:
+```yaml
+state.backend.changelog.enabled: true
+state.backend.changelog.storage: filesystem # currently, only filesystem and memory (for tests) are supported
+dstl.dfs.base-path: s3://<bucket-name> # similar to state.checkpoints.dir
+```
+
+Please keep the following defaults (see [limitations](#limitations)):
+```yaml
+execution.checkpointing.max-concurrent-checkpoints: 1
+state.backend.local-recovery: false
+```
+
+Please refer to [configuration reference]({{< ref "docs/deployment/config#state-changelog-options" >}}) for other options.
+
+Changelog can also be enabled or disabled per-job programmatically:

Review comment:
       ```suggestion
   Changelog can also be enabled or disabled per job programmatically:
   ```
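
For context, the programmatic toggle this sentence introduces is a one-liner on the execution environment; the full Java/Scala/Python tabs appear further down in this diff. A minimal, self-contained Java sketch:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EnableChangelogExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Pass true to enable (or false to disable) the changelog state backend for this job only;
        // leaving it unset falls back to the cluster-wide configuration.
        env.enableChangelogStateBackend(true);
    }
}
```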

##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,129 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 
 {{< top >}}
 
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}
+
+{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
+
+### Introduction
+
+Changelog is a feature that aims to decrease checkpointing time, and therefore end-to-end latency in exactly-once mode.
+
+Most commonly, checkpoint duration is affected by:
+
+1. Barrier travel time and alignment, addressed by
+   [Unaligned checkpoints]({{< ref "docs/ops/state/checkpointing_under_backpressure#unaligned-checkpoints" >}})
+   and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
+2. Snapshot creation time (so-called synchronous phase), addressed by Asynchronous snapshots
+3. Snapshot upload time (asynchronous phase)
+
+The last one (upload time) can be decreased by [Incremental checkpoints]({{< ref "#incremental-checkpoints" >}}).
+However, most Incremental State Backends perform some form of compaction periodically, which results in re-uploading the
+old state in addition to the new changes. In large deployments, the probability of at least one task uploading lots of
+data tend to be very high in every checkpoint.
+
+With Changelog enabled, Flink uploads state changes continuously, forming a changelog. On checkpoint, only the relevant
+part of this changelog needs to be uploaded. Independently, configured state backend is snapshotted in the
+background periodically. Upon successful upload, changelog is truncated.
+
+As a result, asynchronous phase duration is reduced, as well as synchronous phase - because no data needs to be flushed
+to disk. In particular, long-tail latency is improved.
+
+On the flip side, resource usage is higher:
+
+- more files are created on DFS
+- more IO bandwidth is used to upload state changes
+- more CPU used to serialize state changes
+- more memory used by Task Managers to buffer state changes
+- todo: more details after testing, maybe link to blogpost
+
+Recovery time is another thing to consider. Depending on the `state.backend.changelog.periodic-materialize.interval`,
+changelog can become lengthy and replaying it may take more time. However, recovery time combined with checkpoint
+duration will likely be still lower than in non-changelog setup, providing lower end-to-end latency even in failover
+case.
+
+For more details, see [FLIP-158](https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints).
+
+### Installation
+
+Changelog jars are included into the standard Flink distribution.
+
+Please make sure to [add]({{< ref "docs/deployment/filesystems/overview" >}}) the necessary filesystem plugins.

Review comment:
       ```suggestion
    Changelog JARs are included in the standard Flink distribution.
   
   Make sure to [add]({{< ref "docs/deployment/filesystems/overview" >}}) the necessary filesystem plugins.
   ```

##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,129 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 
 {{< top >}}
 
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}
+
+{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
+
+### Introduction
+
+Changelog is a feature that aims to decrease checkpointing time, and therefore end-to-end latency in exactly-once mode.
+
+Most commonly, checkpoint duration is affected by:
+
+1. Barrier travel time and alignment, addressed by
+   [Unaligned checkpoints]({{< ref "docs/ops/state/checkpointing_under_backpressure#unaligned-checkpoints" >}})
+   and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
+2. Snapshot creation time (so-called synchronous phase), addressed by Asynchronous snapshots

Review comment:
       ```suggestion
   2. Snapshot creation time (so-called synchronous phase), addressed by asynchronous snapshots
   ```

##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,129 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 
 {{< top >}}
 
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}
+
+{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
+
+### Introduction
+
+Changelog is a feature that aims to decrease checkpointing time, and therefore end-to-end latency in exactly-once mode.
+
+Most commonly, checkpoint duration is affected by:
+
+1. Barrier travel time and alignment, addressed by
+   [Unaligned checkpoints]({{< ref "docs/ops/state/checkpointing_under_backpressure#unaligned-checkpoints" >}})
+   and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
+2. Snapshot creation time (so-called synchronous phase), addressed by Asynchronous snapshots
+3. Snapshot upload time (asynchronous phase)
+
+The last one (upload time) can be decreased by [Incremental checkpoints]({{< ref "#incremental-checkpoints" >}}).
+However, most Incremental State Backends perform some form of compaction periodically, which results in re-uploading the
+old state in addition to the new changes. In large deployments, the probability of at least one task uploading lots of
+data tend to be very high in every checkpoint.
+
+With Changelog enabled, Flink uploads state changes continuously, forming a changelog. On checkpoint, only the relevant
+part of this changelog needs to be uploaded. Independently, configured state backend is snapshotted in the
+background periodically. Upon successful upload, changelog is truncated.
+
+As a result, asynchronous phase duration is reduced, as well as synchronous phase - because no data needs to be flushed
+to disk. In particular, long-tail latency is improved.
+
+On the flip side, resource usage is higher:
+
+- more files are created on DFS
+- more IO bandwidth is used to upload state changes
+- more CPU used to serialize state changes
+- more memory used by Task Managers to buffer state changes
+- todo: more details after testing, maybe link to blogpost
+
+Recovery time is another thing to consider. Depending on the `state.backend.changelog.periodic-materialize.interval`,
+changelog can become lengthy and replaying it may take more time. However, recovery time combined with checkpoint
+duration will likely be still lower than in non-changelog setup, providing lower end-to-end latency even in failover
+case.

Review comment:
       ```suggestion
   Recovery time is another thing to consider. Depending on the `state.backend.changelog.periodic-materialize.interval` setting,
   the changelog can become lengthy and replaying it may take more time. However, recovery time combined with checkpoint
    duration will likely still be lower than in non-changelog setups, providing lower end-to-end latency even in the failover
   case.
   ```
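
Related note: the interval mentioned in that paragraph is an ordinary duration option, so the recovery-time/overhead trade-off can be tuned directly in the configuration. A sketch with a purely illustrative value (the `10min` figure is an example, not a recommendation):

```yaml
# Shorter intervals keep the changelog shorter (faster replay on recovery)
# at the cost of more frequent background materialization and uploads.
state.backend.changelog.periodic-materialize.interval: 10min
```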

##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,129 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 
 {{< top >}}
 
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}
+
+{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
+
+### Introduction
+
+Changelog is a feature that aims to decrease checkpointing time, and therefore end-to-end latency in exactly-once mode.
+
+Most commonly, checkpoint duration is affected by:
+
+1. Barrier travel time and alignment, addressed by
+   [Unaligned checkpoints]({{< ref "docs/ops/state/checkpointing_under_backpressure#unaligned-checkpoints" >}})
+   and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
+2. Snapshot creation time (so-called synchronous phase), addressed by Asynchronous snapshots
+3. Snapshot upload time (asynchronous phase)
+
+The last one (upload time) can be decreased by [Incremental checkpoints]({{< ref "#incremental-checkpoints" >}}).
+However, most Incremental State Backends perform some form of compaction periodically, which results in re-uploading the
+old state in addition to the new changes. In large deployments, the probability of at least one task uploading lots of
+data tend to be very high in every checkpoint.
+
+With Changelog enabled, Flink uploads state changes continuously, forming a changelog. On checkpoint, only the relevant
+part of this changelog needs to be uploaded. Independently, configured state backend is snapshotted in the
+background periodically. Upon successful upload, changelog is truncated.
+
+As a result, asynchronous phase duration is reduced, as well as synchronous phase - because no data needs to be flushed
+to disk. In particular, long-tail latency is improved.
+
+On the flip side, resource usage is higher:
+
+- more files are created on DFS
+- more IO bandwidth is used to upload state changes
+- more CPU used to serialize state changes
+- more memory used by Task Managers to buffer state changes
+- todo: more details after testing, maybe link to blogpost
+
+Recovery time is another thing to consider. Depending on the `state.backend.changelog.periodic-materialize.interval`,
+changelog can become lengthy and replaying it may take more time. However, recovery time combined with checkpoint
+duration will likely be still lower than in non-changelog setup, providing lower end-to-end latency even in failover
+case.
+
+For more details, see [FLIP-158](https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints).
+
+### Installation
+
+Changelog jars are included into the standard Flink distribution.
+
+Please make sure to [add]({{< ref "docs/deployment/filesystems/overview" >}}) the necessary filesystem plugins.
+
+### Configuration
+
+An example configuration in yaml:
+```yaml
+state.backend.changelog.enabled: true
+state.backend.changelog.storage: filesystem # currently, only filesystem and memory (for tests) are supported
+dstl.dfs.base-path: s3://<bucket-name> # similar to state.checkpoints.dir
+```
+
+Please keep the following defaults (see [limitations](#limitations)):
+```yaml
+execution.checkpointing.max-concurrent-checkpoints: 1
+state.backend.local-recovery: false
+```
+
+Please refer to [configuration reference]({{< ref "docs/deployment/config#state-changelog-options" >}}) for other options.
+
+Changelog can also be enabled or disabled per-job programmatically:
+{{< tabs  >}}
+{{< tab "Java" >}}
+```java
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+env.enableChangelogStateBackend(true);
+```
+{{< /tab >}}
+{{< tab "Scala" >}}
+```scala
+val env = StreamExecutionEnvironment.getExecutionEnvironment()
+env.enableChangelogStateBackend(true)
+```
+{{< /tab >}}
+{{< tab "Python" >}}
+```python
+env = StreamExecutionEnvironment.get_execution_environment()
+env.enable_changelog_statebackend(true)
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+### Monitoring
+
+Available metrics are listed [here]({{< ref "docs/ops/metrics#changelog" >}}).
+
+In the UI, if a task is back-pressured by writing state changes, it will be shown as busy (red).

Review comment:
       ```suggestion
   If a task is backpressured by writing state changes, it will be shown as busy (red) in the UI.
   ```

##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,129 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 
 {{< top >}}
 
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}

Review comment:
       ```suggestion
   {{< hint warning >}} This feature is in experimental status. {{< /hint >}}
   ```

##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,129 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 
 {{< top >}}
 
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}
+
+{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
+
+### Introduction
+
+Changelog is a feature that aims to decrease checkpointing time, and therefore end-to-end latency in exactly-once mode.
+
+Most commonly, checkpoint duration is affected by:
+
+1. Barrier travel time and alignment, addressed by
+   [Unaligned checkpoints]({{< ref "docs/ops/state/checkpointing_under_backpressure#unaligned-checkpoints" >}})
+   and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
+2. Snapshot creation time (so-called synchronous phase), addressed by Asynchronous snapshots
+3. Snapshot upload time (asynchronous phase)
+
+The last one (upload time) can be decreased by [Incremental checkpoints]({{< ref "#incremental-checkpoints" >}}).
+However, most Incremental State Backends perform some form of compaction periodically, which results in re-uploading the
+old state in addition to the new changes. In large deployments, the probability of at least one task uploading lots of
+data tend to be very high in every checkpoint.
+
+With Changelog enabled, Flink uploads state changes continuously, forming a changelog. On checkpoint, only the relevant
+part of this changelog needs to be uploaded. Independently, configured state backend is snapshotted in the
+background periodically. Upon successful upload, changelog is truncated.
+
+As a result, asynchronous phase duration is reduced, as well as synchronous phase - because no data needs to be flushed
+to disk. In particular, long-tail latency is improved.
+
+On the flip side, resource usage is higher:
+
+- more files are created on DFS
+- more IO bandwidth is used to upload state changes
+- more CPU used to serialize state changes
+- more memory used by Task Managers to buffer state changes
+- todo: more details after testing, maybe link to blogpost
+
+Recovery time is another thing to consider. Depending on the `state.backend.changelog.periodic-materialize.interval`,
+changelog can become lengthy and replaying it may take more time. However, recovery time combined with checkpoint
+duration will likely be still lower than in non-changelog setup, providing lower end-to-end latency even in failover
+case.
+
+For more details, see [FLIP-158](https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints).
+
+### Installation
+
+Changelog jars are included into the standard Flink distribution.
+
+Please make sure to [add]({{< ref "docs/deployment/filesystems/overview" >}}) the necessary filesystem plugins.
+
+### Configuration
+
+An example configuration in yaml:

Review comment:
       ```suggestion
   Here is an example configuration in YAML:
   ```

##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,129 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 
 {{< top >}}
 
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}
+
+{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
+
+### Introduction
+
+Changelog is a feature that aims to decrease checkpointing time, and therefore end-to-end latency in exactly-once mode.
+
+Most commonly, checkpoint duration is affected by:
+
+1. Barrier travel time and alignment, addressed by
+   [Unaligned checkpoints]({{< ref "docs/ops/state/checkpointing_under_backpressure#unaligned-checkpoints" >}})
+   and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
+2. Snapshot creation time (so-called synchronous phase), addressed by Asynchronous snapshots
+3. Snapshot upload time (asynchronous phase)
+
+The last one (upload time) can be decreased by [Incremental checkpoints]({{< ref "#incremental-checkpoints" >}}).
+However, most Incremental State Backends perform some form of compaction periodically, which results in re-uploading the
+old state in addition to the new changes. In large deployments, the probability of at least one task uploading lots of
+data tend to be very high in every checkpoint.
+
+With Changelog enabled, Flink uploads state changes continuously, forming a changelog. On checkpoint, only the relevant
+part of this changelog needs to be uploaded. Independently, configured state backend is snapshotted in the
+background periodically. Upon successful upload, changelog is truncated.
+
+As a result, asynchronous phase duration is reduced, as well as synchronous phase - because no data needs to be flushed
+to disk. In particular, long-tail latency is improved.
+
+On the flip side, resource usage is higher:
+
+- more files are created on DFS
+- more IO bandwidth is used to upload state changes
+- more CPU used to serialize state changes
+- more memory used by Task Managers to buffer state changes
+- todo: more details after testing, maybe link to blogpost
+
+Recovery time is another thing to consider. Depending on the `state.backend.changelog.periodic-materialize.interval`,
+changelog can become lengthy and replaying it may take more time. However, recovery time combined with checkpoint
+duration will likely be still lower than in non-changelog setup, providing lower end-to-end latency even in failover
+case.
+
+For more details, see [FLIP-158](https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints).
+
+### Installation
+
+Changelog jars are included into the standard Flink distribution.
+
+Please make sure to [add]({{< ref "docs/deployment/filesystems/overview" >}}) the necessary filesystem plugins.
+
+### Configuration
+
+An example configuration in yaml:
+```yaml
+state.backend.changelog.enabled: true
+state.backend.changelog.storage: filesystem # currently, only filesystem and memory (for tests) are supported
+dstl.dfs.base-path: s3://<bucket-name> # similar to state.checkpoints.dir
+```
+
+Please keep the following defaults (see [limitations](#limitations)):
+```yaml
+execution.checkpointing.max-concurrent-checkpoints: 1
+state.backend.local-recovery: false
+```
+
+Please refer to [configuration reference]({{< ref "docs/deployment/config#state-changelog-options" >}}) for other options.

Review comment:
       ```suggestion
   Please refer to the [configuration section]({{< ref "docs/deployment/config#state-changelog-options" >}}) for other options.
   ```

##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,129 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 
 {{< top >}}
 
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}
+
+{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
+
+### Introduction
+
+Changelog is a feature that aims to decrease checkpointing time, and therefore end-to-end latency in exactly-once mode.

Review comment:
       ```suggestion
   Changelog is a feature that aims to decrease checkpointing time and, therefore, end-to-end latency in exactly-once mode.
   ```

##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,129 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 
 {{< top >}}
 
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}
+
+{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
+
+### Introduction
+
+Changelog is a feature that aims to decrease checkpointing time, and therefore end-to-end latency in exactly-once mode.
+
+Most commonly, checkpoint duration is affected by:
+
+1. Barrier travel time and alignment, addressed by
+   [Unaligned checkpoints]({{< ref "docs/ops/state/checkpointing_under_backpressure#unaligned-checkpoints" >}})
+   and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
+2. Snapshot creation time (so-called synchronous phase), addressed by Asynchronous snapshots
+3. Snapshot upload time (asynchronous phase)
+
+The last one (upload time) can be decreased by [Incremental checkpoints]({{< ref "#incremental-checkpoints" >}}).
+However, most Incremental State Backends perform some form of compaction periodically, which results in re-uploading the
+old state in addition to the new changes. In large deployments, the probability of at least one task uploading lots of
+data tend to be very high in every checkpoint.
+
+With Changelog enabled, Flink uploads state changes continuously, forming a changelog. On checkpoint, only the relevant
+part of this changelog needs to be uploaded. Independently, configured state backend is snapshotted in the
+background periodically. Upon successful upload, changelog is truncated.

Review comment:
       ```suggestion
   With Changelog enabled, Flink uploads state changes continuously and forms a changelog. On checkpoint, only the relevant
   part of this changelog needs to be uploaded. The configured state backend is snapshotted in the
   background periodically. Upon successful upload, the changelog is truncated.
   ```

##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,129 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
 
 {{< top >}}
 
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}
+
+{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
+
+### Introduction
+
+Changelog is a feature that aims to decrease checkpointing time, and therefore end-to-end latency in exactly-once mode.
+
+Most commonly, checkpoint duration is affected by:
+
+1. Barrier travel time and alignment, addressed by
+   [Unaligned checkpoints]({{< ref "docs/ops/state/checkpointing_under_backpressure#unaligned-checkpoints" >}})
+   and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
+2. Snapshot creation time (so-called synchronous phase), addressed by Asynchronous snapshots
+3. Snapshot upload time (asynchronous phase)
+
+The last one (upload time) can be decreased by [Incremental checkpoints]({{< ref "#incremental-checkpoints" >}}).
+However, most Incremental State Backends perform some form of compaction periodically, which results in re-uploading the
+old state in addition to the new changes. In large deployments, the probability of at least one task uploading lots of
+data tend to be very high in every checkpoint.
+
+With Changelog enabled, Flink uploads state changes continuously, forming a changelog. On checkpoint, only the relevant
+part of this changelog needs to be uploaded. Independently, configured state backend is snapshotted in the
+background periodically. Upon successful upload, changelog is truncated.
+
+As a result, asynchronous phase duration is reduced, as well as synchronous phase - because no data needs to be flushed
+to disk. In particular, long-tail latency is improved.
+
+On the flip side, resource usage is higher:

Review comment:
       ```suggestion
   However, resource usage is higher:
   ```



