Posted to commits@flink.apache.org by pn...@apache.org on 2022/01/19 07:31:49 UTC
[flink] branch release-1.14 updated: [FLINK-25650][docs] Added "Interplay with long-running record processing" limit in unaligned checkpoint documentation

This is an automated email from the ASF dual-hosted git repository.

pnowojski pushed a commit to branch release-1.14
in repository https://gitbox.apache.org/repos/asf/flink.git


The following commit(s) were added to refs/heads/release-1.14 by this push:
     new 43b073e  [FLINK-25650][docs] Added "Interplay with long-running record processing" limit in unaligned checkpoint documentation
43b073e is described below

commit 43b073e8571a0e1100eac30a5021d1f98bc7d5e3
Author: Anton Kalashnikov <ka...@yandex.ru>
AuthorDate: Thu Jan 13 15:56:46 2022 +0100

    [FLINK-25650][docs] Added "Interplay with long-running record processing" limit in unaligned checkpoint documentation
---
 .../docs/ops/state/checkpointing_under_backpressure.md    | 15 +++++++++++++++
 .../docs/ops/state/checkpointing_under_backpressure.md    | 15 +++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/docs/content.zh/docs/ops/state/checkpointing_under_backpressure.md b/docs/content.zh/docs/ops/state/checkpointing_under_backpressure.md
index 14277f4..8f25567 100644
--- a/docs/content.zh/docs/ops/state/checkpointing_under_backpressure.md
+++ b/docs/content.zh/docs/ops/state/checkpointing_under_backpressure.md
@@ -146,6 +146,21 @@ aligned checkpoints. If your operator depends on the latest watermark being alwa
 workaround is to store the watermark in the operator state. In that case, watermarks should be
 stored per key group in a union state to support rescaling.
 
+#### Interplay with long-running record processing
+
+Although unaligned checkpoint barriers are able to overtake all other records in the queue,
+the handling of a barrier can still be delayed if the current record takes a long time to process.
+This situation can occur when many timers fire at once, for example in windowed operations.
+A second problematic scenario can occur when the system is blocked waiting for more than one
+network buffer to become available while processing a single input record. Flink cannot interrupt the
+processing of a single input record, so unaligned checkpoints have to wait until the current record is
+fully processed. This can cause problems in two cases: when serializing a large
+record that does not fit into a single network buffer, or in a flatMap operation that produces many output
+records for one input record. In such cases, back pressure can block unaligned checkpoints until all
+the network buffers required to process the single input record are available.
+The same delay can occur in any other situation where processing a single record takes a long time.
+As a result, the checkpoint duration can be longer than expected, or it can vary.
+
 #### Certain data distribution patterns are not checkpointed
 
 There are types of connections with properties that are impossible to keep with channel data stored
diff --git a/docs/content/docs/ops/state/checkpointing_under_backpressure.md b/docs/content/docs/ops/state/checkpointing_under_backpressure.md
index 14277f4..15d2f9a 100644
--- a/docs/content/docs/ops/state/checkpointing_under_backpressure.md
+++ b/docs/content/docs/ops/state/checkpointing_under_backpressure.md
@@ -146,6 +146,21 @@ aligned checkpoints. If your operator depends on the latest watermark being alwa
 workaround is to store the watermark in the operator state. In that case, watermarks should be
 stored per key group in a union state to support rescaling.
 
+#### Interplay with long-running record processing
+
+Although unaligned checkpoint barriers are able to overtake all other records in the queue,
+the handling of a barrier can still be delayed if the current record takes a long time to process.
+This situation can occur when many timers fire at once, for example in windowed operations.
+A second problematic scenario can occur when the system is blocked waiting for more than one
+network buffer to become available while processing a single input record. Flink cannot interrupt the
+processing of a single input record, so unaligned checkpoints have to wait until the current record is
+fully processed. This can cause problems in two cases: when serializing a large
+record that does not fit into a single network buffer, or in a flatMap operation that produces many output
+records for one input record. In such cases, back pressure can block unaligned checkpoints until all
+the network buffers required to process the single input record are available.
+The same delay can occur in any other situation where processing a single record takes a long time.
+As a result, the checkpoint duration can be longer than expected, or it can vary.
+
 #### Certain data distribution patterns are not checkpointed
 
 There are types of connections with properties that are impossible to keep with channel data stored
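
Illustration (not part of the commit): the limitation documented above can be sketched with a toy model. This is not Flink code and uses no Flink API. The class `BarrierDelayModel`, its methods, and the `ticksPerBuffer` parameter are hypothetical names chosen for this sketch. It assumes a single task thread that can react to a barrier only between input records, and that each output record has to wait a fixed number of ticks for one free network buffer under back pressure.

```java
/**
 * Toy model, NOT Flink code: one task thread that can handle a checkpoint
 * barrier only between input records, never in the middle of one.
 */
final class BarrierDelayModel {

    /**
     * Returns the tick at which the barrier is handled.
     *
     * @param outputsPerInput number of output records produced for each input record
     * @param ticksPerBuffer  assumed wait for one free network buffer (one per output record)
     * @param barrierArrival  tick at which the barrier reaches the task
     */
    public static long handleTime(int[] outputsPerInput, long ticksPerBuffer, long barrierArrival) {
        long now = 0;
        for (int outputs : outputsPerInput) {
            // The barrier is only noticed before the next record is started.
            if (now >= barrierArrival) {
                return now;
            }
            // Processing one input record cannot be interrupted: every output
            // record waits for its own buffer before the record is finished.
            now += (long) outputs * ticksPerBuffer;
        }
        // All input consumed: the barrier is handled as soon as it has arrived.
        return Math.max(now, barrierArrival);
    }

    /** Ticks between the arrival of the barrier and the moment it is handled. */
    public static long barrierDelay(int[] outputsPerInput, long ticksPerBuffer, long barrierArrival) {
        return handleTime(outputsPerInput, ticksPerBuffer, barrierArrival) - barrierArrival;
    }

    public static void main(String[] args) {
        // Map-like operator: ten inputs, one output each.
        int[] mapLike = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        // flatMap-like operator: one input that fans out to 1000 outputs.
        int[] flatMapLike = {1000};

        System.out.println("map-like delay:     " + barrierDelay(mapLike, 1, 5));
        System.out.println("flatMap-like delay: " + barrierDelay(flatMapLike, 1, 5));
    }
}
```

In this model the map-like operator handles the barrier at the next record boundary, while the flatMap-like operator keeps it waiting until all 1000 outputs have obtained a buffer. Real checkpoint timing in Flink depends on many factors that the model ignores.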