Posted to user@flink.apache.org by Nikola Hrusov <n....@gmail.com> on 2022/04/19 13:08:16 UTC

Flink checkpointing configuration

Hi,

I have a question regarding Flink checkpointing configuration.

I am working from the official docs here:
https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/config/
and I am running Flink 1.14.4.

I would like to take a checkpoint every 10 minutes, with a pause of at
least 10 minutes between checkpoints. I have therefore set the following
properties:

execution.checkpointing.interval: 10 min
execution.checkpointing.min-pause: 10 min


That works for the positive scenarios where my job runs fine. However,
when a checkpoint times out, the min-pause does not seem to be applied,
e.g.:

t - checkpoint #1 starts
t+10min - checkpoint #1 fails due to timeout (execution.checkpointing.timeout
defaults to 10min)
t+10min - checkpoint #2 starts


I would expect (and want to achieve):

t - checkpoint #1 starts
t+10min - checkpoint #1 fails due to timeout (execution.checkpointing.timeout
defaults to 10min)
t+20min (t+10min (timeout) + 10min (min-pause)) - checkpoint #2 starts



I expect that because:
- checkpoint #1 did not succeed, but it did finish at t+10min, so I
expect the min-pause to start counting from there.
- if checkpoint #1 failed with a timeout, it is very unlikely that
checkpoint #2, which starts immediately after the failed checkpoint #1,
will succeed.
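
To make the expectation above concrete, here is a minimal sketch of the
rule being assumed. It is a toy model of the desired semantics, not
Flink's actual scheduler; the function name expected_next_start is made
up for illustration.

```python
from datetime import timedelta


def expected_next_start(prev_start, prev_end, interval, min_pause):
    """Earliest start of the next checkpoint under the expected rule.

    Hypothetical helper (not a Flink API): the next checkpoint starts no
    sooner than one interval after the previous start, and no sooner than
    min_pause after the previous checkpoint ended, whether it succeeded
    or failed.
    """
    return max(prev_start + interval, prev_end + min_pause)


interval = timedelta(minutes=10)   # execution.checkpointing.interval
min_pause = timedelta(minutes=10)  # execution.checkpointing.min-pause
timeout = timedelta(minutes=10)    # execution.checkpointing.timeout

cp1_start = timedelta(0)           # t
cp1_end = cp1_start + timeout      # checkpoint #1 times out at t+10min

cp2_start = expected_next_start(cp1_start, cp1_end, interval, min_pause)
print(cp2_start)  # 0:20:00, i.e. t+20min
```

Under this rule checkpoint #2 would start at t+20min, whereas what I
observe is a start at t+10min.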


At this point I am not sure whether I am misunderstanding the docs, or
how I should configure my job.

When I set the configuration like so:

execution.checkpointing.interval: 10 min
execution.checkpointing.min-pause: 15 min


Then I get checkpoints every 15 min instead.
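
The 15 min cadence is consistent with the same toy rule when checkpoints
succeed quickly. The sketch below assumes a checkpoint duration of zero
minutes purely for illustration; simulate_starts is a hypothetical
helper, not part of Flink.

```python
def simulate_starts(interval_min, min_pause_min, duration_min, count):
    """Return the start times (in minutes) of `count` checkpoints.

    Toy model: each checkpoint takes duration_min, and the next one
    starts at the later of (previous start + interval) and
    (previous end + min-pause).
    """
    starts = []
    start = 0
    for _ in range(count):
        starts.append(start)
        end = start + duration_min
        start = max(start + interval_min, end + min_pause_min)
    return starts


# interval 10 min, min-pause 15 min: the min-pause dominates
print(simulate_starts(10, 15, 0, 4))  # [0, 15, 30, 45]

# interval 10 min, min-pause 10 min: a checkpoint every 10 min
print(simulate_starts(10, 10, 0, 4))  # [0, 10, 20, 30]
```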

Can someone help me understand the docs better and configure my job? Thanks

Regards,
Nikola