Posted to user@flink.apache.org by Nikola Hrusov <n....@gmail.com> on 2022/04/19 13:08:16 UTC
Flink checkpointing configuration
Hi,
I have a question regarding Flink checkpointing configuration.
I am getting my information from the official docs here:
https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/config/
and I am running Flink 1.14.4.
I would like to take a checkpoint every 10 minutes, with a pause of at
least 10 minutes between checkpoints. Thus I have set the following
properties:
execution.checkpointing.interval: 10 min
execution.checkpointing.min-pause: 10 min
And that works for the positive scenarios where my job runs fine. However,
when we have a checkpoint timeout it seems that the min-pause is not
applied, e.g.:
t - checkpoint #1 starts
t+10min - checkpoint #1 fails due to timeout (execution.checkpointing.timeout
defaults to 10min)
t+10min - checkpoint #2 starts
I would expect (and want to achieve):
t - checkpoint #1 starts
t+10min - checkpoint #1 fails due to timeout (execution.checkpointing.timeout
defaults to 10min)
t+20min (t+10min (timeout) + 10min (min-pause)) - checkpoint #2 starts
I expect that because:
- checkpoint #1 did not succeed, but it did finish at t+10min, so
I expect the min-pause to start counting from there.
- if checkpoint #1 failed with a timeout, it is very unlikely that
checkpoint #2, starting immediately after it, will succeed
At this point I am not sure whether I am misunderstanding the docs or
how I should configure my job.
When I set the configuration like so:
execution.checkpointing.interval: 10 min
execution.checkpointing.min-pause: 15 min
Then I get checkpoints every 15 min instead.
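To make the rule I have in mind concrete, here is a minimal Python sketch.
It is only a model of what I expect, not Flink's actual scheduler: the
function name next_checkpoint_start is made up, and the assumption that
the min-pause is counted from the end of the previous checkpoint
(successful or not) is mine.

```python
def next_checkpoint_start(prev_start, prev_end, interval, min_pause):
    """Earliest start (in minutes) of the next checkpoint under the assumed rule.

    Assumption: the next checkpoint waits for both the periodic interval
    (counted from the previous start) and the min-pause (counted from the
    previous end, whether that checkpoint succeeded or timed out).
    """
    return max(prev_start + interval, prev_end + min_pause)


# Timeout case: checkpoint #1 starts at t=0 and times out at t=10.
timeout_case = next_checkpoint_start(
    prev_start=0, prev_end=10, interval=10, min_pause=10
)

# Healthy case with interval 10 min and min-pause 15 min; the 1-minute
# checkpoint duration is a made-up value for illustration.
healthy_case = next_checkpoint_start(
    prev_start=0, prev_end=1, interval=10, min_pause=15
)

print(timeout_case, healthy_case)
```

Under this model the timeout case gives t+20min, which is what I expected,
and the second configuration gives a spacing of about 15 minutes plus the
checkpoint duration, which is close to what I observe.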
Can someone help me understand the docs better and configure my job? Thanks
Regards,
Nikola