Posted to dev@flink.apache.org by "Matthias (Jira)" <ji...@apache.org> on 2021/04/27 16:25:00 UTC
[jira] [Created] (FLINK-22494) Avoid discarding checkpoints in case of failure
Matthias created FLINK-22494:
--------------------------------
Summary: Avoid discarding checkpoints in case of failure
Key: FLINK-22494
URL: https://issues.apache.org/jira/browse/FLINK-22494
Project: Flink
Issue Type: Improvement
Components: Runtime / Checkpointing, Runtime / Coordination
Affects Versions: 1.13.0, 1.14.0, 1.12.3
Reporter: Matthias
Fix For: 1.14.0, 1.13.1, 1.12.4
Both {{StateHandleStore}} implementations (i.e. {{KubernetesStateHandleStore}} and {{ZooKeeperStateHandleStore}}) discard checkpoints if the checkpoint metadata wasn't written to the backend.
This does not account for cases where the data was actually written to the backend but the call still failed (e.g. due to network issues). In such a case, the backend might end up holding a pointer to a checkpoint that has already been discarded.
Instead of discarding the checkpoint data when the outcome of the write is uncertain, we might want to keep it. Otherwise, we might run into exceptions when recovering from the checkpoint later on.
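
The distinction described above could be sketched roughly as follows. This is only an illustration under assumptions, not the actual {{StateHandleStore}} code: every type and method name here ({{PointerBackend}}, {{CheckpointStorage}}, {{NotWrittenException}}, {{PossiblyWrittenException}}, {{addCheckpoint}}) is hypothetical, and it assumes the backend client can tell a definite failure apart from an ambiguous one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are Flink's real API.
public class CheckpointDiscardSketch {

    /** The write definitely did not reach the backend. */
    public static class NotWrittenException extends Exception {
        public NotWrittenException(String message) { super(message); }
    }

    /** The call failed, but the write may still have been applied (e.g. connection loss). */
    public static class PossiblyWrittenException extends Exception {
        public PossiblyWrittenException(String message) { super(message); }
    }

    /** Stand-in for the ZooKeeper or Kubernetes backend that stores the pointer. */
    public interface PointerBackend {
        void writePointer(String name, String checkpointPath)
                throws NotWrittenException, PossiblyWrittenException;
    }

    /** Stand-in for checkpoint storage; it only records which paths were discarded. */
    public static class CheckpointStorage {
        public final List<String> discarded = new ArrayList<>();

        public void discard(String checkpointPath) {
            discarded.add(checkpointPath);
        }
    }

    public enum Outcome { ADDED, DISCARDED, KEPT_AFTER_UNCERTAIN_FAILURE }

    /** Discards the checkpoint data only if the pointer was definitely not written. */
    public static Outcome addCheckpoint(
            PointerBackend backend, CheckpointStorage storage, String name, String checkpointPath) {
        try {
            backend.writePointer(name, checkpointPath);
            return Outcome.ADDED;
        } catch (NotWrittenException e) {
            // No pointer can exist in the backend, so discarding is safe.
            storage.discard(checkpointPath);
            return Outcome.DISCARDED;
        } catch (PossiblyWrittenException e) {
            // A pointer might exist; keep the data so that it never dangles.
            return Outcome.KEPT_AFTER_UNCERTAIN_FAILURE;
        }
    }

    public static void main(String[] args) {
        CheckpointStorage storage = new CheckpointStorage();
        // Simulate a backend call that fails ambiguously.
        Outcome outcome = addCheckpoint(
                (name, path) -> { throw new PossiblyWrittenException("connection lost"); },
                storage, "chk-42", "/checkpoints/chk-42");
        System.out.println(outcome + ", discarded=" + storage.discarded);
    }
}
```

The trade-off in this sketch is that an ambiguous failure may leave orphaned checkpoint data behind if the pointer was in fact never written. That costs storage, but it avoids a pointer that references discarded data.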
--
This message was sent by Atlassian Jira
(v8.3.4#803005)