Posted to issues@flink.apache.org by "Roman Khachatryan (Jira)" <ji...@apache.org> on 2022/07/18 23:42:00 UTC

[jira] [Created] (FLINK-28597) Empty checkpoint folders not deleted on job cancellation if their shared state is still in use

Roman Khachatryan created FLINK-28597:
-----------------------------------------

             Summary: Empty checkpoint folders not deleted on job cancellation if their shared state is still in use
                 Key: FLINK-28597
                 URL: https://issues.apache.org/jira/browse/FLINK-28597
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
    Affects Versions: 1.16.0
            Reporter: Roman Khachatryan
            Assignee: Roman Khachatryan
             Fix For: 1.16.0


After FLINK-25872, SharedStateRegistry registers all state handles, including private ones.
State is actually discarded only once it is no longer in use AND its checkpoint is subsumed.
This is done to prevent premature deletion when recovering in CLAIM mode:
1. RocksDB native savepoint folder (shared state is stored in chk-xx folder so it might fail the deletion)
2. Initial non-changelog checkpoint when switching to changelog-based checkpoints (private state of the initial checkpoint might be included in later checkpoints, and deleting it would invalidate them)
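
To illustrate the discard rule described above, here is a minimal toy model. It is not Flink's SharedStateRegistry: the class ToyStateRegistry, its methods, and the file keys are all hypothetical, and "in use" is simplified to "referenced by at least one checkpoint that has not been subsumed yet".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only; NOT Flink's SharedStateRegistry. All names are hypothetical. */
class ToyStateRegistry {
    // State key -> IDs of not-yet-subsumed checkpoints that still reference it.
    private final Map<String, Set<Long>> users = new HashMap<>();
    // Keys whose backing files would have been deleted.
    private final Set<String> discarded = new HashSet<>();

    /** Registers every state handle of a checkpoint, shared or private. */
    public void register(long checkpointId, Set<String> stateKeys) {
        for (String key : stateKeys) {
            users.computeIfAbsent(key, k -> new HashSet<>()).add(checkpointId);
        }
    }

    /** Marks a checkpoint as subsumed and discards state that nothing else references. */
    public void subsume(long checkpointId) {
        users.entrySet().removeIf(entry -> {
            entry.getValue().remove(checkpointId);
            if (entry.getValue().isEmpty()) {
                discarded.add(entry.getKey());
                return true; // no remaining user: drop the entry
            }
            return false; // still referenced by a later checkpoint
        });
    }

    public boolean isDiscarded(String key) {
        return discarded.contains(key);
    }

    public static void main(String[] args) {
        ToyStateRegistry registry = new ToyStateRegistry();
        // Checkpoint 2 reuses a file that was written into checkpoint 1's folder.
        registry.register(1L, Set.of("chk-1/a", "chk-1/b"));
        registry.register(2L, Set.of("chk-1/b", "chk-2/c"));

        registry.subsume(1L);
        System.out.println("chk-1/a discarded: " + registry.isDiscarded("chk-1/a"));
        System.out.println("chk-1/b discarded: " + registry.isDiscarded("chk-1/b"));
    }
}
```

In this sketch, subsuming checkpoint 1 discards chk-1/a but keeps chk-1/b, because checkpoint 2 still references it. The chk-1 folder therefore outlives its own checkpoint, which is the kind of lingering folder this issue is about.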

Additionally, checkpoint folders remain undeleted for longer, which might be confusing.
In case of a crash, more folders will remain.

cc: [~Yanfei Lei], [~ym]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)