Posted to issues@flink.apache.org by "Yun Tang (Jira)" <ji...@apache.org> on 2020/01/04 17:33:00 UTC

[jira] [Commented] (FLINK-15012) Checkpoint directory not cleaned up

    [ https://issues.apache.org/jira/browse/FLINK-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008102#comment-17008102 ] 

Yun Tang commented on FLINK-15012:
----------------------------------

The appropriate place to clean up the leftover checkpoint directories is when {{CheckpointCoordinator}} calls {{shutdown(JobStatus)}}:
 # {{CheckpointCoordinator#shutdown(JobStatus)}} is only called once all tasks have reached a terminal state. This should rule out almost all cases in which a task manager could still be writing to the checkpoint directory.
 # Currently, we already clean up pending (in-progress) checkpoints and the completed checkpoints held in the completed checkpoint store. The only things left to clean are the checkpoint base folder and the {{shared}} and {{taskowned}} folders, which should be the responsibility of {{CheckpointStorageCoordinatorView}} since it creates them during initialization. We could add a new interface method to {{CheckpointStorageCoordinatorView}} and call it during the shutdown of the checkpoint coordinator to resolve this.
 # Once we clean the base checkpoint directory, we should also add documentation telling users not to store savepoints under that folder if {{NEVER_RETAIN_AFTER_TERMINATION}} is enabled.
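
A minimal, standalone sketch of the idea in point 2, not Flink's actual code: the interface {{CoordinatorStorageView}}, the class {{FsCoordinatorStorageView}} and the method names {{initializeBaseLocations}} and {{cleanUpBaseLocations}} are hypothetical stand-ins chosen for illustration. It assumes a local file system and that the checkpoint coordinator would invoke the cleanup hook from its shutdown path only when checkpoints are not to be retained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for the coordinator-side storage view (not Flink's interface). */
interface CoordinatorStorageView {
    /** Creates the per-job base, shared and taskowned directories. */
    void initializeBaseLocations() throws IOException;

    /** Proposed (hypothetical) hook: removes what initializeBaseLocations() created. */
    void cleanUpBaseLocations() throws IOException;
}

/** Local-file-system sketch; a real implementation would go through Flink's FileSystem. */
final class FsCoordinatorStorageView implements CoordinatorStorageView {
    private final Path jobCheckpointDir;

    FsCoordinatorStorageView(Path jobCheckpointDir) {
        this.jobCheckpointDir = jobCheckpointDir;
    }

    @Override
    public void initializeBaseLocations() throws IOException {
        Files.createDirectories(jobCheckpointDir.resolve("shared"));
        Files.createDirectories(jobCheckpointDir.resolve("taskowned"));
    }

    @Override
    public void cleanUpBaseLocations() throws IOException {
        if (!Files.exists(jobCheckpointDir)) {
            return; // nothing to clean
        }
        List<Path> toDelete;
        try (Stream<Path> paths = Files.walk(jobCheckpointDir)) {
            // Reverse order so children are deleted before their parent directory.
            toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
    }
}
```

Because this sketch deletes the whole per-job base directory recursively, anything else stored beneath it (for example a savepoint) would be removed as well, which is the reason for the documentation note in point 3.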

> Checkpoint directory not cleaned up
> -----------------------------------
>
>                 Key: FLINK-15012
>                 URL: https://issues.apache.org/jira/browse/FLINK-15012
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.9.1
>            Reporter: Nico Kruber
>            Assignee: Yun Tang
>            Priority: Major
>
> I started a Flink cluster with 2 TMs using {{start-cluster.sh}} and the following config (in addition to the default {{flink-conf.yaml}})
> {code:java}
> state.checkpoints.dir: file:///path/to/checkpoints/
> state.backend: rocksdb {code}
> After submitting a job with checkpoints enabled (every 5s), checkpoints show up, e.g.
> {code:java}
> bb969f842bbc0ecc3b41b7fbe23b047b/
> ├── chk-2
> │   ├── 238969e1-6949-4b12-98e7-1411c186527c
> │   ├── 2702b226-9cfc-4327-979d-e5508ab2e3d5
> │   ├── 4c51cb24-6f71-4d20-9d4c-65ed6e826949
> │   ├── e706d574-c5b2-467a-8640-1885ca252e80
> │   └── _metadata
> ├── shared
> └── taskowned {code}
> If I shut down the cluster via {{stop-cluster.sh}}, these files will remain on disk and not be cleaned up.
> In contrast, if I cancel the job, at least {{chk-2}} will be deleted, but the (empty) directories are still left behind.


