Posted to issues@flink.apache.org by "Aljoscha Krettek (JIRA)" <ji...@apache.org> on 2017/07/19 13:37:01 UTC

[jira] [Commented] (FLINK-7229) Flink doesn't deleted old checkpoint

    [ https://issues.apache.org/jira/browse/FLINK-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093081#comment-16093081 ] 

Aljoscha Krettek commented on FLINK-7229:
-----------------------------------------

Cleanup of checkpoint data only works if the file system is accessible from the JobManager. This is the case for a distributed file system such as HDFS, or for a network-mounted file system such as NFS.

In your case, the checkpoint data resides on the local disk of each TaskManager, so the JobManager has no way of deleting those files.

Does that describe your problem?
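
If it does, one possible fix is to point the checkpoint data directory at storage that every node can reach, including the JobManager. The fragment below is only a sketch: it assumes the GlusterFS volume from your configuration is mounted at the same path on all six machines, and the "checkpoints" subdirectory is a made-up name for illustration. The RocksDB working directory can stay local, since, as far as I know, it only holds the TaskManager's working files.

```yaml
# Sketch only, not a verified configuration.
state.backend: rocksdb
# Local working directory for RocksDB on each TaskManager (unchanged).
state.backend.rocksdb.checkpointdir: file:///opt/flink/data/local-checkpoints
# Assumption: GlusterFS is mounted at this path on every node, JobManager included.
# The "checkpoints" subdirectory is hypothetical.
state.backend.fs.checkpointdir: file:///opt/flink/data/glusterfs/checkpoints
state.checkpoints.dir: file:///opt/flink/data/glusterfs/external-checkpoints
state.checkpoints.num-retained: 3
```

With a layout like this, the JobManager sees the same files the TaskManagers write, so it should be able to delete checkpoints beyond the retained three.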

> Flink doesn't deleted old checkpoint
> ------------------------------------
>
>                 Key: FLINK-7229
>                 URL: https://issues.apache.org/jira/browse/FLINK-7229
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.3.1
>         Environment: Six Flink nodes running on Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-121-generic x86_64)
>            Reporter: Jason Zhou
>
> I have a 6-node Flink cluster where one node runs the JobManager and the others run only TaskManagers. All TaskManagers have the following config:
> state.backend: rocksdb
> state.backend.rocksdb.checkpointdir: file:///opt/flink/data/local-checkpoints
> state.backend.fs.checkpointdir: file:///opt/flink/data/local-checkpoints
> state.checkpoints.dir: file:///opt/flink/data/glusterfs/external-checkpoints
> state.checkpoints.num-retained: 3
> Both checkpoints and external checkpoints are enabled in the application.
> After running the application for a while, I can see that Flink does retain metadata for 3 checkpoints in external-checkpoints. However, for local-checkpoints, only one node (the one with the JobManager) retains 3 checkpoints; the others don't delete old checkpoints. As a result, the other nodes fill up with old checkpoints and I have to clean them up manually.


