You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@flink.apache.org by "Kostas Kloudas (JIRA)" <ji...@apache.org> on 2018/12/10 09:03:00 UTC

[jira] [Created] (FLINK-11116) Clean-up temporary files that upon recovery, they belong to no checkpoint.

Kostas Kloudas created FLINK-11116:
--------------------------------------

             Summary: Clean-up temporary files that upon recovery, they belong to no checkpoint.
                 Key: FLINK-11116
                 URL: https://issues.apache.org/jira/browse/FLINK-11116
             Project: Flink
          Issue Type: Improvement
          Components: filesystem-connector
    Affects Versions: 1.7.0
            Reporter: Kostas Kloudas
            Assignee: Kostas Kloudas
             Fix For: 1.7.1


In order to guarantee exactly-once semantics, the streaming file sink is implementing a two-phase commit protocol when writing files to the filesystem.

Initially data is written to in-progress files. These files are then put into "pending" state when they are completed (based on the rolling policy), and they are finally committed when the checkpoint that put them in the "pending" state is acknowledged as complete.

The above shows that in the case that we have:
1) checkpoints A, B, C coming 
2) checkpoint A being acknowledged and 
3) failure

Then we may have files that do not belong to any checkpoint (because B and C were not considered successful). These files are currently not cleaned up.

This issue aims at cleaning up these files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)