Posted to dev@flink.apache.org by "Kostas Kloudas (JIRA)" <ji...@apache.org> on 2018/11/21 14:37:00 UTC

[jira] [Created] (FLINK-10963) Cleanup small objects uploaded to S3 as independent objects

Kostas Kloudas created FLINK-10963:
--------------------------------------

             Summary: Cleanup small objects uploaded to S3 as independent objects
                 Key: FLINK-10963
                 URL: https://issues.apache.org/jira/browse/FLINK-10963
             Project: Flink
          Issue Type: Sub-task
          Components: filesystem-connector
    Affects Versions: 1.7.0
            Reporter: Kostas Kloudas
            Assignee: Kostas Kloudas
             Fix For: 1.7.1


The S3 {{RecoverableWriter}} uses the Multipart Upload (MPU) feature of S3 to upload the different part files. This means that a large part file is split into chunks of at least 5MB, each of which is uploaded independently as soon as it is ready.

This 5MB minimum size requires special handling of parts that are smaller than 5MB when a checkpoint barrier arrives. These small parts are uploaded as independent objects (not associated with an active MPU). This way, when Flink needs to restore, it simply downloads them and resumes writing to them.

These small objects are currently never cleaned up, which wastes space on S3.
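
To make the problem concrete, here is a minimal in-memory sketch of the behaviour described above. It is not the actual {{RecoverableWriter}} code: the class {{SmallObjectSketch}}, its methods, and the object key names are all hypothetical, and no Flink or AWS SDK API is used. It only illustrates how data below the 5MB minimum ends up as independent objects at each checkpoint, and what an assumed cleanup step could look like.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory model of the behaviour described above.
 * None of these names are Flink or AWS SDK APIs.
 */
public class SmallObjectSketch {

    /** Minimum MPU part size from the description (5MB). */
    public static final long MIN_PART_SIZE = 5L * 1024 * 1024;

    /** Sizes of the parts uploaded to the (simulated) active MPU. */
    public final List<Long> mpuParts = new ArrayList<>();

    /** Simulated bucket contents outside the MPU: object key -> size in bytes. */
    public final Map<String, Long> independentObjects = new LinkedHashMap<>();

    private long buffered;      // bytes written but not yet uploaded as an MPU part
    private int checkpointId;   // used only to build unique, made-up object keys

    /** Buffers data and uploads it as an MPU part once it reaches the minimum size. */
    public void write(long bytes) {
        buffered += bytes;
        if (buffered >= MIN_PART_SIZE) {
            mpuParts.add(buffered);
            buffered = 0;
        }
    }

    /** On a checkpoint barrier, persists the sub-5MB remainder as an independent object. */
    public void onCheckpoint() {
        checkpointId++;
        if (buffered > 0) {
            independentObjects.put("tail-of-checkpoint-" + checkpointId, buffered);
        }
    }

    /** Assumed cleanup step: drop all independent objects once they are no longer needed. */
    public int cleanupSmallObjects() {
        int removed = independentObjects.size();
        independentObjects.clear();
        return removed;
    }

    public static void main(String[] args) {
        SmallObjectSketch writer = new SmallObjectSketch();
        writer.write(6L * 1024 * 1024); // large enough: becomes an MPU part
        writer.write(1024);
        writer.onCheckpoint();          // 1024 bytes stored as an independent object
        writer.write(2048);
        writer.onCheckpoint();          // 3072 bytes stored as a second independent object
        System.out.println("MPU parts: " + writer.mpuParts.size());
        System.out.println("Leftover small objects: " + writer.independentObjects.size());
        System.out.println("Removed by cleanup: " + writer.cleanupSmallObjects());
    }
}
```

Without the call to {{cleanupSmallObjects()}}, every checkpoint that finds less than 5MB buffered leaves one more object behind. When such an object can safely be deleted depends on which checkpoints may still be restored; the sketch deliberately does not model that decision.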


