Posted to issues@flink.apache.org by "Stephan Ewen (JIRA)" <ji...@apache.org> on 2018/07/04 20:15:00 UTC

[jira] [Created] (FLINK-9752) Add an S3 RecoverableWriter

Stephan Ewen created FLINK-9752:
-----------------------------------

             Summary: Add an S3 RecoverableWriter
                 Key: FLINK-9752
                 URL: https://issues.apache.org/jira/browse/FLINK-9752
             Project: Flink
          Issue Type: Sub-task
          Components: Streaming Connectors
            Reporter: Stephan Ewen
            Assignee: Kostas Kloudas


S3 makes data durable only once an upload completes, that is, at the end of a simple upload or at the end of each part upload within a MultiPartUpload.

We should implement a RecoverableWriter for S3 that does a MultiPartUpload with a Part per checkpoint.
Recovering the writer needs the MultiPartUploadID and the list of ETags of previous parts.
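
As a rough illustration of the recovery information mentioned above, the following sketch shows what such a snapshot could hold. The class name MultiPartUploadState and its methods are hypothetical, chosen only for this example; they are not an existing Flink or AWS SDK API, and the actual implementation may look quite different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical snapshot of an in-progress multipart upload.
 * Illustrative only: not an actual Flink or AWS SDK class.
 */
final class MultiPartUploadState {
    private final String uploadId;
    private final List<String> partETags;

    MultiPartUploadState(String uploadId, List<String> partETags) {
        this.uploadId = uploadId;
        // Defensive copy so a stored snapshot cannot change afterwards.
        this.partETags = Collections.unmodifiableList(new ArrayList<>(partETags));
    }

    String uploadId() {
        return uploadId;
    }

    List<String> partETags() {
        return partETags;
    }

    /** S3 part numbers start at 1, so the next part follows the recorded ones. */
    int nextPartNumber() {
        return partETags.size() + 1;
    }

    /** Returns a new snapshot that also records the ETag of a newly uploaded part. */
    MultiPartUploadState withCompletedPart(String eTag) {
        List<String> updated = new ArrayList<>(partETags);
        updated.add(eTag);
        return new MultiPartUploadState(uploadId, updated);
    }
}
```

With a snapshot like this stored in a checkpoint, a restored writer would know which upload to resume and which part number to use next.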

We need additional staging of data in Flink state to work around the fact that
 - Parts in a MultiPartUpload must be at least 5 MB (except the last part)
 - Part sizes must be known up front. (Note that data can still be streamed in the upload)
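
The staging idea above could be sketched roughly as follows. PartStager, its method names, and the choice to return leftover bytes for storage in checkpoint state are assumptions made for this example, not the design that was eventually implemented. Buffering in memory also makes the part size known before the upload starts.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical staging buffer: data is held back until it is large enough
 * to be uploaded as one part. Illustrative only, not Flink's implementation.
 */
final class PartStager {
    /** Minimum part size (5 MiB); the last part of an upload is exempt. */
    static final int MIN_PART_SIZE = 5 * 1024 * 1024;

    private final int minPartSize;
    private final ByteArrayOutputStream staged = new ByteArrayOutputStream();
    private final List<byte[]> uploadableParts = new ArrayList<>();

    PartStager(int minPartSize) {
        this.minPartSize = minPartSize;
    }

    /** Appends data to the staging buffer. */
    void write(byte[] data) {
        staged.write(data, 0, data.length);
    }

    /**
     * Called at a checkpoint. If enough data is staged, it is cut into a part
     * whose size is now known. Whatever remains staged is returned so that the
     * caller can keep it in checkpointed state instead of in S3.
     */
    byte[] onCheckpoint() {
        if (staged.size() >= minPartSize) {
            uploadableParts.add(staged.toByteArray());
            staged.reset();
        }
        return staged.toByteArray();
    }

    /** Parts that are ready to be uploaded, in order. */
    List<byte[]> uploadableParts() {
        return uploadableParts;
    }
}
```

In this sketch a checkpoint that arrives before the minimum size is reached produces no part; the small remainder travels with the checkpoint and is written back into the buffer on restore.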



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)