Posted to dev@flink.apache.org by "Gyula Fora (Jira)" <ji...@apache.org> on 2019/08/26 08:06:00 UTC

[jira] [Created] (FLINK-13852) Support storing in-progress/pending files in different directories (StreamingFileSink)

Gyula Fora created FLINK-13852:
----------------------------------

             Summary: Support storing in-progress/pending files in different directories (StreamingFileSink)
                 Key: FLINK-13852
                 URL: https://issues.apache.org/jira/browse/FLINK-13852
             Project: Flink
          Issue Type: New Feature
          Components: Connectors / FileSystem
            Reporter: Gyula Fora


Currently, in-progress and pending files are stored in the same directory as the final output files. This can be a problem depending on how the final output files are consumed. One example is loading the data into Hive, where we can only load all files in a given directory.

I suggest we allow specifying a separate base path for pending/in-progress files, under which we create the same bucketing structure as for the final files, so that it holds only the non-final files.
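
As a rough illustration of the mirrored bucketing structure, the sketch below maps a final file path to the corresponding location under a separate base path. The class TmpPathMapper, the method toTmpPath and the example directories are hypothetical names chosen for this illustration, not part of Flink, and it uses java.nio.file.Path rather than Flink's own Path class to stay self-contained.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class TmpPathMapper {

    /**
     * Hypothetical helper (not a Flink API): re-roots a final file path
     * under tmpBase while keeping the bucket sub-directories unchanged.
     */
    public static Path toTmpPath(Path finalBase, Path tmpBase, Path finalFile) {
        if (!finalFile.startsWith(finalBase)) {
            throw new IllegalArgumentException(
                    finalFile + " is not located under " + finalBase);
        }
        // The part of the path below the final base, e.g. "<bucket>/<part file>".
        Path relative = finalBase.relativize(finalFile);
        return tmpBase.resolve(relative);
    }

    public static void main(String[] args) {
        // Example directories are assumptions made for this sketch.
        Path finalBase = Paths.get("/data/output");
        Path tmpBase = Paths.get("/data/tmp");
        Path finalFile = Paths.get("/data/output/2019-08-26--08/part-0-0");

        System.out.println(toTmpPath(finalBase, tmpBase, finalFile));
    }
}
```

With such a mapping, a consumer that loads everything under the final base path would never see in-progress or pending files, since they would live only under the separate base path until they are finalized.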

To support this, we need to extend the RecoverableWriter interface with a new open method, for example:

RecoverableFsDataOutputStream open(Path path, Path tmpPath) throws IOException;
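
One possible way to add such an overload without breaking existing implementations is a default method that falls back to the single-argument open. The sketch below only illustrates that idea under stated assumptions: SketchWriter is a stand-in interface invented for this example, it is not Flink's RecoverableWriter, and it uses plain java.io and java.nio types instead of Flink's stream and Path classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OpenOverloadSketch {

    /** Stand-in for a writer interface; NOT Flink's RecoverableWriter. */
    interface SketchWriter {

        /** Existing style of method: temporary data lives next to the target. */
        OutputStream open(Path path) throws IOException;

        /**
         * Hypothetical overload mirroring the proposed signature. The default
         * ignores tmpPath, so writers that do not support a separate
         * temporary location keep their current behavior.
         */
        default OutputStream open(Path path, Path tmpPath) throws IOException {
            return open(path);
        }
    }

    public static void main(String[] args) throws IOException {
        // A writer that only implements the single-argument method.
        SketchWriter legacy = path -> new ByteArrayOutputStream();

        // Calling the two-argument overload still works through the default.
        try (OutputStream out = legacy.open(
                Paths.get("/data/output/bucket/part-0-0"),
                Paths.get("/data/tmp/bucket/part-0-0"))) {
            out.write(1);
        }
    }
}
```

Whether a silent fallback is preferable to throwing UnsupportedOperationException for writers that cannot honor tmpPath is a design question this sketch leaves open.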



--
This message was sent by Atlassian Jira
(v8.3.2#803003)