Posted to issues@flink.apache.org by "Thomas Weise (Jira)" <ji...@apache.org> on 2020/01/04 11:43:00 UTC

[jira] [Resolved] (FLINK-13027) StreamingFileSink bulk-encoded writer supports file rolling upon customized events

     [ https://issues.apache.org/jira/browse/FLINK-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Weise resolved FLINK-13027.
----------------------------------
    Fix Version/s: 1.10.0
       Resolution: Fixed

> StreamingFileSink bulk-encoded writer supports file rolling upon customized events
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-13027
>                 URL: https://issues.apache.org/jira/browse/FLINK-13027
>             Project: Flink
>          Issue Type: New Feature
>          Components: API / DataStream
>            Reporter: Ying Xu
>            Assignee: Ying Xu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.10.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When writing in a bulk-encoded format such as Parquet, StreamingFileSink supports only OnCheckpointRollingPolicy, which rolls files at checkpoint time.
> In many scenarios, it is beneficial for the sink to roll files upon certain events, for example when the file size reaches a limit. Such a rolling policy can also alleviate some of the side effects of OnCheckpointRollingPolicy, e.g., most of the heavy lifting, including file uploading, happens at checkpoint time.
> Specifically, this Jira calls for a new rolling policy that rolls files:
>  # whenever a customized event happens, e.g., the file size reaches a certain limit.
>  # whenever a checkpoint happens. This is needed to provide exactly-once guarantees when writing bulk-encoded files.
> Users of this rolling policy need to be aware that a customized event and the next checkpoint may occur close to each other, which in the worst case yields one tiny file per checkpoint.
>  
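The two rules in the description, and the tiny-file caveat, can be illustrated with a small standalone model. This is only a sketch and not Flink's API: the class SizeOrCheckpointRollingSketch, its methods, and the "checkpoint every N records" simplification are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the requested rolling policy. It is NOT Flink's API;
 * every name below is an assumption made for illustration only.
 */
final class SizeOrCheckpointRollingSketch {
    private final long maxPartSizeBytes;

    SizeOrCheckpointRollingSketch(long maxPartSizeBytes) {
        if (maxPartSizeBytes <= 0) {
            throw new IllegalArgumentException("maxPartSizeBytes must be positive");
        }
        this.maxPartSizeBytes = maxPartSizeBytes;
    }

    /** Rule 1: roll on a customized event, here the part file reaching the size limit. */
    boolean shouldRollOnEvent(long currentPartSizeBytes) {
        return currentPartSizeBytes >= maxPartSizeBytes;
    }

    /** Rule 2: always roll on checkpoint, as the description requires for exactly-once. */
    boolean shouldRollOnCheckpoint() {
        return true;
    }

    /**
     * Replays record sizes and returns the sizes of the part files that get closed.
     * Simplification (assumption): a checkpoint happens after every
     * checkpointEveryNRecords records.
     */
    List<Long> simulate(long[] recordSizes, int checkpointEveryNRecords) {
        if (checkpointEveryNRecords <= 0) {
            throw new IllegalArgumentException("checkpointEveryNRecords must be positive");
        }
        List<Long> closedParts = new ArrayList<>();
        long openPartSize = 0;
        for (int i = 0; i < recordSizes.length; i++) {
            openPartSize += recordSizes[i];
            // Size-triggered roll.
            if (shouldRollOnEvent(openPartSize)) {
                closedParts.add(openPartSize);
                openPartSize = 0;
            }
            // Checkpoint-triggered roll; skipped when no part file is open.
            boolean checkpointNow = (i + 1) % checkpointEveryNRecords == 0;
            if (checkpointNow && openPartSize > 0 && shouldRollOnCheckpoint()) {
                closedParts.add(openPartSize);
                openPartSize = 0;
            }
        }
        // Close whatever is still open at the end of the input.
        if (openPartSize > 0) {
            closedParts.add(openPartSize);
        }
        return closedParts;
    }
}
```

With a limit of 100 bytes, record sizes {60, 50, 10, 60, 50}, and a checkpoint after every 3 records, the model closes part files of 110, 10, and 110 bytes. The 10-byte file appears because the size-triggered roll happened just before a checkpoint, which is the worst case the description warns about.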



--
This message was sent by Atlassian Jira
(v8.3.4#803005)