Posted to issues@flink.apache.org by "Flink Jira Bot (Jira)" <ji...@apache.org> on 2022/01/07 10:41:00 UTC

[jira] [Updated] (FLINK-5284) Make output of bucketing sink compatible with other processing framework like mapreduce

     [ https://issues.apache.org/jira/browse/FLINK-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-5284:
----------------------------------
      Labels: auto-deprioritized-major auto-deprioritized-minor auto-unassigned  (was: auto-deprioritized-major auto-unassigned stale-minor)
    Priority: Not a Priority  (was: Minor)

This issue was labeled "stale-minor" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Minor, please raise the priority and ask a committer to assign you the issue or revive the public discussion.


> Make output of bucketing sink compatible with other processing framework like mapreduce
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-5284
>                 URL: https://issues.apache.org/jira/browse/FLINK-5284
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>            Reporter: Wenlong Lyu
>            Priority: Not a Priority
>              Labels: auto-deprioritized-major, auto-deprioritized-minor, auto-unassigned
>
> Currently, the bucketing sink cannot move in-progress and pending files to the final output when the stream finishes, and after recovery the current output file will contain some invalid content, which can only be identified through the file-length meta file. This makes the final output of the job incompatible with other processing frameworks such as MapReduce. Two changes would solve the problem:
> 1. Add a direct output option to the bucketing sink, which writes output to the final file and deletes or truncates that file on failover. Direct output would be especially useful for finite stream jobs, since it lets users migrate their batch jobs to streaming and take advantage of features such as checkpointing.
> 2. Add a truncate-by-copy option that lets the bucketing sink resize an output file by copying the valid content of the current file instead of creating a length meta file. Truncate by copy incurs extra I/O, but leaves the output cleaner.
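
The truncate-by-copy idea in item 2 could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses the local filesystem through java.nio rather than the Flink or Hadoop FileSystem API, and the class name TruncateByCopy, the method truncateByCopy, and the ".truncated" suffix are hypothetical names that do not come from the issue. The valid length is assumed to be known from the restored checkpoint state.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TruncateByCopy {

    /**
     * Keeps only the first validLength bytes of the given file by copying them
     * into a sibling temp file and then renaming that file over the original.
     * Hypothetical helper for illustration; not part of the bucketing sink.
     */
    public static void truncateByCopy(Path file, long validLength) throws IOException {
        // Assumed naming scheme for the temporary copy.
        Path tmp = file.resolveSibling(file.getFileName() + ".truncated");
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buf = new byte[8192];
            long remaining = validLength;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    break; // file is shorter than validLength; copy what exists
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
        // Replace the original, so no separate length meta file is needed.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path part = Files.createTempFile("part-0-0", ".txt");
        // "record-1\n" is valid (9 bytes); the rest simulates data written after the last checkpoint.
        Files.write(part, "record-1\nhalf-writ".getBytes(StandardCharsets.UTF_8));
        truncateByCopy(part, 9);
        System.out.print(new String(Files.readAllBytes(part), StandardCharsets.UTF_8));
        Files.delete(part);
    }
}
```

The extra copy is the additional I/O the description mentions. In exchange, downstream readers see a file that contains only valid bytes and never need to consult a length meta file.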



--
This message was sent by Atlassian Jira
(v8.20.1#820001)