Posted to commits@hudi.apache.org by "Raymond Xu (Jira)" <ji...@apache.org> on 2023/03/01 00:30:00 UTC

[jira] [Updated] (HUDI-5095) Flink: Stores a special watermark(flag) to identify the current progress of writing data

     [ https://issues.apache.org/jira/browse/HUDI-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5095:
-----------------------------
    Priority: Critical  (was: Major)

> Flink: Stores a special watermark(flag) to identify the current progress of writing data
> ----------------------------------------------------------------------------------------
>
>                 Key: HUDI-5095
>                 URL: https://issues.apache.org/jira/browse/HUDI-5095
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: flink, flink-sql
>            Reporter: Forward Xu
>            Assignee: yuemeng
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: image-2022-10-26-16-37-07-343.png
>
>
> In some cases we need a flag to measure the progress of data writing. I think a reasonable way to provide one is to store the watermark as an attribute of the Hudi commit metadata.
> One of our scenarios is that Flink writes data to a Hudi table in real time, and we then use this Hudi table to support batch computation, so we need a flag to evaluate whether a partition's data is complete.
> For example, job1 is scheduled every hour. At 2022-01-19 02:01:00, job1 starts checking whether partition (20220119/01) of hudi_table1 is complete (Flink writes data to hudi_table1 in real time). Once the watermark property in hudi_table1's commit metadata is higher than 2022-01-19 02:05:00 (allowing for 5 minutes of out-of-order data), we consider partition (20220119/01) complete and can safely run Hive or Flink SQL for batch computation (basically insert table2 select xx from hudi_table1...).
> !image-2022-10-26-16-37-07-343.png!
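
The completeness check described above can be sketched roughly as follows. This is only an illustration of the comparison, not Hudi's actual API: the `watermark` metadata key, its timestamp format, and the helper names are assumptions made for the example, and partitions are assumed to be hourly and named like 20220119/01.

```python
from datetime import datetime, timedelta

# Hypothetical key and format; the issue does not say how the watermark is stored.
WATERMARK_KEY = "watermark"
WATERMARK_FORMAT = "%Y-%m-%d %H:%M:%S"


def partition_end(partition: str) -> datetime:
    """Return the exclusive end time of an hourly partition such as '20220119/01'."""
    start = datetime.strptime(partition, "%Y%m%d/%H")
    return start + timedelta(hours=1)


def is_partition_complete(
    commit_metadata: dict,
    partition: str,
    allowed_lateness: timedelta = timedelta(minutes=5),
) -> bool:
    """True once the committed watermark is past the partition end plus the lateness bound."""
    raw = commit_metadata.get(WATERMARK_KEY)
    if raw is None:
        # No watermark recorded in this commit: treat the partition as not yet complete.
        return False
    watermark = datetime.strptime(raw, WATERMARK_FORMAT)
    # "Higher than" in the description is read as a strict comparison.
    return watermark > partition_end(partition) + allowed_lateness


# Example mirroring the scenario: partition 20220119/01 ends at 02:00:00,
# so with 5 minutes of lateness the threshold is 2022-01-19 02:05:00.
latest_commit = {WATERMARK_KEY: "2022-01-19 02:05:30"}
ready = is_partition_complete(latest_commit, "20220119/01")
```

A scheduler such as job1 would poll this check against the latest commit and start the batch query only once it returns true.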



--
This message was sent by Atlassian Jira
(v8.20.10#820010)