Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/05/27 12:17:18 UTC

[GitHub] [hudi] danny0405 commented on a change in pull request #3002: [HUDI-1923] Support exactly-once

danny0405 commented on a change in pull request #3002:
URL: https://github.com/apache/hudi/pull/3002#discussion_r640558504



##########
File path: hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteFunction.java
##########
@@ -145,6 +148,27 @@
    */
   private transient TotalSizeTracer tracer;
 
+  /**
+   * Whether write in exactly-once semantics.
+   */
+  private boolean exactlyOnce;
+
+  /**
+   * Flag saying whether the write task is waiting for the checkpoint success notification
+   * after it finished a checkpoint.
+   *
+   * <p>The flag is needed because the write task does not block during the waiting time interval,
+   * some data buckets still flush out with old instant time. There are two cases that the flush may produce
+   * corrupted files if the old instant is committed successfully:
+   * 1) the write handle was writing data but interrupted, left a corrupted parquet file;
+   * 2) the write handle finished the write but was not closed, left an empty parquet file.
+   *
+   * <p>To solve, when this flag was set to true, we block the data flushing thus the #processElement method,
+   * the flag was reset to true if the task receives the checkpoint success event or the latest inflight instant
+   * time changed(the last instant committed successfully).
+   */

Review comment:
       Oops, there is a mistake in this Javadoc: `the flag was reset to true` should read `the flag was reset to false`.
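
       To make the intended lifecycle of the flag concrete, here is a minimal, self-contained sketch. It is not the code in this PR: the class `ConfirmingFlagSketch`, its fields, and its method names are hypothetical, and records are held in a buffer instead of actually blocking `#processElement`. It only illustrates that the flag goes to true once a checkpoint finishes, and is reset to false when the checkpoint success event arrives or the latest inflight instant changes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the "waiting for checkpoint confirmation" flag.
 * Not the real StreamWriteFunction: all names here are assumptions.
 */
final class ConfirmingFlagSketch {
  /** True while the task waits for the checkpoint success notification. */
  private boolean confirming = false;
  /** Instant time used to tag flushed records. */
  private String currentInstant;
  private final List<String> buffer = new ArrayList<>();
  private final List<String> flushed = new ArrayList<>();

  ConfirmingFlagSketch(String initialInstant) {
    this.currentInstant = initialInstant;
  }

  /** Buffers the record and flushes only when no confirmation is pending. */
  void processElement(String record) {
    buffer.add(record);
    if (!confirming) {
      flushBuffer();
    }
  }

  /** Checkpoint: flush with the current instant, then start waiting. */
  void snapshotState() {
    flushBuffer();
    confirming = true;
  }

  /** Checkpoint success event: reset the flag to false and move to the new instant. */
  void notifyCheckpointComplete(String newInstant) {
    currentInstant = newInstant;
    confirming = false;
  }

  /** A changed inflight instant means the last instant was committed: reset to false. */
  void onInflightInstant(String latestInflightInstant) {
    if (!latestInflightInstant.equals(currentInstant)) {
      currentInstant = latestInflightInstant;
      confirming = false;
    }
  }

  boolean isConfirming() {
    return confirming;
  }

  List<String> getFlushed() {
    return flushed;
  }

  private void flushBuffer() {
    for (String record : buffer) {
      flushed.add(currentInstant + ":" + record);
    }
    buffer.clear();
  }

  public static void main(String[] args) {
    ConfirmingFlagSketch task = new ConfirmingFlagSketch("001");
    task.processElement("a");              // flushed with instant 001
    task.snapshotState();                  // flag becomes true
    task.processElement("b");              // held back, not flushed with 001
    task.notifyCheckpointComplete("002");  // flag reset to false
    task.processElement("c");              // b and c flushed with instant 002
    System.out.println(task.getFlushed());
  }
}
```

       In this sketch no record is ever flushed with the old instant while the flag is true, which is the property the Javadoc describes; the real task would block instead of buffering.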




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org