Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/11/10 23:31:17 UTC

[GitHub] [hudi] vinothchandar commented on a change in pull request #2216: [HUDI-1357] Added a check to ensure no records are lost during updates.

vinothchandar commented on a change in pull request #2216:
URL: https://github.com/apache/hudi/pull/2216#discussion_r520939276



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieWriteStat.java
##########
@@ -49,6 +49,12 @@
    */
   private String prevCommit;
 
+  /**
+   * Total number of records written to the previous version of the file slice.
+   * If inflight commit is c2, then number of records present in f1_w1_c1.parquet.
+   */
+  private long oldNumWrites;

Review comment:
       As far as I can tell, we only use this within HoodieMergeHandle. Can we avoid adding the extra member here and simply use a local variable? I am trying to understand the use case for logging this in the stat.
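
A minimal sketch of the local-variable approach suggested above, not the PR's actual code. The class `MergeValidationSketch`, its exception type, and the rule "records written plus records deleted must cover the old record count" are assumptions made for illustration; the point is only that the old count can stay local to the merge step instead of becoming a field on `HoodieWriteStat`.

```java
// Hypothetical stand-in types; none of these names come from Hudi.
final class MergeValidationSketch {

  /** Raised when the merged file accounts for fewer records than the previous file held. */
  static final class MergeValidationException extends RuntimeException {
    MergeValidationException(String message) {
      super(message);
    }
  }

  /**
   * Validates one merge using values that are local to the merge step.
   * Assumed rule: every record in the old file is either rewritten or deleted,
   * so written + deleted must be at least the old record count.
   */
  static void validateMerge(long oldRecordCount, long recordsWritten, long recordsDeleted) {
    long accountedFor = recordsWritten + recordsDeleted;
    if (accountedFor < oldRecordCount) {
      throw new MergeValidationException(
          "Merge accounts for " + accountedFor + " records, but the previous file held "
              + oldRecordCount);
    }
  }

  public static void main(String[] args) {
    // 100 old records: 95 rewritten and 5 deleted, so the check passes.
    validateMerge(100, 95, 5);

    // 100 old records but only 95 accounted for, so the check fails.
    try {
      validateMerge(100, 90, 5);
    } catch (MergeValidationException e) {
      System.out.println("Validation failed: " + e.getMessage());
    }
  }
}
```

With this shape, the old record count would be read once inside the merge handle and passed straight to the check, so nothing extra is serialized with the write stat.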

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##########
@@ -117,6 +117,10 @@
   public static final String MAX_CONSISTENCY_CHECKS_PROP = "hoodie.consistency.check.max_checks";
   public static int DEFAULT_MAX_CONSISTENCY_CHECKS = 7;
 
+  // Data loss check before commits
+  private static final String DATALOSS_CHECK_ENABLED = "hoodie.dataloss.check.enabled";

Review comment:
       Let's name this after its real purpose, e.g. `hoodie.merge.data.validation.enabled`, and avoid calling it "loss checking" etc., which can be rather disconcerting to users when they read it.
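
A rough sketch of how the renamed flag could be read, assuming a plain `java.util.Properties` lookup. Only the key `hoodie.merge.data.validation.enabled` comes from the comment above; the class name, the constant names, and the default of `false` are assumptions, and this is not `HoodieWriteConfig`'s real API.

```java
import java.util.Properties;

// Hypothetical helper; only the property key is taken from the review comment.
final class MergeValidationConfigSketch {

  static final String MERGE_DATA_VALIDATION_ENABLED = "hoodie.merge.data.validation.enabled";

  // Assumed default: validation is off unless explicitly enabled.
  static final String DEFAULT_MERGE_DATA_VALIDATION_ENABLED = "false";

  /** Returns true only when the property is set to "true" (case-insensitive). */
  static boolean isMergeDataValidationEnabled(Properties props) {
    return Boolean.parseBoolean(
        props.getProperty(MERGE_DATA_VALIDATION_ENABLED, DEFAULT_MERGE_DATA_VALIDATION_ENABLED));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println("default: " + isMergeDataValidationEnabled(props));

    props.setProperty(MERGE_DATA_VALIDATION_ENABLED, "true");
    System.out.println("enabled: " + isMergeDataValidationEnabled(props));
  }
}
```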

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/util/ParquetUtils.java
##########
@@ -261,6 +262,22 @@ public static BloomFilter readBloomFilterFromParquetMetadata(Configuration confi
     return records;
   }
 
+  /**
+   * Returns the number of records in the parquet file.
+   *
+   * @param conf Configuration
+   * @param parquetFilePath path of the file
+   */
+  public static long getRowCount(Configuration conf, Path parquetFilePath) {
+    ParquetMetadata footer;
+    long rowCount = 0;
+    footer = readMetadata(conf, parquetFilePath);

Review comment:
       Sweet. I was going to suggest this. You are ahead!



