You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/09 01:28:55 UTC

[GitHub] [hudi] nsivabalan commented on a change in pull request #4446: [HUDI-2917] rollback insert data appended to log file when using Hbase Index

nsivabalan commented on a change in pull request #4446:
URL: https://github.com/apache/hudi/pull/4446#discussion_r802192950



##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
##########
@@ -70,9 +71,13 @@
    */
   private int totalBuckets = 0;
   /**
-   * Stat for the current workload. Helps in determining inserts, upserts etc.
+   * Stat for the input workload. Describe the workload before being assigned buckets.
    */
-  private WorkloadProfile profile;
+  private WorkloadProfile inputProfile;
+  /**
+   * Stat for the execution workload. Describe the workload after being assigned buckets.
+   */
+  private HashMap<String, WorkloadStat> executionProfile = new HashMap<>();

Review comment:
       as suggested above, can we try to move this within WorkloadProfile only. 

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -182,14 +182,28 @@ public abstract void preCompact(
         .withOperationField(config.allowOperationMetadataField())
         .withPartition(operation.getPartitionPath())
         .build();
-    if (!scanner.iterator().hasNext()) {
-      scanner.close();
-      return new ArrayList<>();
-    }
 
     Option<HoodieBaseFile> oldDataFileOpt =
         operation.getBaseFile(metaClient.getBasePath(), operation.getPartitionPath());
 
+    // Considering following scenario: if all log blocks in this fileSlice is rollback, it returns an empty scanner.
+    // But in this case, we need to give it a base file. Otherwise, it will lose base file in following fileSlice.
+    if (!scanner.iterator().hasNext()) {
+      if (!oldDataFileOpt.isPresent()) {
+        scanner.close();
+        return new ArrayList<>();
+      } else {
+        // TODO: we may directly rename original parquet file if there is not evolution/devolution of schema

Review comment:
       yes. it's not feasible to know upfront if all log blocks are invalid while planning compaction. 

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/Partitioner.java
##########
@@ -18,10 +18,14 @@
 
 package org.apache.hudi.table.action.commit;
 
+import org.apache.hudi.table.WorkloadProfile;
+
 import java.io.Serializable;
 
 public interface Partitioner extends Serializable {
   int getNumPartitions();
 
   int getPartition(Object key);
+
+  WorkloadProfile getExecutionWorkloadProfile();
 }

Review comment:
       wondering if we can augment existing WorkloadProfile only to add another instance of HashMap<String, WorkloadStat> partitionPathStatMap.
   
   ```
     /**
      * Input workload profile.
      */
     protected final HashMap<String, WorkloadStat> inputPartitionPathStatMap;
     /**
      * Execution workload profile.
      */
     protected final HashMap<String, WorkloadStat> outputPartitionPathStatMap;
   ```
   
   Let the UpsertPartitioner update the execution profile (a.k.a outputPartitionPathStatMap ) in the passed in workload profile itself. 
   And so CommitActionExecutors does not need to poll partitionor for execution profile since they already have an handle to WorkloadProfile.
   
   
   

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -182,14 +182,28 @@ public abstract void preCompact(
         .withOperationField(config.allowOperationMetadataField())
         .withPartition(operation.getPartitionPath())
         .build();
-    if (!scanner.iterator().hasNext()) {
-      scanner.close();
-      return new ArrayList<>();
-    }
 
     Option<HoodieBaseFile> oldDataFileOpt =
         operation.getBaseFile(metaClient.getBasePath(), operation.getPartitionPath());
 
+    // Considering following scenario: if all log blocks in this fileSlice is rollback, it returns an empty scanner.
+    // But in this case, we need to give it a base file. Otherwise, it will lose base file in following fileSlice.
+    if (!scanner.iterator().hasNext()) {
+      if (!oldDataFileOpt.isPresent()) {
+        scanner.close();
+        return new ArrayList<>();
+      } else {
+        // TODO: we may directly rename original parquet file if there is not evolution/devolution of schema

Review comment:
       yes, your understanding is right. 
   what i meant by "do these need to be cleaned up " is, why have commented out code from L197 to L203? do you want to remove them. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org