You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/10 03:44:18 UTC

[GitHub] [hudi] guanziyue commented on a change in pull request #4446: [HUDI-2917] rollback insert data appended to log file when using Hbase Index

guanziyue commented on a change in pull request #4446:
URL: https://github.com/apache/hudi/pull/4446#discussion_r780879568



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -182,14 +182,28 @@ public abstract void preCompact(
         .withOperationField(config.allowOperationMetadataField())
         .withPartition(operation.getPartitionPath())
         .build();
-    if (!scanner.iterator().hasNext()) {
-      scanner.close();
-      return new ArrayList<>();
-    }
 
     Option<HoodieBaseFile> oldDataFileOpt =
         operation.getBaseFile(metaClient.getBasePath(), operation.getPartitionPath());
 
+    // Considering following scenario: if all log blocks in this fileSlice is rollback, it returns an empty scanner.
+    // But in this case, we need to give it a base file. Otherwise, it will lose base file in following fileSlice.
+    if (!scanner.iterator().hasNext()) {
+      if (!oldDataFileOpt.isPresent()) {
+        scanner.close();
+        return new ArrayList<>();
+      } else {
+        // TODO: we may directly rename original parquet file if there is not evolution/devolution of schema

Review comment:
       Correct me if I misunderstand you question. The reason why we try to generate a new base file here rather than end up this compaction operation is that any upsert occurs after compaction plan generated will use compaction commit time as new log file base commit time. Such a fileSlice is comprised by new log file and basefile generated by compaction. If hoodieCompactor didn't generate a basefile for this fileSlice, Filegroup will lose all data in baseFile in new and following Fileslices.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org