You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/16 01:34:14 UTC

[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4957: [HUDI-3406] Rollback incorrectly relying on FS listing instead of Com…

alexeykudinkin commented on a change in pull request #4957:
URL: https://github.com/apache/hudi/pull/4957#discussion_r827539154



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##########
@@ -132,7 +138,17 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient metaClient, HoodieWriteC
     return fs.listStatus(FSUtils.getPartitionPath(config.getBasePath(), partitionPath), filter);
   }
 
-  private FileStatus[] getBaseAndLogFilesToBeDeleted(String commit, String partitionPath, FileSystem fs) throws IOException {
+  private FileStatus[] getBaseAndLogFilesToBeDeleted(HoodieInstant instantToRollback, String partitionPath, FileSystem fs) throws IOException {

Review comment:
       There's no need to pass `fs` in, it can be accessed from `metaClient`

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java
##########
@@ -146,6 +147,25 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
     return Option.of(TimelineMetadataUtils.deserializeRequestedReplaceMetadata(requestedContent.get()));
   }
 
+  public static Option<HoodieCommitMetadata> getHoodieCommitMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException {
+    HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();
+    HoodieTimeline timeline = activeTimeline.getCommitsTimeline().filterCompletedInstants();
+    return getHoodieCommitMetadata(timeline, Option.of(instant));
+  }
+
+  public static Option<HoodieCommitMetadata> getHoodieCommitMetadata(HoodieTimeline timeline, Option<HoodieInstant> hoodieInstant) throws IOException {

Review comment:
       There's no point in passing instant as `Option` -- we can't fetch the metadata if we don't know the instant

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##########
@@ -132,7 +138,17 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient metaClient, HoodieWriteC
     return fs.listStatus(FSUtils.getPartitionPath(config.getBasePath(), partitionPath), filter);
   }
 
-  private FileStatus[] getBaseAndLogFilesToBeDeleted(String commit, String partitionPath, FileSystem fs) throws IOException {
+  private FileStatus[] getBaseAndLogFilesToBeDeleted(HoodieInstant instantToRollback, String partitionPath, FileSystem fs) throws IOException {
+    Option<HoodieCommitMetadata> commitMetadataOptional = getHoodieCommitMetadata(metaClient, instantToRollback);
+    if (commitMetadataOptional.isPresent() && instantToRollback.isCompleted()
+        && !WriteOperationType.UNKNOWN.equals(commitMetadataOptional.get().getOperationType())) {
+      return getBaseAndLogFilesFromCommitMetadataToBeDeleted(commitMetadataOptional.get(), partitionPath, fs);
+    } else {
+      return getBaseAndLogFilesFromFileListToBeDeleted(instantToRollback.getTimestamp(), partitionPath, fs);
+    }
+  }
+
+  private FileStatus[] getBaseAndLogFilesFromFileListToBeDeleted(String commit, String partitionPath, FileSystem fs) throws IOException {

Review comment:
       Also, no need to pass fs

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##########
@@ -147,4 +163,15 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient metaClient, HoodieWriteC
     };
     return fs.listStatus(FSUtils.getPartitionPath(config.getBasePath(), partitionPath), filter);
   }
+
+  private FileStatus[] getBaseAndLogFilesFromCommitMetadataToBeDeleted(HoodieCommitMetadata commitMetadata, String partitionPath, FileSystem fs) throws IOException {

Review comment:
       Let's limit this method to just extract files paths from metadata, and then do the listing in its caller.  We can also shorten its name to just `getFilesFromCommitMetadata`

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
##########
@@ -137,6 +147,16 @@ public WriteOperationType getOperationType() {
     return fullPaths;
   }
 
+  public HashMap<String, String> getFileIdAndFullPathsByPartitionPath(String basePath, String partitionPath) {

Review comment:
       In adding new APIs to shared components we should always strive for making sure that the set of APIs is bounded in size and orthogonal (ie non-overlapping b/w each other).
   
   This method heavily overlaps with `getFileIdAndRelativePathsByPartitionPath`, let's keep just the other one

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java
##########
@@ -146,6 +147,25 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
     return Option.of(TimelineMetadataUtils.deserializeRequestedReplaceMetadata(requestedContent.get()));
   }
 
+  public static Option<HoodieCommitMetadata> getHoodieCommitMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException {

Review comment:
       I think it's better to inline this method: since there's already `getHoodieCommitMetadata` method accepting the timeline and instant

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##########
@@ -132,7 +138,17 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient metaClient, HoodieWriteC
     return fs.listStatus(FSUtils.getPartitionPath(config.getBasePath(), partitionPath), filter);
   }
 
-  private FileStatus[] getBaseAndLogFilesToBeDeleted(String commit, String partitionPath, FileSystem fs) throws IOException {
+  private FileStatus[] getBaseAndLogFilesToBeDeleted(HoodieInstant instantToRollback, String partitionPath, FileSystem fs) throws IOException {
+    Option<HoodieCommitMetadata> commitMetadataOptional = getHoodieCommitMetadata(metaClient, instantToRollback);
+    if (commitMetadataOptional.isPresent() && instantToRollback.isCompleted()
+        && !WriteOperationType.UNKNOWN.equals(commitMetadataOptional.get().getOperationType())) {
+      return getBaseAndLogFilesFromCommitMetadataToBeDeleted(commitMetadataOptional.get(), partitionPath, fs);
+    } else {
+      return getBaseAndLogFilesFromFileListToBeDeleted(instantToRollback.getTimestamp(), partitionPath, fs);
+    }
+  }
+
+  private FileStatus[] getBaseAndLogFilesFromFileListToBeDeleted(String commit, String partitionPath, FileSystem fs) throws IOException {

Review comment:
       Let's shorten the name to `listFilesToBeDeleted`

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
##########
@@ -119,6 +119,16 @@ public void setCompacted(Boolean compacted) {
     return filePaths;
   }
 
+  public HashMap<String, String> getFileIdAndRelativePathsByPartitionPath(String partitionPath) {

Review comment:
       If we pass `partitionPath` as an argument there's no point in returning a Map -- all files will have the same path, we can just return the file-ids




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org