You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/16 13:43:15 UTC

[GitHub] [hudi] xushiyan commented on a change in pull request #4957: [HUDI-3406] Rollback incorrectly relying on FS listing instead of Com…

xushiyan commented on a change in pull request #4957:
URL: https://github.com/apache/hudi/pull/4957#discussion_r828014454



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##########
@@ -145,6 +165,17 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient metaClient, HoodieWriteC
       }
       return false;
     };
-    return fs.listStatus(FSUtils.getPartitionPath(config.getBasePath(), partitionPath), filter);
+    return metaClient.getFs().listStatus(FSUtils.getPartitionPath(config.getBasePath(), partitionPath), filter);
+  }
+
+  private FileStatus[] getFilesFromCommitMetadata(HoodieCommitMetadata commitMetadata, String partitionPath) throws IOException {
+    HashSet<String> fullPaths = commitMetadata.getFullPathsByPartitionPath(metaClient.getBasePath(), partitionPath);
+    List<String> commitPathNames = fullPaths.stream().filter(Objects::nonNull).collect(Collectors.toList());

Review comment:
       can filter null from inside `getFullPathsByPartitionPath` ? so caller does not need to worry about null items.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##########
@@ -145,6 +165,17 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient metaClient, HoodieWriteC
       }
       return false;
     };
-    return fs.listStatus(FSUtils.getPartitionPath(config.getBasePath(), partitionPath), filter);
+    return metaClient.getFs().listStatus(FSUtils.getPartitionPath(config.getBasePath(), partitionPath), filter);
+  }
+
+  private FileStatus[] getFilesFromCommitMetadata(HoodieCommitMetadata commitMetadata, String partitionPath) throws IOException {
+    HashSet<String> fullPaths = commitMetadata.getFullPathsByPartitionPath(metaClient.getBasePath(), partitionPath);
+    List<String> commitPathNames = fullPaths.stream().filter(Objects::nonNull).collect(Collectors.toList());
+    if (commitPathNames.isEmpty()) {
+      return new FileStatus[0];
+    } else {
+      Path[] files = commitPathNames.stream().map(Path::new).toArray(Path[]::new);
+      return metaClient.getFs().listStatus(files);
+    }

Review comment:
       i don't think we need to check this; `listStatus()` should handle empty array input.

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/rollback/TestMergeOnReadRollbackActionExecutor.java
##########
@@ -126,7 +125,11 @@ public void testMergeOnReadRollbackActionExecutor(boolean isUsingMarkers) throws
     for (Map.Entry<String, HoodieRollbackPartitionMetadata> entry : rollbackMetadata.entrySet()) {
       HoodieRollbackPartitionMetadata meta = entry.getValue();
       assertTrue(meta.getFailedDeleteFiles() == null || meta.getFailedDeleteFiles().size() == 0);
-      assertTrue(meta.getSuccessDeleteFiles() == null || meta.getSuccessDeleteFiles().size() == 0);
+      if (isUsingMarkers) {
+        assertTrue(meta.getSuccessDeleteFiles() == null || meta.getSuccessDeleteFiles().size() == 0);
+      } else {
+        assertFalse(meta.getSuccessDeleteFiles() == null || meta.getSuccessDeleteFiles().size() == 0);
+      }

Review comment:
       this is existing logic: are we able to fix the assertion conditions by not using `||` here and just check `meta.getSuccessDeleteFiles().size()`?  i'm curious to know why we need to null-check `meta.getSuccessDeleteFiles()` and `meta.getFailedDeleteFiles()`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org