You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/12 11:28:40 UTC

[GitHub] [hudi] xushiyan commented on a change in pull request #4489: [HUDI-3135] Fix Delete partitions with metadata table and fix show partitions in spark sql

xushiyan commented on a change in pull request #4489:
URL: https://github.com/apache/hudi/pull/4489#discussion_r825286337



##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkCopyOnWriteTable.java
##########
@@ -18,6 +18,8 @@
 
 package org.apache.hudi.table;
 
+import org.apache.avro.Schema;

Review comment:
       there is no other code change in this file. should avoid re-ordering imports here

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##########
@@ -472,4 +499,21 @@ private boolean isFileSliceNeededForPendingCompaction(FileSlice fileSlice) {
   private boolean isFileGroupInPendingCompaction(HoodieFileGroup fg) {
     return fgIdToPendingCompactionOperations.containsKey(fg.getFileGroupId());
   }
+
+  public boolean isDeletePartitionOperation(Option<HoodieInstant> instantToRetain) {

Review comment:
       it actually looks better to be in some timeline utils class instead of in `CleanPlanner`

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##########
@@ -206,7 +212,13 @@ public CleanPlanner(HoodieEngineContext context, HoodieTable<T, I, K, O> hoodieT
    */
   private List<String> getPartitionPathsForFullCleaning() {
     // Go to brute force mode of scanning all partitions
-    return FSUtils.getAllPartitionPaths(context, config.getMetadataConfig(), config.getBasePath());
+    try {
+      FileSystemBackedTableMetadata fsBackedTableMetadata = new FileSystemBackedTableMetadata(context,
+          context.getHadoopConf(), config.getBasePath(), config.shouldAssumeDatePartitioning());
+      return fsBackedTableMetadata.getAllPartitionPaths();
+    } catch (IOException e) {
+      return Collections.emptyList();

Review comment:
       can you clarify why we have to use the concrete implementation `FileSystemBackedTableMetadata` here? the existing code uses its interface `HoodieTableMetadata` as an abstraction.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##########
@@ -159,12 +163,14 @@ public CleanPlanner(HoodieEngineContext context, HoodieTable<T, I, K, O> hoodieT
                   .deserializeHoodieCleanMetadata(hoodieTable.getActiveTimeline().getInstantDetails(lastClean.get()).get());
           if ((cleanMetadata.getEarliestCommitToRetain() != null)
                   && (cleanMetadata.getEarliestCommitToRetain().length() > 0)) {
-            return getPartitionPathsForIncrementalCleaning(cleanMetadata, instantToRetain);
+            return getPartitionPathsForIncrementalCleaning(cleanMetadata, instantToRetain).stream()
+                .distinct().collect(Collectors.toList());
           }
         }
       }
     }
-    return getPartitionPathsForFullCleaning();
+
+    return getPartitionPathsForFullCleaning().stream().distinct().collect(Collectors.toList());

Review comment:
       why we need the deduplication here? `getPartitionPathsForIncrementalCleaning` already calls `distinct()` plus for both methods, it shouldn't be the caller's responsibility to deduplicate; the APIs should take care of it internally

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -157,10 +157,13 @@ public static void deleteMetadataTable(String basePath, HoodieEngineContext cont
                                                                           String instantTime) {
     List<HoodieRecord> records = new ArrayList<>(commitMetadata.getPartitionToWriteStats().size());
 
-    // Add record bearing partitions list
-    ArrayList<String> partitionsList = new ArrayList<>(commitMetadata.getPartitionToWriteStats().keySet());
+    // Add record bearing added partitions list
+    ArrayList<String> partitionsAdded = new ArrayList<>(commitMetadata.getPartitionToWriteStats().keySet());
+
+    // Add record bearing deleted partitions list
+    ArrayList<String> partitionsDeleted = getPartitionsDeleted(commitMetadata);

Review comment:
       should just declare as `List<String>`

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java
##########
@@ -542,8 +542,6 @@ private void transitionState(HoodieInstant fromInstant, HoodieInstant toInstant,
       } else {
         // Ensures old state exists in timeline
         LOG.info("Checking for file exists ?" + new Path(metaClient.getMetaPath(), fromInstant.getFileName()));
-        ValidationUtils.checkArgument(metaClient.getFs().exists(new Path(metaClient.getMetaPath(),
-            fromInstant.getFileName())));

Review comment:
       why this not needed any more?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##########
@@ -472,4 +499,21 @@ private boolean isFileSliceNeededForPendingCompaction(FileSlice fileSlice) {
   private boolean isFileGroupInPendingCompaction(HoodieFileGroup fg) {
     return fgIdToPendingCompactionOperations.containsKey(fg.getFileGroupId());
   }
+
+  public boolean isDeletePartitionOperation(Option<HoodieInstant> instantToRetain) {

Review comment:
       can we make it private if not intend to use else where? also prefer to have a static helper for this, instead of couple it with CleanPlanner




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org