You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/04 21:55:12 UTC

[GitHub] [hudi] nsivabalan commented on a diff in pull request #5222: [HUDI-3782] Fixing table config when any of the index is disabled

nsivabalan commented on code in PR #5222:
URL: https://github.com/apache/hudi/pull/5222#discussion_r842133966


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -377,7 +378,25 @@ public void initTableMetadata() {
       if (initializeFromFilesystem(dataMetaClient, inflightInstantTimestamp)) {
         metrics.ifPresent(m -> m.updateMetrics(HoodieMetadataMetrics.INITIALIZE_STR, timer.endTimer()));
       }
+      return;
     }
+
+    // if metadata table exists, then check if any of the enabled partition types needs to be initialized

Review Comment:
   don't we need to guard this with isMetadataAsyncIndex()? 



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -992,18 +1023,17 @@ private void initialCommit(String createInstantTime, List<MetadataPartitionType>
     }).collect(Collectors.toMap(Pair::getKey, Pair::getValue));
     final Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionToRecordsMap = new HashMap<>();
 
-    // Record which saves the list of all partitions
-    HoodieRecord allPartitionRecord = HoodieMetadataPayload.createPartitionListRecord(partitions);
-    if (partitions.isEmpty()) {
-      // in case of initializing of a fresh table, there won't be any partitions, but we need to make a boostrap commit
-      final HoodieData<HoodieRecord> allPartitionRecordsRDD = engineContext.parallelize(
-          Collections.singletonList(allPartitionRecord), 1);
-      partitionToRecordsMap.put(MetadataPartitionType.FILES, allPartitionRecordsRDD);
-      commit(createInstantTime, partitionToRecordsMap, false);
-      return;
-    }
-
     if (partitionTypes.contains(MetadataPartitionType.FILES)) {
+      // Record which saves the list of all partitions

Review Comment:
   may I know why do we need this if condition. can you help clarify.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java:
##########
@@ -814,14 +820,58 @@ public void maybeDeleteMetadataTable() {
     if (shouldExecuteMetadataTableDeletion()) {
       try {
         LOG.info("Deleting metadata table because it is disabled in writer.");
-        HoodieTableMetadataUtil.deleteMetadataTable(config.getBasePath(), context);
-        clearMetadataTablePartitionsConfig();
+        deleteMetadataTable(config.getBasePath(), context);
+        clearMetadataTablePartitionsConfig(Option.empty(), true);
       } catch (HoodieMetadataException e) {
         throw new HoodieException("Failed to delete metadata table.", e);
       }
     }
   }
 
+  /**
+   * Deletes the metadata partition if the writer disables any metadata index.
+   */
+  public void deleteMetadataIndexIfNecessary() {
+    Stream.of(MetadataPartitionType.values()).forEach(partitionType -> {
+      if (shouldDeleteMetadataIndex(partitionType)) {
+        try {
+          LOG.info("Deleting metadata partition because it is disabled in writer.");
+          if (metadataPartitionExists(metaClient.getBasePath(), context, partitionType)) {
+            deleteMetadataPartition(metaClient.getBasePath(), context, partitionType);
+          }
+          clearMetadataTablePartitionsConfig(Option.of(partitionType), false);
+        } catch (HoodieMetadataException e) {
+          throw new HoodieException("Failed to delete metadata table.", e);

Review Comment:
   can we add info on which partition was failed



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -377,7 +378,25 @@ public void initTableMetadata() {
       if (initializeFromFilesystem(dataMetaClient, inflightInstantTimestamp)) {
         metrics.ifPresent(m -> m.updateMetrics(HoodieMetadataMetrics.INITIALIZE_STR, timer.endTimer()));
       }
+      return;
     }
+
+    // if metadata table exists, then check if any of the enabled partition types needs to be initialized
+    Set<String> inflightAndCompletedPartitions = getInflightAndCompletedMetadataPartitions(dataMetaClient.getTableConfig());
+    List<MetadataPartitionType> partitionsToInit = this.enabledPartitionTypes.stream()
+        .filter(p -> !inflightAndCompletedPartitions.contains(p.getPartitionPath()) && !MetadataPartitionType.FILES.equals(p))
+        .collect(Collectors.toList());
+
+    // if there are no partitions to initialize or there is a pending operation, then don't initialize in this round
+    if (partitionsToInit.isEmpty() || anyPendingDataInstant(dataMetaClient, inflightInstantTimestamp)) {
+      return;
+    }
+
+    String createInstantTime = getInitialCommitInstantTime(dataMetaClient);

Review Comment:
   do we have a test for this. i.e. enable MDT files at commit 0 or commit5.. after few commits, enable col stats and after few commits enable bloom filter. query all 3 partitions, all of them should be intact. 



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -377,7 +378,25 @@ public void initTableMetadata() {
       if (initializeFromFilesystem(dataMetaClient, inflightInstantTimestamp)) {
         metrics.ifPresent(m -> m.updateMetrics(HoodieMetadataMetrics.INITIALIZE_STR, timer.endTimer()));
       }
+      return;
     }
+
+    // if metadata table exists, then check if any of the enabled partition types needs to be initialized

Review Comment:
   I get it. enabledPartitionTypes passed to this method is already trimmed based on isMetadataAsyncIndex



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java:
##########
@@ -814,14 +820,58 @@ public void maybeDeleteMetadataTable() {
     if (shouldExecuteMetadataTableDeletion()) {
       try {
         LOG.info("Deleting metadata table because it is disabled in writer.");
-        HoodieTableMetadataUtil.deleteMetadataTable(config.getBasePath(), context);
-        clearMetadataTablePartitionsConfig();
+        deleteMetadataTable(config.getBasePath(), context);
+        clearMetadataTablePartitionsConfig(Option.empty(), true);
       } catch (HoodieMetadataException e) {
         throw new HoodieException("Failed to delete metadata table.", e);
       }
     }
   }
 
+  /**
+   * Deletes the metadata partition if the writer disables any metadata index.
+   */
+  public void deleteMetadataIndexIfNecessary() {
+    Stream.of(MetadataPartitionType.values()).forEach(partitionType -> {
+      if (shouldDeleteMetadataIndex(partitionType)) {
+        try {
+          LOG.info("Deleting metadata partition because it is disabled in writer.");
+          if (metadataPartitionExists(metaClient.getBasePath(), context, partitionType)) {
+            deleteMetadataPartition(metaClient.getBasePath(), context, partitionType);
+          }
+          clearMetadataTablePartitionsConfig(Option.of(partitionType), false);
+        } catch (HoodieMetadataException e) {
+          throw new HoodieException("Failed to delete metadata table.", e);
+        }
+      }
+    });
+  }
+
+  private boolean shouldDeleteMetadataIndex(MetadataPartitionType partitionType) {

Review Comment:
   shouldDeleteMetadataPartition might be a good name



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java:
##########
@@ -814,14 +820,58 @@ public void maybeDeleteMetadataTable() {
     if (shouldExecuteMetadataTableDeletion()) {
       try {
         LOG.info("Deleting metadata table because it is disabled in writer.");
-        HoodieTableMetadataUtil.deleteMetadataTable(config.getBasePath(), context);
-        clearMetadataTablePartitionsConfig();
+        deleteMetadataTable(config.getBasePath(), context);
+        clearMetadataTablePartitionsConfig(Option.empty(), true);
       } catch (HoodieMetadataException e) {
         throw new HoodieException("Failed to delete metadata table.", e);
       }
     }
   }
 
+  /**
+   * Deletes the metadata partition if the writer disables any metadata index.
+   */
+  public void deleteMetadataIndexIfNecessary() {
+    Stream.of(MetadataPartitionType.values()).forEach(partitionType -> {
+      if (shouldDeleteMetadataIndex(partitionType)) {
+        try {
+          LOG.info("Deleting metadata partition because it is disabled in writer.");
+          if (metadataPartitionExists(metaClient.getBasePath(), context, partitionType)) {
+            deleteMetadataPartition(metaClient.getBasePath(), context, partitionType);
+          }
+          clearMetadataTablePartitionsConfig(Option.of(partitionType), false);
+        } catch (HoodieMetadataException e) {
+          throw new HoodieException("Failed to delete metadata table.", e);
+        }
+      }
+    });
+  }
+
+  private boolean shouldDeleteMetadataIndex(MetadataPartitionType partitionType) {
+    // Only delete metadata table partition when all the following conditions are met:
+    // (1) This is data table.
+    // (2) Index corresponding to this metadata partition is disabled in HoodieWriteConfig.
+    // (3) The completed metadata partitions in table config contains this partition.
+    // NOTE: Inflight metadata partitions are not considered as they could have been inflight due to async indexer.
+    boolean metadataIndexDisabled;
+    switch (partitionType) {
+      // NOTE: FILES partition type is always considered in sync with hoodie.metadata.enable.
+      //       It cannot be the case that metadata is enabled but FILES is disabled.
+      case COLUMN_STATS:
+        metadataIndexDisabled = config.isMetadataTableEnabled() && !config.isMetadataColumnStatsIndexEnabled();

Review Comment:
   can we add a fail fast if condition before switch case. easy to reason about things. 
   if not data table || is metadata is not enabled
      return 
   
   if not, proceed for to line 856. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org