You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/31 11:52:07 UTC

[GitHub] [hudi] codope commented on a change in pull request #5186: [HUDI-3451] Add checks for metadata table init to avoid possible out-of-sync

codope commented on a change in pull request #5186:
URL: https://github.com/apache/hudi/pull/5186#discussion_r839513188



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -208,6 +209,11 @@
       .sinceVersion("0.11.0")
       .withDocumentation("Table checksum is used to guard against partial writes in HDFS. It is added as the last entry in hoodie.properties and then used to validate while reading table config.");
 
+  public static final ConfigProperty<Boolean> METADATA_TABLE_ENABLE = ConfigProperty

Review comment:
       Are we sure there is no need to handle this config in upgrade step? Would it be better to add this config in hoodie.properties when users upgrade with an existing metadata table?

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -197,6 +197,41 @@ public void testMetadataTableBootstrap(HoodieTableType tableType, boolean addRol
     validateMetadata(testTable, true);
   }
 
+  @Test
+  public void testTurnOffMetadataTableAfterEnable() throws Exception {
+    init(COPY_ON_WRITE, true);
+    String instant1 = "0000001";
+    HoodieCommitMetadata hoodieCommitMetadata = doWriteOperationWithMeta(testTable, instant1, INSERT);
+
+    // sync to metadata table
+    HoodieTable table = HoodieSparkTable.create(writeConfig, context, metaClient);
+    Option metadataWriter = table.getMetadataWriter(instant1, Option.of(hoodieCommitMetadata));
+    validateMetadata(testTable, true);
+
+    assertTrue(metadataWriter.isPresent());
+    HoodieTableConfig hoodieTableConfig = new HoodieTableConfig(fs, metaClient.getMetaPath(), writeConfig.getPayloadClass());
+    assertTrue(hoodieTableConfig.getMetadataTableEnable());
+
+    // Turn off  enableMetadata Table
+    HoodieWriteConfig writeConfig2 = HoodieWriteConfig.newBuilder()
+        .withProperties(this.writeConfig.getProps())
+        .withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(false).build())
+        .build();
+
+    metaClient.reloadActiveTimeline();
+    HoodieTable table2 = HoodieSparkTable.create(writeConfig2, context, metaClient);
+    String instant2 = "0000002";
+    HoodieCommitMetadata hoodieCommitMetadata2 = doWriteOperationWithMeta(testTable, instant2, INSERT);
+    Option metadataWriter2 = table2.getMetadataWriter(instant2, Option.of(hoodieCommitMetadata2));
+    assertFalse(metadataWriter2.isPresent());
+    HoodieTableConfig hoodieTableConfig2 = new HoodieTableConfig(fs, metaClient.getMetaPath(), writeConfig2.getPayloadClass());
+
+    // assert hoodie.table.metadata.enable=false
+    assertFalse(hoodieTableConfig2.getMetadataTableEnable());
+    // assert MDT is deleted.
+    assertFalse(metaClient.getFs().exists(new Path(HoodieTableMetadata.getMetadataTableBasePath(writeConfig2.getBasePath()))));

Review comment:
       let's enable again after disable and validate that confg gets updated.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java
##########
@@ -781,6 +782,53 @@ public HoodieEngineContext getContext() {
     return Option.empty();
   }
 
+  /**
+   * Initialize hoodie.table.metadata.enable in hoodie.proterties;
+   * Check if current metadata table flag in hoodieWriteConfig is the same as recorded in hoodie.proterties:
+   *  1. If the flag in hoodie.proterties is true but it is false in hoodieWriteConfig, It means users turn off MDT and we need to clean up MDT.
+   *  2. Update hoodie.table.metadata.enable in hoodie.proterties based on HoodieWriteConfig if the value is different.
+   */
+  protected void verifyMetadataTableInTableConfig() {
+    // ignore the hoodie.properties in MDT
+    if (metaClient.getBasePath().contains(HoodieTableMetaClient.METADATA_TABLE_FOLDER_PATH)) {
+      return;
+    }
+
+    try {
+      if (metaClient.getTableConfig().contains(HoodieTableConfig.METADATA_TABLE_ENABLE)) {
+        Boolean enableMDTInTableConfig = metaClient.getTableConfig().getBoolean(HoodieTableConfig.METADATA_TABLE_ENABLE);
+        Path mdtBasePath = new Path(HoodieTableMetadata.getMetadataTableBasePath(config.getBasePath()));
+        // MDT flag in TableConfig is true;
+        // MDT flag in write config is false;
+        // ===> Users turn off MDT, so that we need to clean up this metadata table in case out-of-sync issue.
+        // the condition order in if(xx) is important.
+        if (enableMDTInTableConfig && !config.isMetadataTableEnabled() && metaClient.getFs().exists(mdtBasePath)) {
+          LOG.info("Deleting metadata table because of disabled in writer.");
+          metaClient.getFs().delete(mdtBasePath, true);
+        }
+
+        // update table config if necessary
+        if (Boolean.compare(enableMDTInTableConfig, config.isMetadataTableEnabled()) != 0) {
+          updateMetadataTableEnableInTableConfig();
+        }
+      } else {
+        // initial METADATA_TABLE_ENABLE.key in table config
+        updateMetadataTableEnableInTableConfig();

Review comment:
       When: 
   a) table config does not contain `METADATA_TABLE_ENABLE`,
   b) MDT is disabled in write config,
   c) but MDT exists on the filesystem, 
   then shouldn't we delete it? 
   This can happen users with initially MDT enabled upgrade to the current version with MDT disabled. Isn't it?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java
##########
@@ -781,6 +782,53 @@ public HoodieEngineContext getContext() {
     return Option.empty();
   }
 
+  /**
+   * Initialize hoodie.table.metadata.enable in hoodie.proterties;

Review comment:
       typo: hoodie.pro**p**erties

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -208,6 +209,11 @@
       .sinceVersion("0.11.0")
       .withDocumentation("Table checksum is used to guard against partial writes in HDFS. It is added as the last entry in hoodie.properties and then used to validate while reading table config.");
 
+  public static final ConfigProperty<Boolean> METADATA_TABLE_ENABLE = ConfigProperty
+      .key("hoodie.table.metadata.enable")
+      .defaultValue(HoodieMetadataConfig.ENABLE.defaultValue())
+      .withDocumentation("Enable the internal metadata table which serves table metadata like level file listings.");

Review comment:
       How does the reader path look like? What if MDT is disabled in write config but it was true for reader? Do we need to call out something in the doc that if MDT is disabled in write config then queries will not make use of MDT irrespective of reader MDT config is enabled or not?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org