Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/10 21:34:28 UTC

[GitHub] [hudi] nbalajee commented on a change in pull request #3426: [HUDI-2285] Metadata Table synchronous design

nbalajee commented on a change in pull request #3426:
URL: https://github.com/apache/hudi/pull/3426#discussion_r686259944



##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -477,9 +519,148 @@ public void testRollbackUnsyncedCommit(HoodieTableType tableType) throws Excepti
 
     try (SparkRDDWriteClient client = new SparkRDDWriteClient<>(engineContext, getWriteConfig(true, true))) {
       assertFalse(metadata(client).isInSync());
-      client.syncTableMetadata();
+      // TODO client.syncTableMetadata();
+      validateMetadata(client);
+    }
+
+    // If an unsynced commit is automatically rolled back during next commit, the rollback commit gets a timestamp
+    // greater than the new commit which is started. Ensure that in this case the rollback is not processed
+    // as the earlier failed commit would not have been committed.
+    //
+    //  Dataset:   C1        C2         C3.inflight[failed]   C4   R5[rolls back C3]
+    //  Metadata:  C1.delta  C2.delta
+    //
+    // When R5 completes, C3.xxx will be deleted. When C4 completes, C4 and R5 will be committed to Metadata Table in
+    // that order. R5 should be neglected as C3 was never committed to metadata table.
+    newCommitTime = HoodieActiveTimeline.createNewInstantTime();
+    try (SparkRDDWriteClient client = new SparkRDDWriteClient(engineContext, getWriteConfig(false, false), true)) {
+      // Metadata disabled and no auto-commit
+      client.startCommitWithTime(newCommitTime);
+      List<HoodieRecord> records = dataGen.generateUpdates(newCommitTime, 10);
+      List<WriteStatus> writeStatuses = client.upsert(jsc.parallelize(records, 1), newCommitTime).collect();
+      assertNoWriteErrors(writeStatuses);
+      // Not committed so left in inflight state
+      // TODO client.syncTableMetadata();
+      assertTrue(metadata(client).isInSync());
+      validateMetadata(client);
+    }
+
+    newCommitTime = HoodieActiveTimeline.createNewInstantTime();
+    try (SparkRDDWriteClient client = new SparkRDDWriteClient<>(engineContext, getWriteConfig(true, true), true)) {
+      // Metadata enabled
+      // The previous commit will be rolled back automatically
+      client.startCommitWithTime(newCommitTime);
+      List<HoodieRecord> records = dataGen.generateUpdates(newCommitTime, 10);
+      List<WriteStatus> writeStatuses = client.upsert(jsc.parallelize(records, 1), newCommitTime).collect();
+      assertNoWriteErrors(writeStatuses);
+      assertTrue(metadata(client).isInSync());
       validateMetadata(client);
     }
+
+    // In this scenario an async operation is started and completes around the same time as the failed commit.
+    // Rest of the reasoning is same as above test.
+    //  C4.clean was an asynchronous clean started along with C3. The clean completed but C3 commit failed.
+    //
+    //  Dataset:   C1        C2         C3.inflight[failed]  C4.clean     C5   R6[rolls back C3]
+    //  Metadata:  C1.delta  C2.delta
+    //
+    // When R6 completes, C3.xxx will be deleted. When C5 completes, C4, C5 and R6 will be committed to Metadata Table

Review comment:
       If R5 completed, then the metadata table can be updated with C4, C5, etc.
   
   Corner case: R6 rolled back C3, deleting all associated data/metadata files, but the rollback failed before R6.commit was written (e.g., a failure while writing the rollback metadata). In other words, C3.inflight was replaced by R6.inflight.
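   
   To make the scenario concrete, here is a minimal sketch in plain Java (no Hudi classes) of the rule the test comment describes: a completed rollback is applied to the metadata table only if the commit it rolls back was previously written there. The names `Rollback`, `Decision` and `decide` are hypothetical. Returning `UNDECIDED` for a rollback that never completed is an assumption that only marks the corner case above as open; it is not a claim about how Hudi resolves it.
   
```java
import java.util.Set;

// Illustrative model only; none of these types exist in Hudi.
final class RollbackSyncSketch {

  // Hypothetical stand-in for a rollback instant on the dataset timeline.
  static final class Rollback {
    final String instant;            // e.g. "R6"
    final String rolledBackInstant;  // e.g. "C3"
    final boolean completed;         // false models R6.inflight (rollback itself failed)

    Rollback(String instant, String rolledBackInstant, boolean completed) {
      this.instant = instant;
      this.rolledBackInstant = rolledBackInstant;
      this.completed = completed;
    }
  }

  enum Decision { APPLY, SKIP, UNDECIDED }

  // Decide what to do with a rollback when writing to the metadata table.
  static Decision decide(Rollback rollback, Set<String> instantsInMetadataTable) {
    if (!rollback.completed) {
      // Assumption: the corner case from the review (C3.inflight replaced by
      // R6.inflight) is flagged here rather than resolved.
      return Decision.UNDECIDED;
    }
    // Rule from the test comment: ignore the rollback if the rolled-back
    // commit was never committed to the metadata table.
    return instantsInMetadataTable.contains(rollback.rolledBackInstant)
        ? Decision.APPLY
        : Decision.SKIP;
  }

  public static void main(String[] args) {
    Set<String> synced = Set.of("C1", "C2"); // Metadata: C1.delta, C2.delta
    System.out.println(decide(new Rollback("R6", "C3", true), synced));
    System.out.println(decide(new Rollback("R6", "C3", false), synced));
  }
}
```
   
   With the timeline from the test (metadata holds only C1 and C2), a completed R6 is skipped because C3 never reached the metadata table, while a failed R6 lands in the undecided branch, which is the state this comment is asking about.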

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/BaseActionExecutor.java
##########
@@ -46,4 +50,24 @@ public BaseActionExecutor(HoodieEngineContext context, HoodieWriteConfig config,
   }
 
   public abstract R execute();
+
+  protected final void syncTableMetadata(HoodieCommitMetadata metadata) {

Review comment:
       Since we are writing to the metadata table from in-memory state, let's call this writeTableMetadata() instead of syncTableMetadata(), just to be clear.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org