Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/03 15:10:30 UTC

[GitHub] [hudi] nsivabalan opened a new pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

nsivabalan opened a new pull request #3595:
URL: https://github.com/apache/hudi/pull/3595


   ## What is the purpose of the pull request
   
   Adds Metadata table tests built on HoodieTestTable. The objective is to make the tests lean and consistent. Because the contents of data files do not matter for metadata, the tests can be made simpler. 
   
   ## Brief change log
   
   - Added a few building blocks to HoodieTestTable to assist in this testing. Metadata sync relies on commitMetadata, so all such supporting blocks are added to HoodieTestTable. 
   - Added tests covering bootstrap, regular writes, clean, compaction, rollback, an inflight operation mid-timeline, etc. 
   - As of now, focused on COW. Log-file-related tests for MOR are yet to be added. 
   
   ## Verify this pull request
   
   The change only touches tests. 
   
     - *Added tests to TestHoodieBackedMetadata*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot edited a comment on pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3595:
URL: https://github.com/apache/hudi/pull/3595#issuecomment-912612994


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0183d684704d4c6a36dd6cb4c985beddb9d9ad76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2002",
       "triggerID" : "0183d684704d4c6a36dd6cb4c985beddb9d9ad76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "031fe678ba16a9ef61541d9ca940ac37459661f9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2003",
       "triggerID" : "031fe678ba16a9ef61541d9ca940ac37459661f9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "37a2c3ee7e5703de90f1ace958d6b905aaa1d019",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2006",
       "triggerID" : "37a2c3ee7e5703de90f1ace958d6b905aaa1d019",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 37a2c3ee7e5703de90f1ace958d6b905aaa1d019 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2006) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] xushiyan commented on a change in pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #3595:
URL: https://github.com/apache/hudi/pull/3595#discussion_r703206269



##########
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/FileCreateUtils.java
##########
@@ -307,6 +318,13 @@ public static long getTotalMarkerFileCount(String basePath, String partitionPath
         .endsWith(String.format("%s.%s", HoodieTableMetaClient.MARKER_EXTN, ioType))).count();
   }
 
+  public static List<Path> getPartitionPaths(Path basePath) throws IOException {

Review comment:
       There are also some functions like getXXX and deleteXXX. Maybe it's time to rename this to `FileCRUDUtils`?







[GitHub] [hudi] hudi-bot commented on pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #3595:
URL: https://github.com/apache/hudi/pull/3595#issuecomment-912612994


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0183d684704d4c6a36dd6cb4c985beddb9d9ad76",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0183d684704d4c6a36dd6cb4c985beddb9d9ad76",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0183d684704d4c6a36dd6cb4c985beddb9d9ad76 UNKNOWN
   





[GitHub] [hudi] nsivabalan commented on a change in pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #3595:
URL: https://github.com/apache/hudi/pull/3595#discussion_r701975575



##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -1261,17 +1532,190 @@ private void validateMetadata(SparkRDDWriteClient testClient) throws IOException
     LOG.info("Validation time=" + timer.endTimer());
   }
 
+  /**
+   * Validate the metadata tables contents to ensure it matches what is on the file system.
+   */
+  private void validateMetadata(HoodieTestTable testTable, HoodieWriteConfig config, HoodieEngineContext hoodieEngineContext) throws IOException {
+    validateMetadata(testTable, config, hoodieEngineContext, Collections.emptyList());
+  }
+
+  /**
+   * Validate the metadata tables contents to ensure it matches what is on the file system.
+   */
+  private void validateMetadata(HoodieTestTable testTable, HoodieWriteConfig config, HoodieEngineContext hoodieEngineContext,

Review comment:
       Note to reviewer: This is a near replica of the existing `validateMetadata`, except that it uses HoodieTestTable as the source table for validation. As mentioned in the description, I will add more tests and eventually remove the tests based directly on `SparkRDDWriteClient`. 







[GitHub] [hudi] vinothchandar commented on a change in pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #3595:
URL: https://github.com/apache/hudi/pull/3595#discussion_r702008465



##########
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestTable.java
##########
@@ -396,6 +536,27 @@ public String getBaseFileNameById(String fileId) {
     return baseFileName(currentInstantTime, fileId);
   }
 
+  public List<String> getEarliestFilesInPartition(String partition, int count) throws IOException {
+    List<FileStatus> fileStatuses = Arrays.asList(listAllFilesInPartition(partition));
+    Collections.sort(fileStatuses, new Comparator<FileStatus>() {

Review comment:
       Can this be less verbose, e.g. by calling `.compare` on the `getModificationTime()` values?
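
One way to shorten the sort is a `Comparator` factory method instead of an anonymous class. The sketch below is only an illustration: `FileStatusStub` and `EarliestFiles` are hypothetical stand-ins (the real code would use Hadoop's `FileStatus`), and the stream-based shape is an assumption, not the code that ended up in the PR.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for Hadoop's FileStatus; only the accessors used below.
class FileStatusStub {
  private final String name;
  private final long modificationTime;

  FileStatusStub(String name, long modificationTime) {
    this.name = name;
    this.modificationTime = modificationTime;
  }

  String getName() {
    return name;
  }

  long getModificationTime() {
    return modificationTime;
  }
}

class EarliestFiles {
  // Returns the names of the `count` files with the smallest modification time.
  static List<String> earliest(List<FileStatusStub> files, int count) {
    return files.stream()
        // comparingLong replaces the hand-written anonymous Comparator
        .sorted(Comparator.comparingLong(FileStatusStub::getModificationTime))
        .limit(count)
        .map(FileStatusStub::getName)
        .collect(Collectors.toList());
  }
}
```

Sorting a stream also leaves the input list untouched, which avoids reordering the listing as a side effect.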

##########
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/FileCreateUtils.java
##########
@@ -307,6 +318,13 @@ public static long getTotalMarkerFileCount(String basePath, String partitionPath
         .endsWith(String.format("%s.%s", HoodieTableMetaClient.MARKER_EXTN, ioType))).count();
   }
 
+  public static List<Path> getPartitionPaths(Path basePath) throws IOException {

Review comment:
       Does this belong here, in `FileCreateUtils`? It's not really creating anything.

##########
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/PartitionFileInfoMap.java
##########
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.testutils;
+
+import org.apache.hudi.common.util.collection.Pair;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+public class PartitionFileInfoMap {
+  Map<String, Map<String, List<Pair<String, Integer>>>> partitionToFileIdMap = new HashMap<>();

Review comment:
       Can this map be made nicer to read? 
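
As one possible reading of this comment, the nested `Map<String, Map<String, List<Pair<String, Integer>>>>` could be given a descriptive field name, a small value type in place of `Pair<String, Integer>`, and `computeIfAbsent` in place of the `containsKey` checks. Everything named below (`FileInfo`, `PartitionFileInfoSketch`, `addBaseFiles`, `filesByPartition`) is hypothetical and not taken from the PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical value type replacing Pair<String, Integer>.
final class FileInfo {
  final String fileId;
  final int length;

  FileInfo(String fileId, int length) {
    this.fileId = fileId;
    this.length = length;
  }
}

class PartitionFileInfoSketch {
  // commit time -> partition path -> files written by that commit in that partition
  private final Map<String, Map<String, List<FileInfo>>> filesByCommitAndPartition = new HashMap<>();

  PartitionFileInfoSketch addBaseFiles(String commitTime, String partitionPath, List<Integer> lengths) {
    List<FileInfo> files = filesByCommitAndPartition
        .computeIfAbsent(commitTime, k -> new HashMap<>())
        .computeIfAbsent(partitionPath, k -> new ArrayList<>());
    for (int length : lengths) {
      // One randomly named file per requested length, as in the original class.
      files.add(new FileInfo(UUID.randomUUID().toString(), length));
    }
    return this;
  }

  // Unlike the original getter, this returns an empty map (not null) for an unknown commit.
  Map<String, List<FileInfo>> filesByPartition(String commitTime) {
    return filesByCommitAndPartition.getOrDefault(commitTime, Collections.emptyMap());
  }
}
```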







[GitHub] [hudi] hudi-bot edited a comment on pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3595:
URL: https://github.com/apache/hudi/pull/3595#issuecomment-912612994


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0183d684704d4c6a36dd6cb4c985beddb9d9ad76",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2002",
       "triggerID" : "0183d684704d4c6a36dd6cb4c985beddb9d9ad76",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0183d684704d4c6a36dd6cb4c985beddb9d9ad76 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2002) 
   





[GitHub] [hudi] hudi-bot edited a comment on pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3595:
URL: https://github.com/apache/hudi/pull/3595#issuecomment-912612994


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0183d684704d4c6a36dd6cb4c985beddb9d9ad76",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2002",
       "triggerID" : "0183d684704d4c6a36dd6cb4c985beddb9d9ad76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "031fe678ba16a9ef61541d9ca940ac37459661f9",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2003",
       "triggerID" : "031fe678ba16a9ef61541d9ca940ac37459661f9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0183d684704d4c6a36dd6cb4c985beddb9d9ad76 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2002) 
   * 031fe678ba16a9ef61541d9ca940ac37459661f9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2003) 
   





[GitHub] [hudi] hudi-bot edited a comment on pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3595:
URL: https://github.com/apache/hudi/pull/3595#issuecomment-912612994


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0183d684704d4c6a36dd6cb4c985beddb9d9ad76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2002",
       "triggerID" : "0183d684704d4c6a36dd6cb4c985beddb9d9ad76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "031fe678ba16a9ef61541d9ca940ac37459661f9",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2003",
       "triggerID" : "031fe678ba16a9ef61541d9ca940ac37459661f9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "37a2c3ee7e5703de90f1ace958d6b905aaa1d019",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "37a2c3ee7e5703de90f1ace958d6b905aaa1d019",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 031fe678ba16a9ef61541d9ca940ac37459661f9 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2003) 
   * 37a2c3ee7e5703de90f1ace958d6b905aaa1d019 UNKNOWN
   





[GitHub] [hudi] nsivabalan commented on pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #3595:
URL: https://github.com/apache/hudi/pull/3595#issuecomment-926102631


   Closing in favor of https://github.com/apache/hudi/pull/3695
   





[GitHub] [hudi] nsivabalan closed pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
nsivabalan closed pull request #3595:
URL: https://github.com/apache/hudi/pull/3595


   





[GitHub] [hudi] hudi-bot edited a comment on pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3595:
URL: https://github.com/apache/hudi/pull/3595#issuecomment-912612994


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0183d684704d4c6a36dd6cb4c985beddb9d9ad76",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2002",
       "triggerID" : "0183d684704d4c6a36dd6cb4c985beddb9d9ad76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "031fe678ba16a9ef61541d9ca940ac37459661f9",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "031fe678ba16a9ef61541d9ca940ac37459661f9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0183d684704d4c6a36dd6cb4c985beddb9d9ad76 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2002) 
   * 031fe678ba16a9ef61541d9ca940ac37459661f9 UNKNOWN
   





[GitHub] [hudi] xushiyan commented on pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
xushiyan commented on pull request #3595:
URL: https://github.com/apache/hudi/pull/3595#issuecomment-914038407


   > In general, I want to make sure the new methods added to HoodieTestTable are in line with its design. @xushiyan any comments on that?
   
   @vinothchandar @nsivabalan Initially, the idea was for `HoodieTestTable` to provide concise APIs for basic table operations using empty files, and for `HoodieWriteableTestTable` to write actual data. `FileCreateUtils` is used by `HoodieTestTable` at a lower level to interface with the actual files. As more and more APIs were introduced, we had to rewrite a lot of logic and re-implement action-to-file-change translations. This left the testutils with cumbersome, error-prone code and a learning hurdle. I feel some redesign is needed. 
   
   Some thoughts on the redesign: developers are familiar with HoodieXXXClient, so `HoodieTestTable` should make use of a client with similar public APIs, say `HoodieTestTableClient`. 
   - HoodieTestTable owns a HoodieTestTableClient and exposes the client's APIs for devs to prep their own data.
   - HoodieTestTableClient should manipulate the timeline and metadata as normal.
   - For UTs that don't need real data, the client writes out empty log and base files.
   - For FTs, the test table and client should be aware of the EngineContext set by `XXXFunctionalTestHarness` and act like the actual clients.
   - Eventually devs only need to learn the APIs provided by `HoodieTestTableClient` to prep their tests. These can include high-level APIs such as `.write100RecordsIn3Partitions()`.
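
The ownership idea in the first bullets could look roughly like the toy sketch below. It is purely illustrative and in-memory: `TestTableSketch`, `TestTableClientSketch` and their methods are hypothetical names that use no real Hudi class, and the "files" are just strings standing in for empty base files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client: tracks completed instants and the empty files it "wrote".
class TestTableClientSketch {
  private final List<String> completedInstants = new ArrayList<>();
  private final List<String> files = new ArrayList<>();

  // Adds `filesPerPartition` placeholder files to each partition, then completes the instant.
  TestTableClientSketch insert(String instantTime, List<String> partitions, int filesPerPartition) {
    for (String partition : partitions) {
      for (int i = 0; i < filesPerPartition; i++) {
        files.add(partition + "/" + instantTime + "_" + i);
      }
    }
    completedInstants.add(instantTime);
    return this;
  }

  List<String> completedInstants() {
    return completedInstants;
  }

  List<String> files() {
    return files;
  }
}

// Hypothetical test table: owns the client and exposes it to test authors.
class TestTableSketch {
  private final TestTableClientSketch client = new TestTableClientSketch();

  TestTableClientSketch client() {
    return client;
  }
}
```

With this shape, a test author would only need the client's fluent API to prepare data, which is the learning-curve point made in the last bullet.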





[GitHub] [hudi] xushiyan commented on a change in pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #3595:
URL: https://github.com/apache/hudi/pull/3595#discussion_r706305915



##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -1108,6 +1119,259 @@ public void testMetdataTableCommitFailure() throws Exception {
     assertTrue(timeline.getRollbackTimeline().countInstants() == 1);
   }
 
+  /**
+   * Test simple bootstrap of metadata table.
+   * Trigger few write operations and boostrap metadata table. Validate.
+   * Add few more writes to sync and validate.
+   * @param tableType
+   * @throws Exception
+   */
+  @ParameterizedTest
+  @EnumSource(HoodieTableType.class)
+  public void testBootstrapWithTestTable(HoodieTableType tableType) throws Exception {
+    init(tableType);
+    HoodieTestTable testTable = HoodieTestTable.of(metaClient);
+    // bootstrap with few commits
+    testBootstrap(testTable, false);
+  }
+
+  /**
+   * Before bootstrapping, rollback a commit in the original table.
+   * Ensure after bootstrap, sync and validate succeeds.
+   * @throws Exception
+  */
+  @Test
+  public void testBootstrapWithRolledBackCommitTestTable() throws Exception {
+    tableType = HoodieTableType.COPY_ON_WRITE;
+    init(tableType);

Review comment:
       Ideally, all test cases should be parameterized with table type: 
   ```
     @ParameterizedTest
     @EnumSource(HoodieTableType.class)
   ```

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -1108,6 +1119,259 @@ public void testMetdataTableCommitFailure() throws Exception {
     assertTrue(timeline.getRollbackTimeline().countInstants() == 1);
   }
 
+  /**
+   * Test simple bootstrap of metadata table.
+   * Trigger few write operations and boostrap metadata table. Validate.
+   * Add few more writes to sync and validate.
+   * @param tableType
+   * @throws Exception
+   */
+  @ParameterizedTest
+  @EnumSource(HoodieTableType.class)
+  public void testBootstrapWithTestTable(HoodieTableType tableType) throws Exception {
+    init(tableType);
+    HoodieTestTable testTable = HoodieTestTable.of(metaClient);
+    // bootstrap with few commits
+    testBootstrap(testTable, false);
+  }
+
+  /**
+   * Before bootstrapping, rollback a commit in the original table.
+   * Ensure after bootstrap, sync and validate succeeds.
+   * @throws Exception
+  */
+  @Test
+  public void testBootstrapWithRolledBackCommitTestTable() throws Exception {
+    tableType = HoodieTableType.COPY_ON_WRITE;
+    init(tableType);
+    HoodieTestTable testTable = HoodieTestTable.of(metaClient);
+    // bootstrap w/ few commits, but rollback one of the commit before bootstrapping.
+    testBootstrap(testTable,true);
+  }
+
+  private void testBootstrap(HoodieTestTable testTable, boolean addRollback) throws Exception {

Review comment:
       I've seen this pattern in many classes: a private method performs all the test steps, with a variable controlling the different scenarios, and different test methods invoke it with different values of that variable. We should start avoiding this, for two reasons:
   - Control flow is an anti-pattern in test code. Each test case should follow a simple flow: prep -> execute -> verify. Any varying part can be moved to a different test method to explicitly show a different scenario.
   - The control flow is mainly there to reuse code from the original flow. That is a sign the original flow's code is not concise enough to be repeated. I think repeating some code across test cases is acceptable and even preferred: test cases should be isolated, and people want to read the flow as is without jumping back and forth between methods. Repeating concise test prep and verification logic makes the scenario more readable and manageable in one place. This requires the test util classes to be properly refactored so that they do the heavy lifting.
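
A minimal sketch of the flag-free style described above, assuming a tiny in-memory stand-in for the test table. `TimelineSketch`, `BootstrapScenarios` and `check` are hypothetical names; in the real test class these would be JUnit test methods using `HoodieTestTable`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory timeline standing in for a test table.
class TimelineSketch {
  private final List<String> commits = new ArrayList<>();
  private final List<String> rolledBack = new ArrayList<>();

  TimelineSketch commit(String instant) {
    commits.add(instant);
    return this;
  }

  TimelineSketch rollback(String instant) {
    commits.remove(instant);
    rolledBack.add(instant);
    return this;
  }

  List<String> activeCommits() {
    return commits;
  }

  List<String> rolledBackCommits() {
    return rolledBack;
  }
}

class BootstrapScenarios {
  // Scenario 1: commits only. prep -> execute -> verify, with no flag.
  static void bootstrapWithCommitsOnly() {
    TimelineSketch timeline = new TimelineSketch().commit("001").commit("002");
    timeline.commit("005");
    check(timeline.activeCommits().size() == 3, "expected three active commits");
    check(timeline.rolledBackCommits().isEmpty(), "expected no rollbacks");
  }

  // Scenario 2: one commit is rolled back first. The setup is repeated on purpose.
  static void bootstrapWithRolledBackCommit() {
    TimelineSketch timeline = new TimelineSketch().commit("001").commit("002").commit("003");
    timeline.rollback("003").commit("005");
    check(timeline.activeCommits().size() == 3, "expected three active commits");
    check(timeline.rolledBackCommits().size() == 1, "expected one rollback");
  }

  static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }
}
```

Each scenario reads top to bottom on its own, at the cost of two duplicated setup lines.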

##########
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/PartitionFileInfoMap.java
##########
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.testutils;
+
+import org.apache.hudi.common.util.collection.Pair;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+public class PartitionFileInfoMap {
+  Map<String, Map<String, List<Pair<String, Integer>>>> partitionToFileIdMap = new HashMap<>();
+
+  public PartitionFileInfoMap addPartitionAndBasefiles(String commitTime, String partitionPath, List<Integer> lengths) {
+
+    if (!partitionToFileIdMap.containsKey(commitTime)) {
+      partitionToFileIdMap.put(commitTime, new HashMap<>());
+    }
+    if (!this.partitionToFileIdMap.get(commitTime).containsKey(partitionPath)) {
+      this.partitionToFileIdMap.get(commitTime).put(partitionPath, new ArrayList<>());
+    }
+
+    List<Pair<String, Integer>> fileInfos = new ArrayList<>();
+    for (int length : lengths) {
+      fileInfos.add(Pair.of(UUID.randomUUID().toString(), length));
+    }
+    this.partitionToFileIdMap.get(commitTime).get(partitionPath).addAll(fileInfos);
+    return this;
+  }
+
+  public Map<String, List<Pair<String, Integer>>> getPartitionToFileIdMap(String commitTime) {
+    return this.partitionToFileIdMap.get(commitTime);
+  }
+}

Review comment:
       The IDE settings should be changed to fix the EOL problem automatically.

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -1108,6 +1119,259 @@ public void testMetdataTableCommitFailure() throws Exception {
     assertTrue(timeline.getRollbackTimeline().countInstants() == 1);
   }
 
+  /**
+   * Test simple bootstrap of metadata table.
+   * Trigger few write operations and boostrap metadata table. Validate.
+   * Add few more writes to sync and validate.
+   * @param tableType
+   * @throws Exception
+   */
+  @ParameterizedTest
+  @EnumSource(HoodieTableType.class)
+  public void testBootstrapWithTestTable(HoodieTableType tableType) throws Exception {
+    init(tableType);
+    HoodieTestTable testTable = HoodieTestTable.of(metaClient);
+    // bootstrap with few commits
+    testBootstrap(testTable, false);
+  }
+
+  /**
+   * Before bootstrapping, rollback a commit in the original table.
+   * Ensure after bootstrap, sync and validate succeeds.
+   * @throws Exception
+  */
+  @Test
+  public void testBootstrapWithRolledBackCommitTestTable() throws Exception {
+    tableType = HoodieTableType.COPY_ON_WRITE;
+    init(tableType);
+    HoodieTestTable testTable = HoodieTestTable.of(metaClient);
+    // bootstrap w/ few commits, but rollback one of the commit before bootstrapping.
+    testBootstrap(testTable,true);
+  }
+
+  private void testBootstrap(HoodieTestTable testTable, boolean addRollback) throws Exception {
+
+    // bootstrap w/ 3 or 5 commits
+    testTable.doWriteOperation(testTable, "001", WriteOperationType.INSERT, Arrays.asList("p1", "p2"), Arrays.asList("p1", "p2"),
+        2, true);
+    testTable.doWriteOperation(testTable, "002", WriteOperationType.INSERT, Collections.emptyList(), Arrays.asList("p1", "p2"),
+        2, true);
+    syncAndValidate(testTable);
+
+    if (addRollback) {
+      doRollback(testTable, "003", "004", Collections.singletonList("p3"), Arrays.asList("p1","p2", "p3"), 2);
+    }
+    testTable.doWriteOperation(testTable, "005", WriteOperationType.INSERT, Collections.emptyList(), Arrays.asList("p1", "p2"),
+        4);
+    syncAndValidate(testTable);
+
+    // trigger an upsert and validate
+    testTable.doWriteOperation(testTable, "006", WriteOperationType.UPSERT, Collections.singletonList("p3"),
+        Arrays.asList("p1", "p2", "p3"), 4, false);
+    syncAndValidate(testTable);
+  }
+
+  private void doRollback(HoodieTestTable testTable, String commitTimeToRollback, String commitTime,
+                          List<String> newPartitionsToAdd, List<String> partitionsToAddFiles, int numFilesPerPartition) throws Exception {
+    // trigger an UPSERT that will be rolled back
+    Pair<HoodieCommitMetadata, PartitionFileInfoMap> commitMeta = testTable.doWriteOperation(testTable, commitTimeToRollback, WriteOperationType.UPSERT,
+        newPartitionsToAdd,
+        partitionsToAddFiles, numFilesPerPartition, false);
+    syncTableMetadata();
+
+    // rollback last commit
+    Map<String, List<String>> partitionFilesToDelete = getPartitionFilesToDelete(commitMeta.getKey());
+    HoodieRollbackMetadata rollbackMetadata = testTable.getRollbackMetadata(commitTimeToRollback, commitTime, partitionFilesToDelete);
+    testTable.addRollback(commitTime, rollbackMetadata);
+
+    // delete the resp files from test table before validation
+    for (Map.Entry<String, List<String>> entry : partitionFilesToDelete.entrySet()) {
+      testTable.deleteFilesInPartition(entry.getKey(), entry.getValue());
+    }
+    syncAndValidate(testTable);
+  }
+
+  /**
+   * Test few table operations like insert, upsert, compaction, clean.
+   * @param tableType
+   * @throws Exception
+   */
+  @ParameterizedTest
+  @EnumSource(HoodieTableType.class)
+  public void testTableOperationsWithTestTable(HoodieTableType tableType) throws Exception {
+    init(tableType);
+    HoodieTestTable testTable = HoodieTestTable.of(metaClient);
+    testTableOperations(testTable,false);
+  }
+
+  /**
+   * 1. Enable metadata to sync and validate.
+   * 2. Disable metadata and add few writes to table.
+   * 3. Enable back again to sync and validate.
+   * @throws Exception
+   */

Review comment:
       if the test logic is encapsulated in well-designed util APIs, we may not need extra javadoc to explain the flow. Some inline comments might still be helpful, but ideally the code itself should explain it pretty well

##########
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/PartitionDeleteFileList.java
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.testutils;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+public class PartitionDeleteFileList {

Review comment:
       as discussed, we can start creating a `HoodieTestState` and encapsulate it there.

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -1261,17 +1525,189 @@ private void validateMetadata(SparkRDDWriteClient testClient) throws IOException
     LOG.info("Validation time=" + timer.endTimer());
   }
 
+  /**
+   * Validate the metadata tables contents to ensure it matches what is on the file system.
+   */
+  private void validateMetadata(HoodieTestTable testTable) throws IOException {
+    validateMetadata(testTable, Collections.emptyList());
+  }
+
+  /**
+   * Validate the metadata tables contents to ensure it matches what is on the file system.
+   */
+  private void validateMetadata(HoodieTestTable testTable, List<String> inflightCommits) throws IOException {
+    HoodieTableMetadata tableMetadata = metadata(writeConfig, context);
+    assertNotNull(tableMetadata, "MetadataReader should have been initialized");
+    if (!writeConfig.isMetadataTableEnabled()) {
+      return;
+    }
+
+    assertEquals(inflightCommits, testTable.inflightCommits());
+
+    HoodieTimer timer = new HoodieTimer().startTimer();
+    HoodieSparkEngineContext engineContext = new HoodieSparkEngineContext(jsc);
+
+    // Partitions should match
+    List<java.nio.file.Path> fsPartitionPaths = testTable.getAllPartitionPaths();
+    List<String> fsPartitions = new ArrayList<>();
+    fsPartitionPaths.forEach(entry -> fsPartitions.add(entry.getFileName().toString()));
+    List<String> metadataPartitions = tableMetadata.getAllPartitionPaths();
+
+    Collections.sort(fsPartitions);
+    Collections.sort(metadataPartitions);
+
+    assertEquals(fsPartitions.size(), metadataPartitions.size(), "Partitions should match");
+    assertTrue(fsPartitions.equals(metadataPartitions), "Partitions should match");
+
+    // Files within each partition should match
+    metaClient = HoodieTableMetaClient.reload(metaClient);
+    HoodieTable table = HoodieSparkTable.create(writeConfig, engineContext);
+    TableFileSystemView tableView = table.getHoodieView();
+    List<String> fullPartitionPaths = fsPartitions.stream().map(partition -> basePath + "/" + partition).collect(Collectors.toList());
+    Map<String, FileStatus[]> partitionToFilesMap = tableMetadata.getAllFilesInPartitions(fullPartitionPaths);
+    assertEquals(fsPartitions.size(), partitionToFilesMap.size());
+
+    fsPartitions.forEach(partition -> {
+      try {
+        Path partitionPath;
+        if (partition.equals("")) {
+          // Should be the non-partitioned case
+          partitionPath = new Path(basePath);
+        } else {
+          partitionPath = new Path(basePath, partition);
+        }
+
+        FileStatus[] fsStatuses = testTable.listAllFilesInPartition(partition);
+        FileStatus[] metaStatuses = tableMetadata.getAllFilesInPartition(partitionPath);
+        List<String> fsFileNames = Arrays.stream(fsStatuses)
+            .map(s -> s.getPath().getName()).collect(Collectors.toList());
+        List<String> metadataFilenames = Arrays.stream(metaStatuses)
+            .map(s -> s.getPath().getName()).collect(Collectors.toList());
+        Collections.sort(fsFileNames);
+        Collections.sort(metadataFilenames);
+
+        assertEquals(fsStatuses.length, partitionToFilesMap.get(basePath + "/" + partition).length);
+
+        // File sizes should be valid
+        Arrays.stream(metaStatuses).forEach(s -> assertTrue(s.getLen() > 0));

Review comment:
       we should prefer a for-loop over a lambda in test code when a checked exception is involved, to avoid the try-catch block. Just declare the exception all the way up; we can capture it anyway when the test fails.
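
A minimal sketch of the difference, assuming plain `java.nio.file` calls rather than Hudi's `FileStatus` API. The class and method names (`FileSizeCheckSketch`, `sizesWithLambda`, `sizesWithLoop`) are hypothetical and only illustrate the shape of the suggestion:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hudi.
final class FileSizeCheckSketch {

  // Lambda version: Files.size throws a checked IOException, so the lambda
  // body has to catch it and rethrow it as an unchecked exception.
  static List<Long> sizesWithLambda(List<Path> files) {
    List<Long> sizes = new ArrayList<>();
    files.forEach(file -> {
      try {
        sizes.add(Files.size(file));
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    });
    return sizes;
  }

  // For-loop version: the checked exception simply propagates to the caller,
  // which in a test method can itself be declared with "throws Exception".
  static List<Long> sizesWithLoop(List<Path> files) throws IOException {
    List<Long> sizes = new ArrayList<>();
    for (Path file : files) {
      sizes.add(Files.size(file));
    }
    return sizes;
  }
}
```

Both methods return the same sizes; the loop version just drops the try-catch and surfaces the original exception directly in the test failure.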

##########
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/FileCreateUtils.java
##########
@@ -59,6 +64,8 @@
 
 public class FileCreateUtils {

Review comment:
       to align with the new design, we should later aim to restrict its use. It can still be useful for testing low-level file-manipulation logic, but HoodieTestTable should exercise more of the src code paths.

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -1108,6 +1119,259 @@ public void testMetdataTableCommitFailure() throws Exception {
     assertTrue(timeline.getRollbackTimeline().countInstants() == 1);
   }
 
+  /**
+   * Test simple bootstrap of metadata table.
+   * Trigger few write operations and boostrap metadata table. Validate.
+   * Add few more writes to sync and validate.
+   * @param tableType
+   * @throws Exception
+   */
+  @ParameterizedTest
+  @EnumSource(HoodieTableType.class)
+  public void testBootstrapWithTestTable(HoodieTableType tableType) throws Exception {
+    init(tableType);
+    HoodieTestTable testTable = HoodieTestTable.of(metaClient);
+    // bootstrap with few commits
+    testBootstrap(testTable, false);
+  }
+
+  /**
+   * Before bootstrapping, rollback a commit in the original table.
+   * Ensure after bootstrap, sync and validate succeeds.
+   * @throws Exception
+  */
+  @Test
+  public void testBootstrapWithRolledBackCommitTestTable() throws Exception {
+    tableType = HoodieTableType.COPY_ON_WRITE;
+    init(tableType);
+    HoodieTestTable testTable = HoodieTestTable.of(metaClient);
+    // bootstrap w/ few commits, but rollback one of the commit before bootstrapping.
+    testBootstrap(testTable,true);
+  }
+
+  private void testBootstrap(HoodieTestTable testTable, boolean addRollback) throws Exception {
+
+    // bootstrap w/ 3 or 5 commits
+    testTable.doWriteOperation(testTable, "001", WriteOperationType.INSERT, Arrays.asList("p1", "p2"), Arrays.asList("p1", "p2"),

Review comment:
       try making use of varargs instead of List for test util APIs. Varargs gives more flexibility and does not require the caller to build a list (less code)
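
A minimal sketch of a varargs-based test utility. `VarargsSketch` and `planFiles` are hypothetical names, not existing `HoodieTestTable` methods, and the generated file names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hudi.
final class VarargsSketch {

  // The varargs parameter must come last, so fixed arguments such as the
  // per-partition file count move in front of the partition names.
  static List<String> planFiles(int filesPerPartition, String... partitions) {
    List<String> planned = new ArrayList<>();
    for (String partition : partitions) {
      for (int i = 0; i < filesPerPartition; i++) {
        planned.add(partition + "/file-" + i);
      }
    }
    return planned;
  }
}
```

A caller can then write `VarargsSketch.planFiles(2, "p1", "p2")` instead of wrapping the names in `Arrays.asList(...)`, and `VarargsSketch.planFiles(2)` covers the empty case without `Collections.emptyList()`. One limitation: a method can have only one varargs parameter, so a signature that takes two partition lists would still need a `List` (or a second method) for one of them.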

##########
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestTable.java
##########
@@ -421,13 +582,102 @@ public String getBaseFileNameById(String fileId) {
   }
 
   public FileStatus[] listAllFilesInPartition(String partitionPath) throws IOException {
-    return FileSystemTestUtils.listRecursive(fs, new Path(Paths.get(basePath, partitionPath).toString())).toArray(new FileStatus[0]);
+    return FileSystemTestUtils.listRecursive(fs, new Path(Paths.get(basePath, partitionPath).toString())).stream()
+        .filter(entry -> {
+          boolean toReturn = true;
+          String fileName = entry.getPath().getName();
+          if (fileName.equals(HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE)) {
+            toReturn = false;
+          } else {
+            for (String inflight : inflightCommits) {
+              if (fileName.contains(inflight)) {
+                toReturn = false;
+              }
+            }
+          }
+          return toReturn;
+        }).collect(Collectors.toList()).toArray(new FileStatus[0]);
   }
 
   public FileStatus[] listAllFilesInTempFolder() throws IOException {
     return FileSystemTestUtils.listRecursive(fs, new Path(Paths.get(basePath, HoodieTableMetaClient.TEMPFOLDER_NAME).toString())).toArray(new FileStatus[0]);
   }
 
+  public void deleteFilesInPartition(String partitionPath, List<String> filesToDelete) throws IOException {
+    FileStatus[] allFiles = listAllFilesInPartition(partitionPath);
+    Arrays.stream(allFiles).filter(entry -> filesToDelete.contains(entry.getPath().getName())).forEach(entry -> {
+      try {
+        Files.delete(Paths.get(basePath, partitionPath, entry.getPath().getName()));
+      } catch (IOException e) {
+        e.printStackTrace();
+      }
+    });
+  }
+
+  public HoodieCleanMetadata doClean(HoodieTestTable testTable, String commitTime, Map<String, Integer> partitionFileCountsToDelete) throws IOException {
+    Map<String, List<String>> partitionFilesToDelete = new HashMap<>();
+    for (Map.Entry<String, Integer> entry : partitionFileCountsToDelete.entrySet()) {
+      partitionFilesToDelete.put(entry.getKey(), testTable.getEarliestFilesInPartition(entry.getKey(), entry.getValue()));
+    }
+    PartitionDeleteFileList partitionDeleteFileList = new PartitionDeleteFileList();
+    for (Map.Entry<String, List<String>> entry : partitionFilesToDelete.entrySet()) {
+      partitionDeleteFileList = partitionDeleteFileList.addPartitionAndBasefiles(commitTime, entry.getKey(), entry.getValue());
+      testTable.deleteFilesInPartition(entry.getKey(), entry.getValue());
+    }
+    Pair<HoodieCleanerPlan, HoodieCleanMetadata> cleanerMeta = testTable.getHoodieCleanMetadata(commitTime, partitionDeleteFileList.getPartitionToFileIdMap(commitTime));
+    testTable.addClean(commitTime, cleanerMeta.getKey(), cleanerMeta.getValue());
+    return cleanerMeta.getValue();
+  }
+
+  public HoodieTestTable doCompaction(HoodieTestTable testTable, String commitTime, List<String> partitions) throws Exception {
+    this.currentInstantTime = commitTime;
+    PartitionFileInfoMap partitionFileInfoMap = new PartitionFileInfoMap();
+    for (String partition : partitions) {
+      partitionFileInfoMap = partitionFileInfoMap.addPartitionAndBasefiles(commitTime, partition, Arrays.asList(100 + RANDOM.nextInt(500)));
+    }
+    HoodieCommitMetadata commitMetadata = testTable.createCommitMetadata(WriteOperationType.COMPACT, commitTime, partitionFileInfoMap.getPartitionToFileIdMap(commitTime));
+    for (String partition : partitions) {
+      testTable = testTable.withBaseFilesInPartition(partition, partitionFileInfoMap.getPartitionToFileIdMap(commitTime).get(partition));
+    }
+    return testTable.addCompaction(commitTime, commitMetadata);
+  }
+
+  public Pair<HoodieCommitMetadata, PartitionFileInfoMap> doWriteOperation(HoodieTestTable testTable, String commitTime, WriteOperationType operationType,

Review comment:
       this is an instance method, so it does not need the caller to pass in a testTable. Unless you want this to be static?
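
A minimal sketch of the two options. `TestTableSketch` is a hypothetical stand-in for `HoodieTestTable`, reduced to a list of commit times so the example stays self-contained:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a test table, not the real HoodieTestTable.
final class TestTableSketch {
  private final List<String> commitTimes = new ArrayList<>();

  // Instance method: operates on "this" and returns it so calls can be chained.
  TestTableSketch doWriteOperation(String commitTime) {
    commitTimes.add(commitTime);
    return this;
  }

  // Static alternative: only in this form does it make sense to pass the table in.
  static TestTableSketch doWriteOperation(TestTableSketch table, String commitTime) {
    return table.doWriteOperation(commitTime);
  }

  List<String> commitTimes() {
    return Collections.unmodifiableList(commitTimes);
  }
}
```

With the instance form, a test reads as `new TestTableSketch().doWriteOperation("001").doWriteOperation("002")`, and there is no risk of calling the method on one table while passing a different one as the argument.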

##########
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/PartitionFileInfoMap.java
##########
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.testutils;
+
+import org.apache.hudi.common.util.collection.Pair;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+public class PartitionFileInfoMap {

Review comment:
       ditto

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -1108,6 +1119,259 @@ public void testMetdataTableCommitFailure() throws Exception {
     assertTrue(timeline.getRollbackTimeline().countInstants() == 1);
   }
 
+  /**
+   * Test simple bootstrap of metadata table.
+   * Trigger few write operations and boostrap metadata table. Validate.
+   * Add few more writes to sync and validate.
+   * @param tableType
+   * @throws Exception
+   */
+  @ParameterizedTest
+  @EnumSource(HoodieTableType.class)
+  public void testBootstrapWithTestTable(HoodieTableType tableType) throws Exception {
+    init(tableType);
+    HoodieTestTable testTable = HoodieTestTable.of(metaClient);
+    // bootstrap with few commits
+    testBootstrap(testTable, false);
+  }
+
+  /**
+   * Before bootstrapping, rollback a commit in the original table.
+   * Ensure after bootstrap, sync and validate succeeds.
+   * @throws Exception
+  */
+  @Test
+  public void testBootstrapWithRolledBackCommitTestTable() throws Exception {
+    tableType = HoodieTableType.COPY_ON_WRITE;
+    init(tableType);
+    HoodieTestTable testTable = HoodieTestTable.of(metaClient);
+    // bootstrap w/ few commits, but rollback one of the commit before bootstrapping.
+    testBootstrap(testTable,true);
+  }
+
+  private void testBootstrap(HoodieTestTable testTable, boolean addRollback) throws Exception {
+
+    // bootstrap w/ 3 or 5 commits
+    testTable.doWriteOperation(testTable, "001", WriteOperationType.INSERT, Arrays.asList("p1", "p2"), Arrays.asList("p1", "p2"),
+        2, true);
+    testTable.doWriteOperation(testTable, "002", WriteOperationType.INSERT, Collections.emptyList(), Arrays.asList("p1", "p2"),
+        2, true);
+    syncAndValidate(testTable);
+
+    if (addRollback) {
+      doRollback(testTable, "003", "004", Collections.singletonList("p3"), Arrays.asList("p1","p2", "p3"), 2);
+    }
+    testTable.doWriteOperation(testTable, "005", WriteOperationType.INSERT, Collections.emptyList(), Arrays.asList("p1", "p2"),
+        4);
+    syncAndValidate(testTable);
+
+    // trigger an upsert and validate
+    testTable.doWriteOperation(testTable, "006", WriteOperationType.UPSERT, Collections.singletonList("p3"),
+        Arrays.asList("p1", "p2", "p3"), 4, false);
+    syncAndValidate(testTable);
+  }
+
+  private void doRollback(HoodieTestTable testTable, String commitTimeToRollback, String commitTime,
+                          List<String> newPartitionsToAdd, List<String> partitionsToAddFiles, int numFilesPerPartition) throws Exception {
+    // trigger an UPSERT that will be rolled back
+    Pair<HoodieCommitMetadata, PartitionFileInfoMap> commitMeta = testTable.doWriteOperation(testTable, commitTimeToRollback, WriteOperationType.UPSERT,
+        newPartitionsToAdd,
+        partitionsToAddFiles, numFilesPerPartition, false);
+    syncTableMetadata();
+
+    // rollback last commit
+    Map<String, List<String>> partitionFilesToDelete = getPartitionFilesToDelete(commitMeta.getKey());
+    HoodieRollbackMetadata rollbackMetadata = testTable.getRollbackMetadata(commitTimeToRollback, commitTime, partitionFilesToDelete);
+    testTable.addRollback(commitTime, rollbackMetadata);
+
+    // delete the resp files from test table before validation
+    for (Map.Entry<String, List<String>> entry : partitionFilesToDelete.entrySet()) {
+      testTable.deleteFilesInPartition(entry.getKey(), entry.getValue());
+    }
+    syncAndValidate(testTable);
+  }
+
+  /**
+   * Test few table operations like insert, upsert, compaction, clean.
+   * @param tableType
+   * @throws Exception
+   */
+  @ParameterizedTest
+  @EnumSource(HoodieTableType.class)
+  public void testTableOperationsWithTestTable(HoodieTableType tableType) throws Exception {
+    init(tableType);
+    HoodieTestTable testTable = HoodieTestTable.of(metaClient);
+    testTableOperations(testTable,false);
+  }
+
+  /**
+   * 1. Enable metadata to sync and validate.
+   * 2. Disable metadata and add few writes to table.
+   * 3. Enable back again to sync and validate.
+   * @throws Exception
+   */

Review comment:
       `@throws Exception` looks redundant here. Most of the time we just let the exception be thrown and investigate the failure.

##########
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestTable.java
##########
@@ -144,6 +168,33 @@ public HoodieTestTable addCommit(String instantTime) throws Exception {
     return this;
   }
 
+  public HoodieCommitMetadata createCommitMetadata(WriteOperationType operationType, String commitTime,
+                                                   Map<String, List<Pair<String, Integer>>> partitionToFileIdMap) {

Review comment:
       we should try to encapsulate data structures like `partitionToFileIdMap` within `HoodieTestState` and make them invisible to users. It's not easy to grasp and keep recalling what info is kept in the Map, and it adds friction when used in an API
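
A minimal sketch of what such encapsulation could look like. `HoodieTestState` does not exist in this PR, so `TestStateSketch` and its methods are purely hypothetical, and the sketch assumes the `Integer` in each pair is a file size:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical state holder, not part of Hudi.
final class TestStateSketch {
  // partition -> (fileId -> assumed file size in bytes); never exposed to callers.
  private final Map<String, Map<String, Integer>> partitionToFileSizes = new LinkedHashMap<>();

  TestStateSketch addBaseFile(String partition, String fileId, int sizeInBytes) {
    partitionToFileSizes.computeIfAbsent(partition, p -> new LinkedHashMap<>()).put(fileId, sizeInBytes);
    return this;
  }

  // Callers ask intent-revealing questions instead of unpacking a nested map.
  List<String> fileIdsIn(String partition) {
    Map<String, Integer> files = partitionToFileSizes.get(partition);
    if (files == null) {
      return new ArrayList<>();
    }
    return new ArrayList<>(files.keySet());
  }

  int totalBytesIn(String partition) {
    int total = 0;
    Map<String, Integer> files = partitionToFileSizes.getOrDefault(partition, Collections.emptyMap());
    for (int size : files.values()) {
      total += size;
    }
    return total;
  }
}
```

A method such as `createCommitMetadata` could then accept the state object rather than a `Map<String, List<Pair<String, Integer>>>`, so the meaning of each nested element lives in one place.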







[GitHub] [hudi] hudi-bot edited a comment on pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3595:
URL: https://github.com/apache/hudi/pull/3595#issuecomment-912612994


   ## CI report:
   
   * 0183d684704d4c6a36dd6cb4c985beddb9d9ad76 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2002) 
   * 031fe678ba16a9ef61541d9ca940ac37459661f9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2003) 
   * 37a2c3ee7e5703de90f1ace958d6b905aaa1d019 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot edited a comment on pull request #3595: [HUDI-2395] Rewrite metadata tests using HoodieTestTable

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #3595:
URL: https://github.com/apache/hudi/pull/3595#issuecomment-912612994


   ## CI report:
   
   * 0183d684704d4c6a36dd6cb4c985beddb9d9ad76 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2002) 
   * 031fe678ba16a9ef61541d9ca940ac37459661f9 UNKNOWN
   

