Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/10 16:59:06 UTC

[GitHub] [hudi] nsivabalan commented on a change in pull request #4519: [HUDI-3180] Include files from completed commits while bootstrapping metadata table

nsivabalan commented on a change in pull request #4519:
URL: https://github.com/apache/hudi/pull/4519#discussion_r781376321



##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieMetadataBootstrap.java
##########
@@ -76,6 +80,36 @@ public void testMetadataBootstrapInsertUpsertClean(HoodieTableType tableType) th
     bootstrapAndVerify();
   }
 
+  /**
+   * Validate that bootstrap considers only files part of completed commit and ignore any extra files.
+   */
+  @Test
+  public void testMetadataBootstrapWithExtraFiles() throws Exception {
+    HoodieTableType tableType = COPY_ON_WRITE;
+    init(tableType, false);
+    doPreBootstrapWriteOperation(testTable, INSERT, "0000001");
+    doPreBootstrapWriteOperation(testTable, "0000002");
+    doPreBootstrapClean(testTable, "0000003", Arrays.asList("0000001"));
+    doPreBootstrapWriteOperation(testTable, "0000005");
+    // add a few extra files to the table. bootstrap should ignore those files.
+    String fileName = UUID.randomUUID().toString();
+    Path baseFilePath = FileCreateUtils.getBaseFilePath(basePath, "p1", "0000006", fileName);
+    FileCreateUtils.createBaseFile(basePath, "p1", "0000006", fileName, 100);

Review comment:
       If it's part of the timeline, bootstrap may not kick in. Also, I'm not sure we would gain much from it. This test fails without the source-code fix in this patch, so we should be good. Let me know what you think.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -746,9 +746,16 @@ protected void bootstrapCommit(List<DirectoryInfo> partitionInfoList, String cre
     HoodieData<HoodieRecord> partitionRecords = engineContext.parallelize(Arrays.asList(allPartitionRecord), 1);
     if (!partitionInfoList.isEmpty()) {
       HoodieData<HoodieRecord> fileListRecords = engineContext.parallelize(partitionInfoList, partitionInfoList.size()).map(partitionInfo -> {
+        Map<String, Long> fileNameToSizeMap = partitionInfo.getFileNameToSizeMap();
+        // filter for files that are part of the completed commits
+        Map<String, Long> validFileNameToSizeMap = fileNameToSizeMap.entrySet().stream().filter(fileSizePair -> {
+          String commitTime = FSUtils.getCommitTime(fileSizePair.getKey());
+          return HoodieTimeline.compareTimestamps(commitTime, HoodieTimeline.LESSER_THAN_OR_EQUALS, createInstantTime);

Review comment:
       Bootstrap itself is triggered only if all operations are complete. If there is a partially failed commit, bootstrap may not kick in unless an explicit rollback happens.
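
For readers following the hunk above, here is a minimal standalone sketch of the filtering idea: keep only files whose commit time is at or before the bootstrap instant. It is not the Hudi implementation. The class and method names (`CompletedCommitFilterSketch`, `commitTimeOf`, `filesUpTo`) are hypothetical, the file-name layout `<fileId>_<writeToken>_<instantTime>.<ext>` is an assumption, and plain string comparison stands in for `HoodieTimeline.compareTimestamps`, which assumes fixed-width instant times.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not part of Hudi.
final class CompletedCommitFilterSketch {

  // Assumption: base file names look like <fileId>_<writeToken>_<instantTime>.<ext>.
  static String commitTimeOf(String fileName) {
    String[] parts = fileName.split("_");
    if (parts.length < 3) {
      throw new IllegalArgumentException("Unexpected file name: " + fileName);
    }
    String last = parts[2];
    int dot = last.indexOf('.');
    return dot < 0 ? last : last.substring(0, dot);
  }

  // Keeps entries whose commit time is <= createInstantTime.
  // Assumption: instant times are fixed-width, so lexicographic order matches time order.
  static Map<String, Long> filesUpTo(Map<String, Long> fileNameToSize, String createInstantTime) {
    return fileNameToSize.entrySet().stream()
        .filter(e -> commitTimeOf(e.getKey()).compareTo(createInstantTime) <= 0)
        .collect(Collectors.toMap(
            Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a, LinkedHashMap::new));
  }

  public static void main(String[] args) {
    Map<String, Long> files = new LinkedHashMap<>();
    files.put("f1_1-0-1_0000002.parquet", 100L);
    files.put("f2_1-0-1_0000005.parquet", 200L);
    files.put("f3_1-0-1_0000006.parquet", 300L); // extra file past the last commit
    System.out.println(filesUpTo(files, "0000005").keySet());
  }
}
```

With a bootstrap instant of `0000005`, the file stamped `0000006` is dropped, which mirrors the scenario the new test sets up with its extra base file.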



