Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/09 23:40:43 UTC

[GitHub] [hudi] manojpec commented on a change in pull request #4519: [HUDI-3180] Include files from completed commits while bootstrapping metadata table

manojpec commented on a change in pull request #4519:
URL: https://github.com/apache/hudi/pull/4519#discussion_r780847273



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -746,9 +746,16 @@ protected void bootstrapCommit(List<DirectoryInfo> partitionInfoList, String cre
     HoodieData<HoodieRecord> partitionRecords = engineContext.parallelize(Arrays.asList(allPartitionRecord), 1);
     if (!partitionInfoList.isEmpty()) {
       HoodieData<HoodieRecord> fileListRecords = engineContext.parallelize(partitionInfoList, partitionInfoList.size()).map(partitionInfo -> {
+        Map<String, Long> fileNameToSizeMap = partitionInfo.getFileNameToSizeMap();
+        // filter for files that are part of the completed commits
+        Map<String, Long> validFileNameToSizeMap = fileNameToSizeMap.entrySet().stream().filter(fileSizePair -> {
+          String commitTime = FSUtils.getCommitTime(fileSizePair.getKey());
+          return HoodieTimeline.compareTimestamps(commitTime, HoodieTimeline.LESSER_THAN_OR_EQUALS, createInstantTime);

Review comment:
       This does not filter out files written by older failed commits, right?
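A minimal standalone sketch of the point above (not Hudi's implementation). It assumes base file names follow a `<fileId>_<writeToken>_<instantTime>.<ext>` pattern and that instant times are fixed-width strings, so lexicographic comparison matches chronological order. `commitTimeOf`, `filterByTimestamp`, and `filterByCompletedInstants` are hypothetical helpers. The first filter mirrors the `LESSER_THAN_OR_EQUALS` check in the diff. The second also requires the file's instant to be a completed commit, which is what drops a file left behind by an older failed commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class BootstrapFileFilterSketch {

    // Hypothetical helper: assumes "<fileId>_<writeToken>_<instantTime>.<ext>" base file names.
    static String commitTimeOf(String fileName) {
        String[] parts = fileName.split("_");
        return parts[2].split("\\.")[0];
    }

    // Mirrors the diff: keep files whose instant time is <= the bootstrap instant.
    // Assumes fixed-width instant strings, so compareTo matches chronological order.
    static Map<String, Long> filterByTimestamp(Map<String, Long> files, String createInstantTime) {
        return files.entrySet().stream()
            .filter(e -> commitTimeOf(e.getKey()).compareTo(createInstantTime) <= 0)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                (a, b) -> a, LinkedHashMap::new));
    }

    // Stricter variant: additionally require the file's instant to be a completed commit.
    static Map<String, Long> filterByCompletedInstants(Map<String, Long> files, String createInstantTime,
                                                       Set<String> completedInstants) {
        return filterByTimestamp(files, createInstantTime).entrySet().stream()
            .filter(e -> completedInstants.contains(commitTimeOf(e.getKey())))
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("f1_1-0-1_0000002.parquet", 100L); // written by a completed commit
        files.put("f2_1-0-1_0000004.parquet", 100L); // written by an older commit that failed
        files.put("f3_1-0-1_0000006.parquet", 100L); // newer than the bootstrap instant

        Set<String> completed = Set.of("0000001", "0000002", "0000003", "0000005");

        // Timestamp-only filter keeps the file from the failed 0000004 commit.
        System.out.println(filterByTimestamp(files, "0000005").keySet());
        // Completed-instant filter drops it.
        System.out.println(filterByCompletedInstants(files, "0000005", completed).keySet());
    }
}
```

With these inputs the timestamp-only filter keeps both the `0000002` and the `0000004` files, while the completed-instant filter keeps only the `0000002` file.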

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieMetadataBootstrap.java
##########
@@ -76,6 +80,36 @@ public void testMetadataBootstrapInsertUpsertClean(HoodieTableType tableType) th
     bootstrapAndVerify();
   }
 
+  /**
+   * Validate that bootstrap considers only files that are part of completed commits and ignores any extra files.
+   */
+  @Test
+  public void testMetadataBootstrapWithExtraFiles() throws Exception {
+    HoodieTableType tableType = COPY_ON_WRITE;
+    init(tableType, false);
+    doPreBootstrapWriteOperation(testTable, INSERT, "0000001");
+    doPreBootstrapWriteOperation(testTable, "0000002");
+    doPreBootstrapClean(testTable, "0000003", Arrays.asList("0000001"));
+    doPreBootstrapWriteOperation(testTable, "0000005");
+    // Add a few extra files to the table. Bootstrap should not include those files.
+    String fileName = UUID.randomUUID().toString();
+    Path baseFilePath = FileCreateUtils.getBaseFilePath(basePath, "p1", "0000006", fileName);
+    FileCreateUtils.createBaseFile(basePath, "p1", "0000006", fileName, 100);

Review comment:
       Should we instead start the commit and leave it incomplete, so that it also appears in the timeline?
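A standalone sketch of the scenario this suggestion describes, assuming the timeline can be modeled as a map from instant time to state. `TimelineSketch`, its `InstantState` enum, and `isValidForBootstrap` are hypothetical and only approximate how Hudi tracks requested, inflight, and completed instants; the bootstrap instant `0000007` is likewise an assumption chosen to be later than every file instant. With `0000006` registered as inflight, the extra file's instant is present in the timeline but never completed, so only the state check can exclude it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TimelineSketch {

    enum InstantState { REQUESTED, INFLIGHT, COMPLETED }

    // Hypothetical timeline model: instant time -> state.
    private final Map<String, InstantState> instants = new HashMap<>();

    void addInstant(String instantTime, InstantState state) {
        instants.put(instantTime, state);
    }

    // A file is valid only if its instant is in the timeline, has completed,
    // and is not newer than the bootstrap instant (fixed-width strings assumed).
    boolean isValidForBootstrap(String fileInstantTime, String bootstrapInstantTime) {
        return instants.get(fileInstantTime) == InstantState.COMPLETED
            && fileInstantTime.compareTo(bootstrapInstantTime) <= 0;
    }

    List<String> validInstants(List<String> fileInstantTimes, String bootstrapInstantTime) {
        return fileInstantTimes.stream()
            .filter(t -> isValidForBootstrap(t, bootstrapInstantTime))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TimelineSketch timeline = new TimelineSketch();
        // Pre-bootstrap instants from the test (the clean at 0000003 is treated as completed).
        timeline.addInstant("0000001", InstantState.COMPLETED);
        timeline.addInstant("0000002", InstantState.COMPLETED);
        timeline.addInstant("0000003", InstantState.COMPLETED);
        timeline.addInstant("0000005", InstantState.COMPLETED);
        // The suggestion: start 0000006 but do not complete it.
        timeline.addInstant("0000006", InstantState.INFLIGHT);

        // Hypothetical bootstrap instant, later than every file instant,
        // so a timestamp-only check would admit 0000006.
        String bootstrapInstant = "0000007";
        List<String> fileInstants = List.of("0000002", "0000005", "0000006");
        System.out.println(timeline.validInstants(fileInstants, bootstrapInstant));
    }
}
```

Here a timestamp-only comparison would accept `0000006`, but because its state is `INFLIGHT` the sketch keeps only `0000002` and `0000005`.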




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org