Posted to commits@hudi.apache.org by "danny0405 (via GitHub)" <gi...@apache.org> on 2023/01/28 02:48:06 UTC

[GitHub] [hudi] danny0405 commented on a diff in pull request #7767: [HUDI-5629] Clean CDC log files

danny0405 commented on code in PR #7767:
URL: https://github.com/apache/hudi/pull/7767#discussion_r1089604222


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -436,23 +433,12 @@ private List<CleanFileInfo> getCleanFileInfoForSlice(FileSlice nextSlice) {
         cleanPaths.add(new CleanFileInfo(dataFile.getBootstrapBaseFile().get().getPath(), true));
       }
     }
-    if (hoodieTable.getMetaClient().getTableType() == HoodieTableType.MERGE_ON_READ) {
-      // If merge on read, then clean the log files for the commits as well
-      Predicate<HoodieLogFile> notCDCLogFile =
-          hoodieLogFile -> !hoodieLogFile.getFileName().endsWith(HoodieCDCUtils.CDC_LOGFILE_SUFFIX);
-      cleanPaths.addAll(
-          nextSlice.getLogFiles().filter(notCDCLogFile).map(lf -> new CleanFileInfo(lf.getPath().toString(), false))
-              .collect(Collectors.toList()));
-    }
-    if (hoodieTable.getMetaClient().getTableConfig().isCDCEnabled()) {
-      // The cdc log files will be written out in cdc scenario, no matter the table type is mor or cow.
-      // Here we need to clean up these cdc log files.
-      Predicate<HoodieLogFile> isCDCLogFile =
-          hoodieLogFile -> hoodieLogFile.getFileName().endsWith(HoodieCDCUtils.CDC_LOGFILE_SUFFIX);
-      cleanPaths.addAll(
-          nextSlice.getLogFiles().filter(isCDCLogFile).map(lf -> new CleanFileInfo(lf.getPath().toString(), false))
-              .collect(Collectors.toList()));
-    }
+
+    // clean the log files for the commits, which contain cdc log files in cdc scenario
+    // and normal log files for mor tables.
+    cleanPaths.addAll(
+        nextSlice.getLogFiles().map(lf -> new CleanFileInfo(lf.getPath().toString(), false))
+            .collect(Collectors.toList()));
     return cleanPaths;

Review Comment:
   Should the precondition `hoodieTable.getMetaClient().getTableType() == HoodieTableType.MERGE_ON_READ` be kept for efficiency?
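
To make the trade-off in the question concrete, here is a minimal, self-contained sketch of the log-file selection before and after the change. It is not Hudi code. Plain strings stand in for `HoodieLogFile`. The `CDC_SUFFIX` constant is a hypothetical stand-in for `HoodieCDCUtils.CDC_LOGFILE_SUFFIX`, and its value here is an assumption. The `mayHaveLogFiles` helper and the file names are also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class CleanLogFilesSketch {

  // Hypothetical stand-in for HoodieCDCUtils.CDC_LOGFILE_SUFFIX; the real value may differ.
  static final String CDC_SUFFIX = "-cdc";

  enum TableType { COPY_ON_WRITE, MERGE_ON_READ }

  // Mirrors the removed code: non-CDC log files only for MOR tables,
  // CDC log files only when CDC is enabled.
  static List<String> oldSelection(List<String> logFiles, TableType type, boolean cdcEnabled) {
    Predicate<String> isCdcLogFile = name -> name.endsWith(CDC_SUFFIX);
    List<String> cleanPaths = new ArrayList<>();
    if (type == TableType.MERGE_ON_READ) {
      cleanPaths.addAll(logFiles.stream().filter(isCdcLogFile.negate()).collect(Collectors.toList()));
    }
    if (cdcEnabled) {
      cleanPaths.addAll(logFiles.stream().filter(isCdcLogFile).collect(Collectors.toList()));
    }
    return cleanPaths;
  }

  // Mirrors the added code: every log file in the slice is cleaned.
  static List<String> newSelection(List<String> logFiles) {
    return new ArrayList<>(logFiles);
  }

  // Hypothetical short-circuit guard: the scan can only be skipped when the
  // slice cannot hold log files at all, i.e. a COW table without CDC.
  static boolean mayHaveLogFiles(TableType type, boolean cdcEnabled) {
    return type == TableType.MERGE_ON_READ || cdcEnabled;
  }

  public static void main(String[] args) {
    // Hypothetical file names for one MOR file slice with CDC enabled.
    List<String> slice = List.of("f1_001.log.1", "f1_001.log.2-cdc");
    List<String> before = new ArrayList<>(oldSelection(slice, TableType.MERGE_ON_READ, true));
    List<String> after = new ArrayList<>(newSelection(slice));
    // The old code appends CDC files last, so compare order-insensitively.
    before.sort(null);
    after.sort(null);
    System.out.println(before.equals(after));
    System.out.println(mayHaveLogFiles(TableType.COPY_ON_WRITE, true));
  }
}
```

The removed comment notes that CDC log files are written regardless of table type. A guard on `MERGE_ON_READ` alone would therefore skip the CDC log files of a `COPY_ON_WRITE` table with CDC enabled. A precondition that preserves the old behaviour would have to check both the table type and the CDC flag, as `mayHaveLogFiles` does in this sketch.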



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.