Posted to commits@hudi.apache.org by "Zouxxyy (via GitHub)" <gi...@apache.org> on 2023/04/03 08:09:20 UTC

[GitHub] [hudi] Zouxxyy opened a new pull request, #8364: [HUDI-6007] Add log files to savepoint metadata

Zouxxyy opened a new pull request, #8364:
URL: https://github.com/apache/hudi/pull/8364

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8364:
URL: https://github.com/apache/hudi/pull/8364#discussion_r1188294647


##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -850,6 +850,25 @@ public final Stream<FileSlice> getLatestFileSlicesBeforeOrOn(String partitionStr
     }
   }
 
+  @Override
+  public final Map<String, Stream<FileSlice>> getAllLatestFileSlicesBeforeOrOn(String maxCommitTime) {
+    try {
+      readLock.lock();
+      List<String> formattedPartitionList = ensureAllPartitionsLoadedCorrectly();
+      return formattedPartitionList.stream().collect(Collectors.toMap(
+          Function.identity(),
+          partitionPath -> fetchAllStoredFileGroups(partitionPath)
+              .filter(slice -> !isFileGroupReplacedBeforeOrOn(slice.getFileGroupId(), maxCommitTime))
+              .map(fg -> fg.getAllFileSlicesBeforeOn(maxCommitTime))
+              .map(sliceStream -> sliceStream.flatMap(slice -> this.filterBaseFileAfterPendingCompaction(slice, false)))
+              .map(sliceStream -> Option.fromJavaOptional(sliceStream.findFirst())).filter(Option::isPresent).map(Option::get)

Review Comment:
   > this.filterBaseFileAfterPendingCompaction
   
   This may give an incomplete slice for the `FileGroup`, but it still works for your use case. A more general approach is to merge the pending slice's log files with the previous file slice's log files, just as we do for the reader view.
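
The merge suggested in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not Hudi's reader-view code: `Slice` and `mergeWithPrevious` are hypothetical names, and files are modelled as plain strings. The idea is that the merged slice keeps the previous slice's base instant and base file, and carries the log files of both slices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class MergePendingSliceSketch {

  // Hypothetical, simplified stand-in for a Hudi file slice (file names only).
  static final class Slice {
    final String baseInstantTime;
    final String baseFile;        // null when the slice has no base file
    final List<String> logFiles;

    Slice(String baseInstantTime, String baseFile, List<String> logFiles) {
      this.baseInstantTime = baseInstantTime;
      this.baseFile = baseFile;
      this.logFiles = Collections.unmodifiableList(new ArrayList<>(logFiles));
    }
  }

  // Keep the previous slice's base instant and base file, then append the
  // log files written against the pending-compaction slice.
  static Slice mergeWithPrevious(Slice pending, Slice previous) {
    List<String> logs = new ArrayList<>(previous.logFiles);
    logs.addAll(pending.logFiles);
    return new Slice(previous.baseInstantTime, previous.baseFile, logs);
  }

  public static void main(String[] args) {
    Slice previous = new Slice("t1", "base_t1", Arrays.asList("log1", "log2"));
    // Assumed scenario: a compaction scheduled at t3 is still pending, so the
    // newer slice has no usable base file yet, only new log files.
    Slice pending = new Slice("t3", null, Arrays.asList("log3"));

    Slice merged = mergeWithPrevious(pending, previous);
    System.out.println(merged.baseInstantTime + " " + merged.baseFile + " " + merged.logFiles);
  }
}
```

For the savepoint use case, such a merged slice would list every file a reader needs at that instant, which may be why it is described as the more general option.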



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -215,6 +216,21 @@ private List<String> getPartitionPathsForFullCleaning() {
     return FSUtils.getAllPartitionPaths(context, config.getMetadataConfig(), config.getBasePath());
   }
 
+  /**
+   *  Verify whether file slice exists in savepointedFiles, check both base file and log files
+   */
+  private boolean isFsExistInSavepointedFiles(FileSlice fs, List<String> savepointedFiles) {
+    if (fs.getBaseFile().isPresent() && savepointedFiles.contains(fs.getBaseFile().get().getFileName())) {
+      return true;
+    }

Review Comment:
   isFsExistInSavepointedFiles -> isFileSliceExistInSavepointedFiles



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestMergeIntoLogOnlyTable.scala:
##########
@@ -88,4 +88,57 @@ class TestMergeIntoLogOnlyTable extends HoodieSparkSqlTestBase {
       )
     })
   }
+
+  test("Test Savepoint with Log Only MOR Table") {
+    withRecordType()(withTempDir { tmp =>
+      // Create table with INMEMORY index to generate log only mor table.
+      val tableName = generateTableName
+      spark.sql(
+        s"""
+           |create table $tableName (
+           |  id int,

Review Comment:
   Can we move the tests to `TestSavepointsProcedure`?





[GitHub] [hudi] danny0405 merged pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 merged PR #8364:
URL: https://github.com/apache/hudi/pull/8364




[GitHub] [hudi] Zouxxyy commented on a diff in pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on code in PR #8364:
URL: https://github.com/apache/hudi/pull/8364#discussion_r1188679145


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -215,6 +216,21 @@ private List<String> getPartitionPathsForFullCleaning() {
     return FSUtils.getAllPartitionPaths(context, config.getMetadataConfig(), config.getBasePath());
   }
 
+  /**
+   *  Verify whether file slice exists in savepointedFiles, check both base file and log files
+   */
+  private boolean isFsExistInSavepointedFiles(FileSlice fs, List<String> savepointedFiles) {
+    if (fs.getBaseFile().isPresent() && savepointedFiles.contains(fs.getBaseFile().get().getFileName())) {
+      return true;
+    }

Review Comment:
   A finer-grained clean should be achievable as long as our savepoint is accurate enough, but I'm not sure whether it would affect anything else. If we want to pursue it, should we open a separate PR?













[GitHub] [hudi] danny0405 commented on a diff in pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8364:
URL: https://github.com/apache/hudi/pull/8364#discussion_r1188268883


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -215,6 +216,21 @@ private List<String> getPartitionPathsForFullCleaning() {
     return FSUtils.getAllPartitionPaths(context, config.getMetadataConfig(), config.getBasePath());
   }
 
+  /**
+   *  Verify whether file slice exists in savepointedFiles, check both base file and log files
+   */
+  private boolean isFsExistInSavepointedFiles(FileSlice fs, List<String> savepointedFiles) {
+    if (fs.getBaseFile().isPresent() && savepointedFiles.contains(fs.getBaseFile().get().getFileName())) {
+      return true;
+    }

Review Comment:
   Should we return true if the base file exists while a log file is missing?





[GitHub] [hudi] boundarymate commented on pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "boundarymate (via GitHub)" <gi...@apache.org>.
boundarymate commented on PR #8364:
URL: https://github.com/apache/hudi/pull/8364#issuecomment-1500059693

   @danny0405 Can you help with a review~




[GitHub] [hudi] Zouxxyy commented on a diff in pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on code in PR #8364:
URL: https://github.com/apache/hudi/pull/8364#discussion_r1188647600


##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -850,6 +850,25 @@ public final Stream<FileSlice> getLatestFileSlicesBeforeOrOn(String partitionStr
     }
   }
 
+  @Override
+  public final Map<String, Stream<FileSlice>> getAllLatestFileSlicesBeforeOrOn(String maxCommitTime) {
+    try {
+      readLock.lock();
+      List<String> formattedPartitionList = ensureAllPartitionsLoadedCorrectly();
+      return formattedPartitionList.stream().collect(Collectors.toMap(
+          Function.identity(),
+          partitionPath -> fetchAllStoredFileGroups(partitionPath)
+              .filter(slice -> !isFileGroupReplacedBeforeOrOn(slice.getFileGroupId(), maxCommitTime))
+              .map(fg -> fg.getAllFileSlicesBeforeOn(maxCommitTime))
+              .map(sliceStream -> sliceStream.flatMap(slice -> this.filterBaseFileAfterPendingCompaction(slice, false)))
+              .map(sliceStream -> Option.fromJavaOptional(sliceStream.findFirst())).filter(Option::isPresent).map(Option::get)

Review Comment:
   > This may give in-complete slice for the `FileGroup`, but it still works for your use case. A more general way is we merge the pending slice log files with the previous file slice logs, just like what we do for the reader view.
   
   `getAllLatestFileSlicesBeforeOrOn` copies the logic of `getLatestFileSlicesBeforeOrOn` and sets `includeFileSlicesInPendingCompaction` to true.
   In fact, I'm a little confused about this:
   ```java
     protected Stream<FileSlice> filterBaseFileAfterPendingCompaction(FileSlice fileSlice, boolean includeEmptyFileSlice) {
       if (isFileSliceAfterPendingCompaction(fileSlice)) {
         LOG.debug("File Slice (" + fileSlice + ") is in pending compaction");
         // Base file is filtered out of the file-slice as the corresponding compaction
         // instant not completed yet.
         FileSlice transformed = new FileSlice(fileSlice.getPartitionPath(), fileSlice.getBaseInstantTime(), fileSlice.getFileId());
         fileSlice.getLogFiles().forEach(transformed::addLogFile);
         if (transformed.isEmpty() && !includeEmptyFileSlice) {
           return Stream.of();
         }
         return Stream.of(transformed);
       }
       return Stream.of(fileSlice);
     }
   ```
   Here only the base file is filtered out. Does that mean the base file can be incomplete (its compaction is still inflight) while the log files are already complete? If so, I think it is reasonable to add these log files to the savepoint.
   
   Also, can you explain the reader-view approach you suggested? Here we only need to return the files required for the savepoint; do we need to merge them?
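
To make the scenario in question concrete, here is a simplified, self-contained model of the filtering behaviour quoted above. It is a sketch under stated assumptions rather than Hudi's implementation: `Slice` is a hypothetical stand-in for `FileSlice`, and the `afterPendingCompaction` flag replaces the call to `isFileSliceAfterPendingCompaction`. It shows that the base file is dropped while the log files survive, and that a slice left empty is omitted unless `includeEmptyFileSlice` is set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PendingCompactionFilterSketch {

  // Hypothetical, simplified stand-in for FileSlice (file names only).
  static final class Slice {
    final String baseInstantTime;
    final String baseFile;        // null when there is no base file
    final List<String> logFiles;

    Slice(String baseInstantTime, String baseFile, List<String> logFiles) {
      this.baseInstantTime = baseInstantTime;
      this.baseFile = baseFile;
      this.logFiles = Collections.unmodifiableList(new ArrayList<>(logFiles));
    }

    boolean isEmpty() {
      return baseFile == null && logFiles.isEmpty();
    }
  }

  // Mirrors the shape of the quoted method: drop the base file of a slice
  // whose compaction has not completed, but keep its log files.
  static Stream<Slice> filterBaseFile(Slice slice, boolean afterPendingCompaction,
                                      boolean includeEmptyFileSlice) {
    if (afterPendingCompaction) {
      Slice transformed = new Slice(slice.baseInstantTime, null, slice.logFiles);
      if (transformed.isEmpty() && !includeEmptyFileSlice) {
        return Stream.empty();
      }
      return Stream.of(transformed);
    }
    return Stream.of(slice);
  }

  public static void main(String[] args) {
    Slice withLogs = new Slice("t3", "base_t3", Arrays.asList("log1"));
    List<Slice> kept = filterBaseFile(withLogs, true, false).collect(Collectors.toList());
    // The slice is kept, without its base file, with its log files intact.
    System.out.println(kept.size() + " " + kept.get(0).baseFile + " " + kept.get(0).logFiles);

    Slice baseOnly = new Slice("t3", "base_t3", Collections.<String>emptyList());
    // With no log files, the transformed slice is empty and is filtered out.
    System.out.println(filterBaseFile(baseOnly, true, false).count());
  }
}
```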
   





[GitHub] [hudi] hudi-bot commented on pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8364:
URL: https://github.com/apache/hudi/pull/8364#issuecomment-1540213465

   ## CI report:
   
   * 63314f3069c4f06f8f214969af7dc60718131645 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16969) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8364:
URL: https://github.com/apache/hudi/pull/8364#issuecomment-1493931564

   ## CI report:
   
   * 64142209a2925754a940698e5545a12f18d16187 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8364:
URL: https://github.com/apache/hudi/pull/8364#issuecomment-1540354747

   ## CI report:
   
   * 63314f3069c4f06f8f214969af7dc60718131645 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16969) 
   * 6154ca6b062f007e0189f8ecab6123c57ef863c9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16980) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8364:
URL: https://github.com/apache/hudi/pull/8364#issuecomment-1541050871

   ## CI report:
   
   * 6154ca6b062f007e0189f8ecab6123c57ef863c9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16980) 
   




[GitHub] [hudi] Zouxxyy commented on a diff in pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on code in PR #8364:
URL: https://github.com/apache/hudi/pull/8364#discussion_r1188671535


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -215,6 +216,21 @@ private List<String> getPartitionPathsForFullCleaning() {
     return FSUtils.getAllPartitionPaths(context, config.getMetadataConfig(), config.getBasePath());
   }
 
+  /**
+   *  Verify whether file slice exists in savepointedFiles, check both base file and log files
+   */
+  private boolean isFsExistInSavepointedFiles(FileSlice fs, List<String> savepointedFiles) {
+    if (fs.getBaseFile().isPresent() && savepointedFiles.contains(fs.getBaseFile().get().getFileName())) {
+      return true;
+    }

Review Comment:
   > Should we return true if base file exists while log file is missing?
   
   Currently, when clean runs, the file slice is the smallest unit, so as long as one file in the file slice matches, the whole file slice is kept.
   
   For example:
   ```text
   (t1) fs: base, log1
   (t2) run savepoint
   (t3) fs: base, log1, log2
   ...
   (tn) run clean
   ```
   The entire file slice will be preserved; here I follow the existing logic.
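
The timeline above can be replayed with a small standalone sketch. It is not the code from this PR: `SimpleFileSlice` is a hypothetical stand-in for `FileSlice`, file names are plain strings, and the savepointed files are held in a `Set` for the example. The check treats a slice as savepointed when its base file or any one of its log files was recorded by the savepoint.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SavepointCheckSketch {

  // Hypothetical, simplified stand-in for FileSlice (file names only).
  static final class SimpleFileSlice {
    final String baseFile;        // null for a log-only slice
    final List<String> logFiles;

    SimpleFileSlice(String baseFile, List<String> logFiles) {
      this.baseFile = baseFile;
      this.logFiles = new ArrayList<>(logFiles);
    }
  }

  // True if the base file or any log file of the slice is in the savepoint.
  static boolean isFileSliceSavepointed(SimpleFileSlice fs, Set<String> savepointedFiles) {
    if (fs.baseFile != null && savepointedFiles.contains(fs.baseFile)) {
      return true;
    }
    return fs.logFiles.stream().anyMatch(savepointedFiles::contains);
  }

  public static void main(String[] args) {
    // (t2) the savepoint records the files that exist at that time.
    Set<String> savepointed = new HashSet<>(Arrays.asList("base", "log1"));

    // (t3) the same slice has since gained log2; it still matches, so the
    // whole slice, including log2, is preserved by clean.
    SimpleFileSlice grown = new SimpleFileSlice("base", Arrays.asList("log1", "log2"));
    System.out.println(isFileSliceSavepointed(grown, savepointed));

    // A log-only slice matches through its log file alone.
    SimpleFileSlice logOnly = new SimpleFileSlice(null, Arrays.asList("log1"));
    System.out.println(isFileSliceSavepointed(logOnly, savepointed));

    // A slice with no file in the savepoint is eligible for cleaning.
    SimpleFileSlice unrelated = new SimpleFileSlice("other_base", Arrays.asList("other_log"));
    System.out.println(isFileSliceSavepointed(unrelated, savepointed));
  }
}
```

The log-only case is the one a base-file-only check would miss, since there is no base file name to look up.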





[GitHub] [hudi] hudi-bot commented on pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8364:
URL: https://github.com/apache/hudi/pull/8364#issuecomment-1539601103

   ## CI report:
   
   * 64142209a2925754a940698e5545a12f18d16187 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16090) 
   * 63314f3069c4f06f8f214969af7dc60718131645 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16969) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8364:
URL: https://github.com/apache/hudi/pull/8364#issuecomment-1539592191

   ## CI report:
   
   * 64142209a2925754a940698e5545a12f18d16187 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16090) 
   * 63314f3069c4f06f8f214969af7dc60718131645 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8364:
URL: https://github.com/apache/hudi/pull/8364#issuecomment-1493943981

   ## CI report:
   
   * 64142209a2925754a940698e5545a12f18d16187 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16090) 
   




[GitHub] [hudi] Zouxxyy commented on pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on PR #8364:
URL: https://github.com/apache/hudi/pull/8364#issuecomment-1499893221

   @yihua Can you help with a review~




[GitHub] [hudi] hudi-bot commented on pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8364:
URL: https://github.com/apache/hudi/pull/8364#issuecomment-1494738708

   ## CI report:
   
   * 64142209a2925754a940698e5545a12f18d16187 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16090) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8364:
URL: https://github.com/apache/hudi/pull/8364#issuecomment-1540246966

   ## CI report:
   
   * 63314f3069c4f06f8f214969af7dc60718131645 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16969) 
   * 6154ca6b062f007e0189f8ecab6123c57ef863c9 UNKNOWN
   




[GitHub] [hudi] danny0405 commented on a diff in pull request #8364: [HUDI-6007] Add log files to savepoint metadata

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8364:
URL: https://github.com/apache/hudi/pull/8364#discussion_r1189288256


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -215,6 +216,21 @@ private List<String> getPartitionPathsForFullCleaning() {
     return FSUtils.getAllPartitionPaths(context, config.getMetadataConfig(), config.getBasePath());
   }
 
+  /**
+   *  Verify whether file slice exists in savepointedFiles, check both base file and log files
+   */
+  private boolean isFileSliceExistInSavepointedFiles(FileSlice fs, List<String> savepointedFiles) {
+    if (fs.getBaseFile().isPresent() && savepointedFiles.contains(fs.getBaseFile().get().getFileName())) {

Review Comment:
   isFileSliceExistInSavepointedFiles -> isFileSliceSavepointed ?


