Posted to commits@hudi.apache.org by "danny0405 (via GitHub)" <gi...@apache.org> on 2023/03/03 02:15:10 UTC

[GitHub] [hudi] danny0405 opened a new pull request, #8088: [HUDI-5873] The pending compactions of dataset table should not block…

danny0405 opened a new pull request, #8088:
URL: https://github.com/apache/hudi/pull/8088

   … MDT compaction
   
   ### Change Logs
   
   Adjust the MDT compaction strategy so that it is not blocked by pending DT compactions.
   
   ### Impact
   
   Could significantly reduce the number of small metadata files for MOR tables.
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8088:
URL: https://github.com/apache/hudi/pull/8088#issuecomment-1467072461

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c65842899078697c5c5ff647e89f7cf918531f8d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15545",
       "triggerID" : "c65842899078697c5c5ff647e89f7cf918531f8d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "10777af559d8be0a0c421ebfb98f001501638aa5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15637",
       "triggerID" : "10777af559d8be0a0c421ebfb98f001501638aa5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "10777af559d8be0a0c421ebfb98f001501638aa5",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15700",
       "triggerID" : "1467041684",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "961aafcc251a8ed6bb18cc40c87365aa0a0924eb",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15701",
       "triggerID" : "961aafcc251a8ed6bb18cc40c87365aa0a0924eb",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 10777af559d8be0a0c421ebfb98f001501638aa5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15637) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15700) 
   * 961aafcc251a8ed6bb18cc40c87365aa0a0924eb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15701) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8088:
URL: https://github.com/apache/hudi/pull/8088#issuecomment-1467041684

   @hudi-bot run azure




[GitHub] [hudi] hudi-bot commented on pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8088:
URL: https://github.com/apache/hudi/pull/8088#issuecomment-1452876024

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c65842899078697c5c5ff647e89f7cf918531f8d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c65842899078697c5c5ff647e89f7cf918531f8d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c65842899078697c5c5ff647e89f7cf918531f8d UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 closed pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block MDT compaction

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 closed pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block MDT compaction
URL: https://github.com/apache/hudi/pull/8088




[GitHub] [hudi] hudi-bot commented on pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8088:
URL: https://github.com/apache/hudi/pull/8088#issuecomment-1453004682

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c65842899078697c5c5ff647e89f7cf918531f8d",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15545",
       "triggerID" : "c65842899078697c5c5ff647e89f7cf918531f8d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c65842899078697c5c5ff647e89f7cf918531f8d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15545) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] vinothchandar commented on pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "vinothchandar (via GitHub)" <gi...@apache.org>.
vinothchandar commented on PR #8088:
URL: https://github.com/apache/hudi/pull/8088#issuecomment-1478834804

   I read through https://issues.apache.org/jira/browse/HUDI-2458, and it seems fairly old.
   @prashantwason do you have thoughts here? @nsivabalan what's your take on the updated PR? I need to get these nuances back into my head before I can reason about this. Just trying to understand where everyone is.
   




[GitHub] [hudi] hudi-bot commented on pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8088:
URL: https://github.com/apache/hudi/pull/8088#issuecomment-1462209986

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c65842899078697c5c5ff647e89f7cf918531f8d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15545",
       "triggerID" : "c65842899078697c5c5ff647e89f7cf918531f8d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "10777af559d8be0a0c421ebfb98f001501638aa5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15637",
       "triggerID" : "10777af559d8be0a0c421ebfb98f001501638aa5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 10777af559d8be0a0c421ebfb98f001501638aa5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15637) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8088:
URL: https://github.com/apache/hudi/pull/8088#issuecomment-1467063374

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c65842899078697c5c5ff647e89f7cf918531f8d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15545",
       "triggerID" : "c65842899078697c5c5ff647e89f7cf918531f8d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "10777af559d8be0a0c421ebfb98f001501638aa5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15637",
       "triggerID" : "10777af559d8be0a0c421ebfb98f001501638aa5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "10777af559d8be0a0c421ebfb98f001501638aa5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15700",
       "triggerID" : "1467041684",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "961aafcc251a8ed6bb18cc40c87365aa0a0924eb",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "961aafcc251a8ed6bb18cc40c87365aa0a0924eb",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 10777af559d8be0a0c421ebfb98f001501638aa5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15637) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15700) 
   * 961aafcc251a8ed6bb18cc40c87365aa0a0924eb UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on a diff in pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8088:
URL: https://github.com/apache/hudi/pull/8088#discussion_r1132010791


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1029,23 +1029,63 @@ protected HoodieData<HoodieRecord> prepRecords(Map<MetadataPartitionType,
   /**
    *  Perform a compaction on the Metadata Table.
    *
-   * Cases to be handled:
-   *   1. We cannot perform compaction if there are previous inflight operations on the dataset. This is because
-   *      a compacted metadata base file at time Tx should represent all the actions on the dataset till time Tx.
-   *
-   *   2. In multi-writer scenario, a parallel operation with a greater instantTime may have completed creating a
-   *      deltacommit.
+   * <p>Cases to be handled:
+   * <ol>
+   *   <li>We cannot perform compaction if there are previous inflight operations on the dataset. This is because
+   *   a compacted metadata base file at time Tx should represent all the actions on the dataset till time Tx;</li>
+   *   <li>In multi-writer scenario, a parallel operation with a greater instantTime may have completed creating a
+   *   deltacommit.</li>
+   * </ol>
    */
   protected void compactIfNecessary(BaseHoodieWriteClient writeClient, String instantTime) {
     // finish off any pending compactions if any from previous attempt.
     writeClient.runAnyPendingCompactions();
 
-    String latestDeltaCommitTimeInMetadataTable = metadataMetaClient.reloadActiveTimeline()
+    HoodieTimeline metadataCompletedDeltaCommitTimeline = metadataMetaClient.reloadActiveTimeline()
         .getDeltaCommitTimeline()
-        .filterCompletedInstants()
+        .filterCompletedInstants();
+    String latestDeltaCommitTimeInMetadataTable = metadataCompletedDeltaCommitTimeline
         .lastInstant().orElseThrow(() -> new HoodieMetadataException("No completed deltacommit in metadata table"))
         .getTimestamp();
-    List<HoodieInstant> pendingInstants = dataMetaClient.reloadActiveTimeline().filterInflightsAndRequested()
+    Set<String> metadataCompletedDeltaCommits = metadataCompletedDeltaCommitTimeline.getInstantsAsStream()
+        .map(HoodieInstant::getTimestamp)
+        .collect(Collectors.toSet());
+    // pending compactions in DT should not block the compaction of MDT.
+    // A pending compaction on the DT (a common case for MOR tables)
+    // could prevent the MDT compaction from being triggered in time.
+    // Slow MDT compaction in turn delays the timeline archiving of the DT,
+    // so neither the DT timeline nor the MDT timeline can be archived in time;
+    // that is how the small-file issues on both the DT and MDT timelines emerge.
+
+    // Why can we filter out a compaction instant that has not been committed to the MDT?
+
+    // Two preconditions need to be stated first:
+    // 1. only the write commits (commit, delta_commit, replace_commit) can trigger the MDT compaction;
+    // 2. the MDT is always committed before the DT.
+
+    // There are 3 cases to analyze for a compaction instant from the DT:
+    // 1. neither the DT nor the MDT has committed the instant;
+    //    1.1 the compaction in the DT is healthy, it just takes a long time to finish;
+    //    1.2 some error occurred during the compaction procedure.
+    // 2. the MDT has committed the compaction instant, while the DT has not;
+    //    2.1 the job crashed while the compactor was trying to commit to the DT right after the MDT commit;
+    //    2.2 the job was canceled manually right after the MDT commit.
+    // 3. both the DT and the MDT have committed the instant.
+
+    // The 3rd case should be okay, now let's analyze the first 2 cases:
+    //
+    // The 1st case: the instant has not been committed yet, so the MDT compaction simply ignores it
+    // and the pending instant cannot be compacted into the HFile. The instant is also not archived by either the DT or the MDT (that is how the archival mechanism works),
+    // and the MDT log reader ignores the instant correctly, so the resulting view stays correct.
+
+    // The 2nd case: we cannot trigger compaction, because once the MDT compacts, the MDT archiver may archive the instant even though it has not been committed in the DT,
+    // and the MDT reader can then no longer filter out the instant correctly. Another reason: once the instant is compacted into the HFile, a subsequent rollback from the DT may try to look up
+    // the files to be rolled back, and an exception could be thrown (although the default behavior is not to throw).
+

Review Comment:
   Let me explain the procedure a little more with a demo:
   
   ```java
   delta_c1 (F3, F4) (MDT)
   delta_c1 (F1, F2) (DT)
   
   c2.inflight (compaction triggers in DT)
   
   delta_c3 (F7, F8) (MDT)
   delta_c3 (F5, F6) (DT)
   
   c2 (F7, F8) (compaction complete in MDT)
   c2 fails to commit to DT
   
   delta_c4 (F9, F10) (MDT)
   -- can we trigger MDT compaction here? The answer is yes
       1. c2 in DT would block the archiving of c2 in MDT
       2. the MDT reader would also ignore c2 because it is filtered out by the pending c2 on the DT timeline,
           so the compaction does not include c2
   delta_c4 (F11, F12) (DT)
   
   r5 (to rollback c2) (MDT)
   -F7, -F8
   r5 (to rollback c2) (DT)
   ```
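
The reader-side filtering in point 2 can be illustrated with a small, self-contained sketch. It is not Hudi's implementation: `MdtReaderFilterSketch`, `visibleMdtInstants`, and the plain string instant names are hypothetical, and the only rule carried over from the demo is the assumption that an MDT instant is honored only when the instant with the same timestamp is completed on the DT timeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; none of these names exist in Hudi.
final class MdtReaderFilterSketch {

  // Keeps only the MDT instants whose timestamp is also completed on the DT timeline,
  // preserving the order of the MDT timeline.
  static List<String> visibleMdtInstants(List<String> mdtCompleted, Set<String> dtCompleted) {
    List<String> visible = new ArrayList<>();
    for (String instant : mdtCompleted) {
      if (dtCompleted.contains(instant)) {
        visible.add(instant);
      }
    }
    return visible;
  }

  public static void main(String[] args) {
    // State of the demo right after delta_c4 is committed to the MDT:
    // c2 is complete in the MDT but still pending on the DT, and c4 has not reached the DT yet.
    List<String> mdtCompleted = Arrays.asList("c1", "c2", "c3", "c4");
    Set<String> dtCompleted = new HashSet<>(Arrays.asList("c1", "c3"));
    System.out.println(visibleMdtInstants(mdtCompleted, dtCompleted));
  }
}
```

In this model, c2 stays invisible to the reader for as long as it is pending on the DT timeline, which is the property the demo relies on.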





[GitHub] [hudi] hudi-bot commented on pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8088:
URL: https://github.com/apache/hudi/pull/8088#issuecomment-1467051791

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c65842899078697c5c5ff647e89f7cf918531f8d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15545",
       "triggerID" : "c65842899078697c5c5ff647e89f7cf918531f8d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "10777af559d8be0a0c421ebfb98f001501638aa5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15637",
       "triggerID" : "10777af559d8be0a0c421ebfb98f001501638aa5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "10777af559d8be0a0c421ebfb98f001501638aa5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15700",
       "triggerID" : "1467041684",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * 10777af559d8be0a0c421ebfb98f001501638aa5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15637) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15700) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on a diff in pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8088:
URL: https://github.com/apache/hudi/pull/8088#discussion_r1132010791


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1029,23 +1029,63 @@ protected HoodieData<HoodieRecord> prepRecords(Map<MetadataPartitionType,
   /**
    *  Perform a compaction on the Metadata Table.
    *
-   * Cases to be handled:
-   *   1. We cannot perform compaction if there are previous inflight operations on the dataset. This is because
-   *      a compacted metadata base file at time Tx should represent all the actions on the dataset till time Tx.
-   *
-   *   2. In multi-writer scenario, a parallel operation with a greater instantTime may have completed creating a
-   *      deltacommit.
+   * <p>Cases to be handled:
+   * <ol>
+   *   <li>We cannot perform compaction if there are previous inflight operations on the dataset. This is because
+   *   a compacted metadata base file at time Tx should represent all the actions on the dataset till time Tx;</li>
+   *   <li>In multi-writer scenario, a parallel operation with a greater instantTime may have completed creating a
+   *   deltacommit.</li>
+   * </ol>
    */
   protected void compactIfNecessary(BaseHoodieWriteClient writeClient, String instantTime) {
     // finish off any pending compactions if any from previous attempt.
     writeClient.runAnyPendingCompactions();
 
-    String latestDeltaCommitTimeInMetadataTable = metadataMetaClient.reloadActiveTimeline()
+    HoodieTimeline metadataCompletedDeltaCommitTimeline = metadataMetaClient.reloadActiveTimeline()
         .getDeltaCommitTimeline()
-        .filterCompletedInstants()
+        .filterCompletedInstants();
+    String latestDeltaCommitTimeInMetadataTable = metadataCompletedDeltaCommitTimeline
         .lastInstant().orElseThrow(() -> new HoodieMetadataException("No completed deltacommit in metadata table"))
         .getTimestamp();
-    List<HoodieInstant> pendingInstants = dataMetaClient.reloadActiveTimeline().filterInflightsAndRequested()
+    Set<String> metadataCompletedDeltaCommits = metadataCompletedDeltaCommitTimeline.getInstantsAsStream()
+        .map(HoodieInstant::getTimestamp)
+        .collect(Collectors.toSet());
+    // pending compactions in DT should not block the compaction of MDT.
+    // A pending compaction on the DT (a common case for MOR tables)
+    // could prevent the MDT compaction from being triggered in time.
+    // Slow MDT compaction in turn delays the timeline archiving of the DT,
+    // so neither the DT timeline nor the MDT timeline can be archived in time;
+    // that is how the small-file issues on both the DT and MDT timelines emerge.
+
+    // Why can we filter out a compaction instant that has not been committed to the MDT?
+
+    // Two preconditions need to be stated first:
+    // 1. only the write commits (commit, delta_commit, replace_commit) can trigger the MDT compaction;
+    // 2. the MDT is always committed before the DT.
+
+    // There are 3 cases to analyze for a compaction instant from the DT:
+    // 1. neither the DT nor the MDT has committed the instant;
+    //    1.1 the compaction in the DT is healthy, it just takes a long time to finish;
+    //    1.2 some error occurred during the compaction procedure.
+    // 2. the MDT has committed the compaction instant, while the DT has not;
+    //    2.1 the job crashed while the compactor was trying to commit to the DT right after the MDT commit;
+    //    2.2 the job was canceled manually right after the MDT commit.
+    // 3. both the DT and the MDT have committed the instant.
+
+    // The 3rd case should be okay, now let's analyze the first 2 cases:
+    //
+    // The 1st case: the instant has not been committed yet, so the MDT compaction simply ignores it
+    // and the pending instant cannot be compacted into the HFile. The instant is also not archived by either the DT or the MDT (that is how the archival mechanism works),
+    // and the MDT log reader ignores the instant correctly, so the resulting view stays correct.
+
+    // The 2nd case: we cannot trigger compaction, because once the MDT compacts, the MDT archiver may archive the instant even though it has not been committed in the DT,
+    // and the MDT reader can then no longer filter out the instant correctly. Another reason: once the instant is compacted into the HFile, a subsequent rollback from the DT may try to look up
+    // the files to be rolled back, and an exception could be thrown (although the default behavior is not to throw).
+

Review Comment:
   Let me explain the procedure a little more with a demo:
   
   ```java
   delta_c1 (F3, F4) (MDT)
   delta_c1 (F1, F2) (DT)
   
   c2.inflight (compaction triggers in DT)
   
   delta_c3 (F7, F8) (MDT)
   delta_c3 (F5, F6) (DT)
   
   c2 fails to commit to MDT
   
   delta_c4 (F9, F10) (MDT)
   -- can we trigger MDT compaction here? The answer is yes:
       1. the pending c2 in DT blocks the archiving of c2 in MDT
       2. the MDT reader ignores c2 too, because it is filtered out by the pending c2 on the DT timeline, so the
           compaction does not include c2
   delta_c4 (F11, F12) (DT)
   
   r5 (to rollback c2) (MDT)
   -F7, -F8
   r5 (to rollback c2) (DT)
   ```
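   
   The rule behind the demo can be sketched in code. This is a minimal illustration only, not the code in this patch: the class `MdtCompactionGate`, the method `canCompactMdt`, and the use of plain string sets for the two timelines are all assumptions made for the example. A pending DT compaction blocks MDT compaction only when the MDT has already committed that instant (case 2 in the code comment); a pending compaction that the MDT never committed (case 1, like c2 in the demo) does not block.
   
   ```java
   import java.util.Set;
   
   // Hypothetical sketch, not Hudi API: timelines are modeled as sets of instant names.
   public class MdtCompactionGate {
   
     // Returns true when MDT compaction may be triggered.
     // pendingDtCompactions: compaction instants still pending (inflight/requested) on the DT.
     // completedMdtInstants: instants that have already been committed to the MDT.
     static boolean canCompactMdt(Set<String> pendingDtCompactions, Set<String> completedMdtInstants) {
       for (String instant : pendingDtCompactions) {
         if (completedMdtInstants.contains(instant)) {
           // Case 2: the MDT committed the instant but the DT has not, so compaction must wait.
           return false;
         }
       }
       // Case 1 (or no pending compaction at all): the MDT never saw the instant, nothing to wait for.
       return true;
     }
   
     public static void main(String[] args) {
       // The demo above: c2 is inflight on the DT and never reached the MDT.
       System.out.println(canCompactMdt(Set.of("c2"), Set.of("c1", "c3", "c4"))); // true
       // Case 2: c2 was committed to the MDT, but the DT commit is still pending.
       System.out.println(canCompactMdt(Set.of("c2"), Set.of("c1", "c2", "c3"))); // false
     }
   }
   ```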





[GitHub] [hudi] hudi-bot commented on pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8088:
URL: https://github.com/apache/hudi/pull/8088#issuecomment-1467252776

   ## CI report:
   
   * 961aafcc251a8ed6bb18cc40c87365aa0a0924eb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15701) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8088:
URL: https://github.com/apache/hudi/pull/8088#issuecomment-1452880303

   ## CI report:
   
   * c65842899078697c5c5ff647e89f7cf918531f8d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15545) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8088:
URL: https://github.com/apache/hudi/pull/8088#issuecomment-1462027120

   ## CI report:
   
   * c65842899078697c5c5ff647e89f7cf918531f8d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15545) 
   * 10777af559d8be0a0c421ebfb98f001501638aa5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15637) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8088:
URL: https://github.com/apache/hudi/pull/8088#issuecomment-1461943991

   ## CI report:
   
   * c65842899078697c5c5ff647e89f7cf918531f8d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15545) 
   * 10777af559d8be0a0c421ebfb98f001501638aa5 UNKNOWN
   




[GitHub] [hudi] danny0405 commented on pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8088:
URL: https://github.com/apache/hudi/pull/8088#issuecomment-1459253998

   Thanks for the reminder. I went through the code and there seem to be two checks behind the flag `wasSynced`:
   
   1. the instant we want to roll back should still be included in the active timeline
   2. or the instant time should be smaller than or equal to the latest compaction instant of the MDT
   
   But let's address this case specifically. In this patch, I loosen the restriction so that an inflight compaction on the DT no longer blocks the compaction of the MDT. This does not break rules 1 and 2, because an instant considered for compaction should already be committed/complete on the DT, which means it would never be rolled back.
   
   If the instant is never considered for rollback, then things are going to be okay, right?
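   
   For reference, the two checks as I described them could be sketched like this. It is only an illustration of the description above, not the actual Hudi code: the class `WasSyncedSketch`, the method signature, and the modeling of the active timeline as a set of instant-time strings are assumptions, as is the comparison of instant times as fixed-width strings.
   
   ```java
   import java.util.Optional;
   import java.util.Set;
   
   // Hypothetical sketch of the two checks described above; not the real Hudi implementation.
   public class WasSyncedSketch {
   
     // instantTime: the instant we want to roll back.
     // activeInstants: instant times still present in the active timeline.
     // latestMdtCompaction: instant time of the latest MDT compaction, if any.
     static boolean wasSynced(String instantTime,
                              Set<String> activeInstants,
                              Optional<String> latestMdtCompaction) {
       // Check 1: the instant is still included in the active timeline.
       boolean inActiveTimeline = activeInstants.contains(instantTime);
       // Check 2: the instant time is smaller than or equal to the latest MDT compaction instant.
       // Assumes fixed-width instant times, so string order matches time order.
       boolean coveredByCompaction =
           latestMdtCompaction.map(c -> instantTime.compareTo(c) <= 0).orElse(false);
       return inActiveTimeline || coveredByCompaction;
     }
   
     public static void main(String[] args) {
       System.out.println(wasSynced("003", Set.of("003", "004"), Optional.empty()));  // true
       System.out.println(wasSynced("002", Set.of("003", "004"), Optional.of("002"))); // true
       System.out.println(wasSynced("001", Set.of("003"), Optional.empty()));          // false
     }
   }
   ```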
   




[GitHub] [hudi] nsivabalan commented on pull request #8088: [HUDI-5873] The pending compactions of dataset table should not block…

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on PR #8088:
URL: https://github.com/apache/hudi/pull/8088#issuecomment-1489728180

   I have documented all intricacies and interplay here https://issues.apache.org/jira/browse/HUDI-2458?focusedCommentId=17706580&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17706580 
   

