Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2023/01/10 18:01:37 UTC

[GitHub] [hudi] jonvex opened a new pull request, #7638: [HUDI-5520] Fail MDT when list of log files grow > 1000

jonvex opened a new pull request, #7638:
URL: https://github.com/apache/hudi/pull/7638

   ### Change Logs
   
   If an instant is stuck pending, compaction will never occur. This change throws an exception if more than 1000 log files are created without a compaction.
   
   ### Impact
   
   Log files will no longer grow unbounded.
   
   ### Risk level (write none, low, medium or high below)
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7638: [HUDI-5520] Fail MDT when list of log files grow > 1000

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7638:
URL: https://github.com/apache/hudi/pull/7638#issuecomment-1377656833

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26bc4dc5efb73ff7eefb7d5ddf4510e4c45af83b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "26bc4dc5efb73ff7eefb7d5ddf4510e4c45af83b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26bc4dc5efb73ff7eefb7d5ddf4510e4c45af83b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] xushiyan commented on a diff in pull request #7638: [HUDI-5520] Fail MDT when list of log files grow unboundedly

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #7638:
URL: https://github.com/apache/hudi/pull/7638#discussion_r1199700498


##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java:
##########
@@ -861,6 +862,33 @@ public void testMetadataTableWithPendingCompaction(boolean simulateFailedCompact
     }
   }
 
+  /**
+   * Tests to make sure that compaction won't be delayed forever due to a stuck pending commit
+   * */
+  @Test
+  public void testMetadataTableWithLongLog() throws Exception {
+    HoodieTableType tableType = COPY_ON_WRITE;
+    init(tableType, false);
+    writeConfig = getWriteConfigBuilder(true, true, false)
+        .withMetadataConfig(HoodieMetadataConfig.newBuilder()
+            .enable(true)
+            .enableFullScan(true)
+            .enableMetrics(false)
+            .build()).build();
+    initWriteConfigAndMetatableWriter(writeConfig, true);
+
+    HoodieException e =  assertThrows(HoodieException.class, () -> {
+      for (int i = 0; i < HoodieBackedTableMetadataWriter.MAX_LOG_FILE_LIST_LENGTH + 100; i++) {

Review Comment:
   This test actually performs 1100 commits before it finishes.
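A rough sketch of why an injectable limit keeps such a test small, assuming a hypothetical counter-based guard (`DeltaCommitGuard` is illustrative only, not Hudi code) that fails once the delta commits since the last compaction exceed a configurable maximum. With a limit of N, the guard first throws on delta commit N + 1, so a test that can pass a small N needs only a handful of commits rather than a loop bounded by `MAX_LOG_FILE_LIST_LENGTH + 100`.

```java
// Hypothetical counter-based guard (illustrative only, not Hudi code).
final class DeltaCommitGuard {
  private final int maxLength;
  private int deltaCommitsSinceCompaction;

  DeltaCommitGuard(int maxLength) {
    this.maxLength = maxLength;
  }

  // Called after every delta commit; fails once the count exceeds maxLength.
  void onDeltaCommit() {
    deltaCommitsSinceCompaction++;
    if (deltaCommitsSinceCompaction > maxLength) {
      throw new IllegalStateException(
          "Too many delta commits since last compaction: " + deltaCommitsSinceCompaction);
    }
  }

  // A completed compaction resets the count.
  void onCompaction() {
    deltaCommitsSinceCompaction = 0;
  }

  // Number of delta commits performed up to and including the first failure.
  static int commitsUntilFailure(int maxLength) {
    DeltaCommitGuard guard = new DeltaCommitGuard(maxLength);
    int commits = 0;
    while (true) {
      commits++;
      try {
        guard.onDeltaCommit();
      } catch (IllegalStateException e) {
        return commits;
      }
    }
  }
}
```

Under these assumptions, `commitsUntilFailure(1000)` is 1001 while `commitsUntilFailure(3)` is 4, which is the saving a parameterized limit would buy in a test.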



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1040,6 +1043,24 @@ protected void compactIfNecessary(BaseHoodieWriteClient writeClient, String inst
     }
   }
 
+  /**
+   * If there is an instant that is stuck pending, compaction will never occur and the log length will grow unbounded.
+   * Throw an exception if MAX_LOG_FILE_LIST_LENGTH is exceeded.
+   */
+  private void checkLogFileListLength() {

Review Comment:
   This should be static, with the max length passed in as a parameter.
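A minimal sketch of the suggested shape, assuming a simplified timeline model: the hypothetical `TimelineInstant` class and `Action` enum stand in for Hudi's `HoodieInstant` and timeline classes and are not Hudi's API. Mirroring the method body posted in this PR, it counts delta commits after the last completed compaction commit (or all delta commits if none has completed), and it throws `IllegalStateException` in place of `HoodieException` so the sketch has no Hudi dependency.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified stand-ins for Hudi timeline types (not Hudi's API).
enum Action { COMMIT, DELTA_COMMIT }

final class TimelineInstant {
  final String timestamp; // instant times compare lexicographically
  final Action action;
  final boolean completed;

  TimelineInstant(String timestamp, Action action, boolean completed) {
    this.timestamp = timestamp;
    this.action = action;
    this.completed = completed;
  }
}

final class LogFileListCheck {
  private LogFileListCheck() {}

  // Delta commits after the last completed compaction (a COMMIT on the metadata table).
  static long deltaCommitsSinceLastCompaction(List<TimelineInstant> timeline) {
    Optional<String> lastCompaction = timeline.stream()
        .filter(i -> i.action == Action.COMMIT && i.completed)
        .map(i -> i.timestamp)
        .max(String::compareTo);
    return timeline.stream()
        .filter(i -> i.action == Action.DELTA_COMMIT)
        // With no completed compaction, every delta commit counts.
        .filter(i -> lastCompaction.map(ts -> i.timestamp.compareTo(ts) > 0).orElse(true))
        .count();
  }

  // Static and parameterized: callers (and tests) choose maxLength.
  static void checkLogFileListLength(List<TimelineInstant> timeline, int maxLength) {
    long logSize = deltaCommitsSinceLastCompaction(timeline);
    if (logSize > maxLength) {
      throw new IllegalStateException("List of log files has grown beyond " + maxLength + ".");
    }
  }
}
```

Keeping the method static and free of instance state means it can be exercised directly on a hand-built timeline, without standing up a metadata table writer.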





[GitHub] [hudi] hudi-bot commented on pull request #7638: [HUDI-5520] Fail MDT when list of log files grow unboundedly

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7638:
URL: https://github.com/apache/hudi/pull/7638#issuecomment-1377976435

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26bc4dc5efb73ff7eefb7d5ddf4510e4c45af83b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14226",
       "triggerID" : "26bc4dc5efb73ff7eefb7d5ddf4510e4c45af83b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d7629f13970a0a00bfb9825b44eb7846bb58814d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d7629f13970a0a00bfb9825b44eb7846bb58814d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26bc4dc5efb73ff7eefb7d5ddf4510e4c45af83b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14226) 
   * d7629f13970a0a00bfb9825b44eb7846bb58814d UNKNOWN
   




[GitHub] [hudi] xushiyan commented on pull request #7638: [HUDI-5520] Fail MDT when list of log files grow unboundedly

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on PR #7638:
URL: https://github.com/apache/hudi/pull/7638#issuecomment-1556120001

   re-worked in https://github.com/apache/hudi/pull/8772




[GitHub] [hudi] xushiyan closed pull request #7638: [HUDI-5520] Fail MDT when list of log files grow unboundedly

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan closed pull request #7638: [HUDI-5520] Fail MDT when list of log files grow unboundedly
URL: https://github.com/apache/hudi/pull/7638




[GitHub] [hudi] nsivabalan commented on a diff in pull request #7638: [HUDI-5520] Fail MDT when list of log files grow > 1000

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #7638:
URL: https://github.com/apache/hudi/pull/7638#discussion_r1066160180


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1040,6 +1043,23 @@ protected void compactIfNecessary(BaseHoodieWriteClient writeClient, String inst
     }
   }
 
+  /**
+   * If there is an instant that is stuck pending, compaction will never occur and the log length will grow unbounded.
+   * Throw an exception if MAX_LOG_FILE_LIST_LENGTH is exceeded.
+   */
+  private void checkLogFileListLength() {
+    Option<HoodieInstant> lastCompaction = metadataMetaClient.reloadActiveTimeline().getCommitTimeline().filterCompletedInstants().lastInstant();
+    int logSize;
+    if (lastCompaction.isPresent()) {
+      logSize = metadataMetaClient.getActiveTimeline().getDeltaCommitTimeline().findInstantsAfter(lastCompaction.get().getTimestamp()).countInstants();
+    } else {
+      logSize = metadataMetaClient.getActiveTimeline().getDeltaCommitTimeline().countInstants();
+    }
+    if (logSize > MAX_LOG_FILE_LIST_LENGTH) {
+      throw new HoodieException("List of log files has grown beyond " + MAX_LOG_FILE_LIST_LENGTH + ".");

Review Comment:
   Let's fix the log message:
   "It looks like the metadata table log files are growing unbounded due to a pending instant in the data table timeline. Please fix that and restart the pipeline."

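One way the suggested message could be assembled, as a sketch: the hypothetical helper `MetadataLogMessages.unboundedLogMessage` adapts the reviewer's wording and, as an assumption beyond the review comment, also reports the observed delta-commit count and the configured limit so an operator can see how far past the threshold the table is.

```java
// Hypothetical helper (illustrative only, not Hudi code).
final class MetadataLogMessages {
  private MetadataLogMessages() {}

  // Builds the error text for an exceeded log-file-list limit.
  static String unboundedLogMessage(long logSize, int maxLength) {
    return String.format(
        "Metadata table log files are growing unbounded (%d delta commits since the last compaction, limit %d), "
            + "likely due to a pending instant in the data table timeline. "
            + "Please fix that and restart the pipeline.",
        logSize, maxLength);
  }
}
```

The resulting string would be passed to the exception thrown when the limit is exceeded.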




[GitHub] [hudi] hudi-bot commented on pull request #7638: [HUDI-5520] Fail MDT when list of log files grow unboundedly

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7638:
URL: https://github.com/apache/hudi/pull/7638#issuecomment-1378115047

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26bc4dc5efb73ff7eefb7d5ddf4510e4c45af83b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14226",
       "triggerID" : "26bc4dc5efb73ff7eefb7d5ddf4510e4c45af83b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d7629f13970a0a00bfb9825b44eb7846bb58814d",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14227",
       "triggerID" : "d7629f13970a0a00bfb9825b44eb7846bb58814d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d7629f13970a0a00bfb9825b44eb7846bb58814d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14227) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7638: [HUDI-5520] Fail MDT when list of log files grow > 1000

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7638:
URL: https://github.com/apache/hudi/pull/7638#issuecomment-1377664161

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26bc4dc5efb73ff7eefb7d5ddf4510e4c45af83b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14226",
       "triggerID" : "26bc4dc5efb73ff7eefb7d5ddf4510e4c45af83b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26bc4dc5efb73ff7eefb7d5ddf4510e4c45af83b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14226) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7638: [HUDI-5520] Fail MDT when list of log files grow unboundedly

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7638:
URL: https://github.com/apache/hudi/pull/7638#issuecomment-1377983103

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "26bc4dc5efb73ff7eefb7d5ddf4510e4c45af83b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14226",
       "triggerID" : "26bc4dc5efb73ff7eefb7d5ddf4510e4c45af83b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d7629f13970a0a00bfb9825b44eb7846bb58814d",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14227",
       "triggerID" : "d7629f13970a0a00bfb9825b44eb7846bb58814d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 26bc4dc5efb73ff7eefb7d5ddf4510e4c45af83b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14226) 
   * d7629f13970a0a00bfb9825b44eb7846bb58814d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14227) 
   

