Posted to commits@hudi.apache.org by "prashantwason (via GitHub)" <gi...@apache.org> on 2023/06/30 17:03:50 UTC

[GitHub] [hudi] prashantwason opened a new pull request, #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

prashantwason opened a new pull request, #9106:
URL: https://github.com/apache/hudi/pull/9106

   [HUDI-6118] Some fixes to improve the MDT and record index code base.
   
   ### Change Logs
   
   1. Print the MDT partition name instead of the enum's `toString()` value in logs
   2. Use fsView.loadAllPartitions()
   3. When publishing size metrics for MDT, only consider partitions which have been initialized
   4. Fixed job status names
   5. Limited logs that printed the entire list of partitions, which is very verbose for datasets with a large number of partitions.
   6. Added a config to reduce the max parallelism of record index initialization.
   7. Changed defaults for MDT write configs to reasonable values
   8. Added config for MDT logBlock size. Larger blocks are preferred to reduce lookup time.
   9. Fixed the size metrics for MDT. These metrics should be set instead of incremented.
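
Item 9 can be illustrated with a minimal sketch. The `SizeMetricsSketch` class and its method names are hypothetical and are not Hudi's metrics API; the sketch only shows why a size value that is published repeatedly should be set rather than incremented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a metrics registry; not Hudi's actual metrics classes.
final class SizeMetricsSketch {
  private final Map<String, Long> gauges = new HashMap<>();

  // Incrementing adds the new observation to whatever was published before,
  // so publishing the same size twice reports double the real size.
  void increment(String name, long value) {
    gauges.merge(name, value, Long::sum);
  }

  // Setting replaces the previous observation with the current one.
  void set(String name, long value) {
    gauges.put(name, value);
  }

  long get(String name) {
    return gauges.getOrDefault(name, 0L);
  }
}
```

A size is a point-in-time observation, so each publish should overwrite the last one; counters that accumulate are better suited to event counts.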
   
   
   ### Impact
   
   Fixes issues in the recently committed record index (RI) and metadata table (MDT) changes.
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1631750486

   > then a wrong setting here would probably keep only a single HFile
   
   Can we add some validation logic to the metadata table write config builder to guard correctness? Keeping at least 2 versions of each file group will also double the storage for the metadata table.
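
One possible shape for such a guard, as a sketch only: the `MdtCleanSettingsSketch` class, its builder, and the minimum of 2 are illustrative assumptions (the minimum mirrors the `retainFileVersions(2)` call in the diff) and do not correspond to an existing Hudi class.

```java
// Hypothetical builder; the class, field, and minimum are illustrative assumptions.
final class MdtCleanSettingsSketch {
  // Assumed lower bound, mirroring retainFileVersions(2) from the diff.
  static final int MIN_RETAINED_FILE_VERSIONS = 2;

  private final int retainedFileVersions;

  private MdtCleanSettingsSketch(int retainedFileVersions) {
    this.retainedFileVersions = retainedFileVersions;
  }

  int getRetainedFileVersions() {
    return retainedFileVersions;
  }

  static Builder newBuilder() {
    return new Builder();
  }

  static final class Builder {
    private int retainedFileVersions = MIN_RETAINED_FILE_VERSIONS;

    Builder retainFileVersions(int versions) {
      this.retainedFileVersions = versions;
      return this;
    }

    // Validation runs once, at build time, so a bad value fails fast.
    MdtCleanSettingsSketch build() {
      if (retainedFileVersions < MIN_RETAINED_FILE_VERSIONS) {
        throw new IllegalArgumentException(
            "retainFileVersions must be >= " + MIN_RETAINED_FILE_VERSIONS
                + " but was " + retainedFileVersions);
      }
      return new MdtCleanSettingsSketch(retainedFileVersions);
    }
  }
}
```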




[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1640230307

   ## CI report:
   
   * 16ae34ec0e91811bae11a980749f5b77d048adba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18519) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18646) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18663) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1638593006

   ## CI report:
   
   * 16ae34ec0e91811bae11a980749f5b77d048adba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18519) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18646) 
   


[GitHub] [hudi] prashantwason commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1638590939

   @hudi-bot run azure




[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1640565751

   ## CI report:
   
   * 16ae34ec0e91811bae11a980749f5b77d048adba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18519) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18646) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18663) 
   


[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1644793004

   ## CI report:
   
   * 9fc729dfc27c694113e9bfc07c0f1c5ccd78c82b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18694) 
   * d5c62b9be73ae69b2b21eb7cb3da26b8bc95f670 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1644844999

   ## CI report:
   
   * d5c62b9be73ae69b2b21eb7cb3da26b8bc95f670 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18746) 
   * ee2ba88f2032f599e9180dae5410772cbc19614d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18750) 
   


[GitHub] [hudi] danny0405 commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1248767527


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -91,8 +108,9 @@ public static HoodieWriteConfig createMetadataWriteConfig(
         .withCleanConfig(HoodieCleanConfig.newBuilder()
             .withAsyncClean(DEFAULT_METADATA_ASYNC_CLEAN)
             .withAutoClean(false)
-            .withCleanerParallelism(parallelism)
-            .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS)
+            .withCleanerParallelism(defaultParallelism)
+            .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS)
+            .retainFileVersions(2)

Review Comment:
   This is a big behavior change. Why do we always leave some log files there, then?
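
For context, a version-based retention policy can be modeled in a few lines. This is a deliberately simplified sketch: `KeepLatestVersionsSketch` is a hypothetical class that treats a file group as an ordered list of version names, and it is not how Hudi's cleaner is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of "keep the latest N file versions"; not Hudi's cleaner.
final class KeepLatestVersionsSketch {
  // Returns the versions that would be cleaned: everything except the N newest.
  // The input is assumed to be ordered oldest first.
  static List<String> versionsToClean(List<String> versionsOldestFirst, int retain) {
    int cut = Math.max(0, versionsOldestFirst.size() - retain);
    return new ArrayList<>(versionsOldestFirst.subList(0, cut));
  }
}
```

With `retain = 2`, a file group holding versions `v1`, `v2`, `v3` would have only `v1` cleaned, which is the storage overhead the review discussion is weighing.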





[GitHub] [hudi] danny0405 commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1270218438


##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java:
##########
@@ -285,6 +277,12 @@ public final class HoodieMetadataConfig extends HoodieConfig {
       .withDocumentation("The current number of records are multiplied by this number when estimating the number of "
           + "file groups to create automatically. This helps account for growth in the number of records in the dataset.");
 
+  public static final ConfigProperty<Integer> RECORD_INDEX_MAX_PARALLELISM = ConfigProperty
+      .key(METADATA_PREFIX + ".max.init.parallelism")
+      .defaultValue(100000)
+      .sinceVersion("0.14.0")

Review Comment:
   Do we need a parallelism of 100000?
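
A maximum-parallelism setting is typically applied as an upper bound on whatever parallelism the job would otherwise pick. The sketch below assumes that usage; `InitParallelismSketch` and `effectiveParallelism` are hypothetical names, and how the PR actually applies `RECORD_INDEX_MAX_PARALLELISM` is not shown in this thread.

```java
// Hypothetical helper; the name and the lower bound of 1 are assumptions.
final class InitParallelismSketch {
  // Caps the requested parallelism at maxParallelism and never returns less than 1.
  static int effectiveParallelism(int requested, int maxParallelism) {
    return Math.max(1, Math.min(requested, maxParallelism));
  }
}
```

Under this reading, a default of 100000 acts as a ceiling for very large tables and leaves smaller requests untouched.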





[GitHub] [hudi] prashantwason commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1640202630

   @hudi-bot run azure




[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1631626905

   ## CI report:
   
   * eb56e1be9ea831362a61adccec2ec2826c86d6a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18240) 
   * 2c07b3e13de51845aad4e280c5fb07688f103d4a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18500) 
   


[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1615153382

   ## CI report:
   
   * eb56e1be9ea831362a61adccec2ec2826c86d6a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18240) 
   


[GitHub] [hudi] danny0405 merged pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 merged PR #9106:
URL: https://github.com/apache/hudi/pull/9106




[GitHub] [hudi] prashantwason commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1260358088


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -65,7 +66,23 @@ public class HoodieMetadataWriteUtils {
   public static HoodieWriteConfig createMetadataWriteConfig(
       HoodieWriteConfig writeConfig, HoodieFailedWritesCleaningPolicy failedWritesCleaningPolicy) {
     String tableName = writeConfig.getTableName() + METADATA_TABLE_NAME_SUFFIX;
-    int parallelism = writeConfig.getMetadataInsertParallelism();
+
+    // MDT writes are always prepped. Hence, insert and upsert shuffle parallelism are not important to be configured. Same for delete
+    // parallelism as deletes are not used.
+    // The finalize, cleaner and rollback tasks will operate on each fileGroup so their parallelism should be as large as the total file groups.
+    // But it's not possible to accurately get the file group count here so keeping these values large enough. This parallelism would
+    // any ways be limited by the executor counts.
+    final int defaultParallelism = 512;
+
+    // File groups in each partition are fixed at creation time and we do not want them to be split into multiple files
+    // ever. Hence, we use a very large basefile size in metadata table. The actual size of the HFiles created will
+    // eventually depend on the number of file groups selected for each partition (See estimateFileGroupCount function)
+    final long maxHFileSizeBytes = 10 * 1024 * 1024 * 1024L; // 10GB

Review Comment:
   Done.
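
A side note on the `10 * 1024 * 1024 * 1024L` expression in the diff above: it is safe only because the `L` suffix widens the product before it exceeds the `int` range. The sketch below, with a hypothetical class name, contrasts it with the all-`int` form.

```java
// Hypothetical class used only to demonstrate the arithmetic.
final class HFileSizeArithmeticSketch {
  // Evaluated left to right: 10 * 1024 * 1024 is an int (10_485_760),
  // and multiplying by 1024L widens the result to long: 10_737_418_240.
  static final long WITH_LONG_SUFFIX = 10 * 1024 * 1024 * 1024L;

  // All operands are int, so the product wraps around in 32 bits
  // before it is widened to long. 10 * 2^30 wraps to Integer.MIN_VALUE.
  static final long ALL_INT = 10 * 1024 * 1024 * 1024;
}
```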





[GitHub] [hudi] prashantwason commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1260360731


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -116,11 +134,10 @@ public static HoodieWriteConfig createMetadataWriteConfig(
             // Below config is only used if isLogCompactionEnabled is set.
             .withLogCompactionBlocksThreshold(writeConfig.getMetadataLogCompactBlocksThreshold())
             .build())
-        .withParallelism(parallelism, parallelism)
-        .withDeleteParallelism(parallelism)
-        .withRollbackParallelism(parallelism)
-        .withFinalizeWriteParallelism(parallelism)
-        .withAllowMultiWriteOnSameInstant(true)
+        .withStorageConfig(HoodieStorageConfig.newBuilder().hfileMaxFileSize(maxHFileSizeBytes)
+            .logFileMaxSize(maxLogFileSizeBytes).logFileDataBlockMaxSize(maxLogBlockSizeBytes).build())
+        .withRollbackParallelism(defaultParallelism)
+        .withFinalizeWriteParallelism(defaultParallelism)

Review Comment:
   Yes. This was required because the previous commit code would overwrite the same instant if it already existed. With the already committed rollback PR, if we get a commit on the MDT with the same timestamp as a previously applied deltacommit, we first roll back that deltacommit on the MDT and then commit the new change. Hence, multi-write-on-same-instant is never possible in the MDT.
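
The rollback-then-commit behavior described here can be modeled with a toy timeline. This is a sketch under stated assumptions: `MdtTimelineSketch` and its methods are hypothetical, instants are plain strings, and none of this is the actual MDT writer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the described behavior; not Hudi's timeline or MDT writer.
final class MdtTimelineSketch {
  private final TreeMap<String, String> deltaCommits = new TreeMap<>();
  private final List<String> actions = new ArrayList<>();

  // If a deltacommit already exists at this instant, roll it back first,
  // then apply the new one, so two writes never share an instant.
  void commit(String instantTime, String payload) {
    if (deltaCommits.containsKey(instantTime)) {
      deltaCommits.remove(instantTime);
      actions.add("rollback " + instantTime);
    }
    deltaCommits.put(instantTime, payload);
    actions.add("commit " + instantTime);
  }

  List<String> actions() {
    return actions;
  }

  String payloadAt(String instantTime) {
    return deltaCommits.get(instantTime);
  }
}
```

Committing twice at instant `001` records a commit, a rollback, and a second commit, and only the second payload survives.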





[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1631919853

   ## CI report:
   
   * 2c07b3e13de51845aad4e280c5fb07688f103d4a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18500) 
   * 16ae34ec0e91811bae11a980749f5b77d048adba Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18519) 
   


[GitHub] [hudi] danny0405 commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1248764700


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -537,7 +538,8 @@ public HoodieTableMetaClient getMetadataMetaClient() {
   }
 
   public Map<String, String> stats() {
-    return metrics.map(m -> m.getStats(true, metadataMetaClient, this)).orElse(new HashMap<>());
+    Set<String> allMetadataPartitionPaths = Arrays.stream(MetadataPartitionType.values()).map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    return metrics.map(m -> m.getStats(true, metadataMetaClient, this, allMetadataPartitionPaths)).orElse(new HashMap<>());

Review Comment:
   Do we need to fetch the enabled partitions here instead of all?





[GitHub] [hudi] prashantwason commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1260358249


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -65,7 +66,23 @@ public class HoodieMetadataWriteUtils {
   public static HoodieWriteConfig createMetadataWriteConfig(
       HoodieWriteConfig writeConfig, HoodieFailedWritesCleaningPolicy failedWritesCleaningPolicy) {
     String tableName = writeConfig.getTableName() + METADATA_TABLE_NAME_SUFFIX;
-    int parallelism = writeConfig.getMetadataInsertParallelism();
+
+    // MDT writes are always prepped. Hence, insert and upsert shuffle parallelism are not important to be configured. Same for delete
+    // parallelism as deletes are not used.
+    // The finalize, cleaner and rollback tasks will operate on each fileGroup so their parallelism should be as large as the total file groups.
+    // But it's not possible to accurately get the file group count here so keeping these values large enough. This parallelism would
+    // any ways be limited by the executor counts.
+    final int defaultParallelism = 512;
+
+    // File groups in each partition are fixed at creation time and we do not want them to be split into multiple files
+    // ever. Hence, we use a very large basefile size in metadata table. The actual size of the HFiles created will
+    // eventually depend on the number of file groups selected for each partition (See estimateFileGroupCount function)
+    final long maxHFileSizeBytes = 10 * 1024 * 1024 * 1024L; // 10GB
+
+    // Keeping the log blocks as large as the log files themselves reduces the number of HFile blocks to be checked for
+    // presence of keys.
+    final long maxLogFileSizeBytes = writeConfig.getMetadataConfig().getMaxLogFileSize();
+    final long maxLogBlockSizeBytes = maxLogFileSizeBytes;

Review Comment:
   Removed. Moved the comment to where it is used.





[GitHub] [hudi] prashantwason commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1260354921


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -537,7 +538,8 @@ public HoodieTableMetaClient getMetadataMetaClient() {
   }
 
   public Map<String, String> stats() {
-    return metrics.map(m -> m.getStats(true, metadataMetaClient, this)).orElse(new HashMap<>());
+    Set<String> allMetadataPartitionPaths = Arrays.stream(MetadataPartitionType.values()).map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    return metrics.map(m -> m.getStats(true, metadataMetaClient, this, allMetadataPartitionPaths)).orElse(new HashMap<>());

Review Comment:
   Removed the timeline reload. It is not required because this code is called right after the commit, where the metaClient is reloaded anyway.





[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1631910270

   ## CI report:
   
   * 2c07b3e13de51845aad4e280c5fb07688f103d4a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18500) 
   * 16ae34ec0e91811bae11a980749f5b77d048adba UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1642575904

   ## CI report:
   
   * 16ae34ec0e91811bae11a980749f5b77d048adba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18519) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18646) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18663) 
   * 9fc729dfc27c694113e9bfc07c0f1c5ccd78c82b UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1642789217

   ## CI report:
   
   * 9fc729dfc27c694113e9bfc07c0f1c5ccd78c82b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18694) 
   




[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1644839980

   ## CI report:
   
   * 9fc729dfc27c694113e9bfc07c0f1c5ccd78c82b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18694) 
   * d5c62b9be73ae69b2b21eb7cb3da26b8bc95f670 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18746) 
   * ee2ba88f2032f599e9180dae5410772cbc19614d UNKNOWN
   




[GitHub] [hudi] danny0405 commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1248767090


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -65,7 +66,23 @@ public class HoodieMetadataWriteUtils {
   public static HoodieWriteConfig createMetadataWriteConfig(
       HoodieWriteConfig writeConfig, HoodieFailedWritesCleaningPolicy failedWritesCleaningPolicy) {
     String tableName = writeConfig.getTableName() + METADATA_TABLE_NAME_SUFFIX;
-    int parallelism = writeConfig.getMetadataInsertParallelism();
+
+    // MDT writes are always prepped. Hence, insert and upsert shuffle parallelism are not important to be configured. Same for delete
+    // parallelism as deletes are not used.
+    // The finalize, cleaner and rollback tasks will operate on each fileGroup so their parallelism should be as large as the total file groups.
+    // But it's not possible to accurately get the file group count here so keeping these values large enough. This parallelism would
+    // any ways be limited by the executor counts.
+    final int defaultParallelism = 512;
+
+    // File groups in each partition are fixed at creation time and we do not want them to be split into multiple files
+    // ever. Hence, we use a very large basefile size in metadata table. The actual size of the HFiles created will
+    // eventually depend on the number of file groups selected for each partition (See estimateFileGroupCount function)
+    final long maxHFileSizeBytes = 10 * 1024 * 1024 * 1024L; // 10GB
+
+    // Keeping the log blocks as large as the log files themselves reduces the number of HFile blocks to be checked for
+    // presence of keys.
+    final long maxLogFileSizeBytes = writeConfig.getMetadataConfig().getMaxLogFileSize();
+    final long maxLogBlockSizeBytes = maxLogFileSizeBytes;

Review Comment:
   redundant?





[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1631621309

   ## CI report:
   
   * eb56e1be9ea831362a61adccec2ec2826c86d6a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18240) 
   * 2c07b3e13de51845aad4e280c5fb07688f103d4a UNKNOWN
   




[GitHub] [hudi] prashantwason commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1645952466

   @danny0405 The max init values for the other indexes are too low (see HUDI-6553). Indexes are most useful for large datasets that have a large number of partitions and files. Assume a large dataset with 100K+ files. The default parallelism for index initialization in the code is around 200, so the indexes would take hours to build. With a large parallelism:
   1. The actual parallelism used is min(number_of_operations, 100,000).
   2. So for small datasets, the lower value is used.
   3. For larger datasets, 100K is used.
   
   We routinely have datasets with over 1M files in them (as large as 6M files). I have tested various parallelism values, and it's not an exact science, but somewhere around 100,000 is where I got the fastest bootstrap of the indexes. Very large parallelism causes OOM and memory issues on Spark.
   
   If the defaults are left at 200, many people would report timeouts when building indexes on larger tables.
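
For readers following the thread, the capping rule described above can be sketched roughly as follows. This is only an illustration of min(number_of_operations, max): the class and method names are hypothetical and are not taken from the Hudi code base, and the floor of 1 for an empty input is an assumption.

```java
// Hypothetical sketch, not Hudi code: shows how a configured maximum
// caps the parallelism actually used for index initialization.
final class InitParallelismSketch {
  private InitParallelismSketch() {
  }

  // numberOfOperations: e.g. the number of files to index (assumed input).
  // maxParallelism: the configured upper bound, e.g. 100,000.
  static int effectiveParallelism(long numberOfOperations, int maxParallelism) {
    if (numberOfOperations <= 0) {
      return 1; // assumption: never ask the engine for zero tasks
    }
    return (int) Math.min(numberOfOperations, (long) maxParallelism);
  }

  public static void main(String[] args) {
    // Small dataset: the number of operations is the limiting factor.
    System.out.println(effectiveParallelism(500, 100_000));
    // Large dataset (6M files): the configured maximum is the limiting factor.
    System.out.println(effectiveParallelism(6_000_000L, 100_000));
  }
}
```

Under this rule a large default does not penalize small tables, since they never reach the cap.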




[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1638911507

   ## CI report:
   
   * 16ae34ec0e91811bae11a980749f5b77d048adba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18519) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18646) 
   




[GitHub] [hudi] prashantwason commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1268479126


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -91,8 +108,9 @@ public static HoodieWriteConfig createMetadataWriteConfig(
         .withCleanConfig(HoodieCleanConfig.newBuilder()
             .withAsyncClean(DEFAULT_METADATA_ASYNC_CLEAN)
             .withAutoClean(false)
-            .withCleanerParallelism(parallelism)
-            .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS)
+            .withCleanerParallelism(defaultParallelism)
+            .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS)
+            .retainFileVersions(2)

Review Comment:
   I have reverted this change.





[GitHub] [hudi] prashantwason commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1642543868

   @danny0405 @nsivabalan  I have reverted the change to the cleaning policy. PTAL again.




[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1642585086

   ## CI report:
   
   * 16ae34ec0e91811bae11a980749f5b77d048adba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18519) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18646) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18663) 
   * 9fc729dfc27c694113e9bfc07c0f1c5ccd78c82b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18694) 
   




[GitHub] [hudi] nsivabalan commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1252401000


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -116,11 +134,10 @@ public static HoodieWriteConfig createMetadataWriteConfig(
             // Below config is only used if isLogCompactionEnabled is set.
             .withLogCompactionBlocksThreshold(writeConfig.getMetadataLogCompactBlocksThreshold())
             .build())
-        .withParallelism(parallelism, parallelism)
-        .withDeleteParallelism(parallelism)
-        .withRollbackParallelism(parallelism)
-        .withFinalizeWriteParallelism(parallelism)
-        .withAllowMultiWriteOnSameInstant(true)
+        .withStorageConfig(HoodieStorageConfig.newBuilder().hfileMaxFileSize(maxHFileSizeBytes)
+            .logFileMaxSize(maxLogFileSizeBytes).logFileDataBlockMaxSize(maxLogBlockSizeBytes).build())
+        .withRollbackParallelism(defaultParallelism)
+        .withFinalizeWriteParallelism(defaultParallelism)

Review Comment:
   Did you remove .withAllowMultiWriteOnSameInstant(true) intentionally?



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -91,8 +108,9 @@ public static HoodieWriteConfig createMetadataWriteConfig(
         .withCleanConfig(HoodieCleanConfig.newBuilder()
             .withAsyncClean(DEFAULT_METADATA_ASYNC_CLEAN)
             .withAutoClean(false)
-            .withCleanerParallelism(parallelism)
-            .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS)
+            .withCleanerParallelism(defaultParallelism)
+            .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS)
+            .retainFileVersions(2)

Review Comment:
   I understand this could be a larger change, but cleaning by file versions makes sense in general. If Uber has been running with file versions for 6+ months, we should do a round of testing on our end and can then possibly proceed.
   But incremental cleaning may not kick in, so for large MDTs I wonder whether there will be any latency hit.



##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -341,7 +341,11 @@ private void ensurePartitionsLoadedCorrectly(List<String> partitionList) {
         long beginTs = System.currentTimeMillis();
         // Not loaded yet
         try {
-          LOG.info("Building file system view for partitions " + partitionSet);
+          if (partitionSet.size() < 100) {
+            LOG.info("Building file system view for partitions: " + partitionSet);

Review Comment:
   Yes, maybe we should reconsider the frequency of logging here, e.g. log once every 100 partitions or so. I am not sure we gain much by logging this for every partition.
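
One way to bound this kind of log line is sketched below. It is only an illustration, not code from the PR: `PartitionLogMessages`, `describe`, and `maxListed` are hypothetical names, and the message text simply mirrors the log statement in the diff. Small partition sets are printed in full; larger ones are summarized by count plus a short sample.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical helper (not part of Hudi): builds a bounded log message for a partition set. */
final class PartitionLogMessages {
  private PartitionLogMessages() {
  }

  /** Lists every partition when the set is small, otherwise the count and the first few entries. */
  static String describe(Collection<String> partitions, int maxListed) {
    if (partitions.size() <= maxListed) {
      return "Building file system view for partitions: " + partitions;
    }
    // Copy only the first maxListed entries so the message length stays bounded.
    List<String> sample = new ArrayList<>();
    for (String partition : partitions) {
      if (sample.size() == maxListed) {
        break;
      }
      sample.add(partition);
    }
    return "Building file system view for " + partitions.size()
        + " partitions (first " + maxListed + " shown): " + sample;
  }
}
```

A caller would pass the result to its logger at whatever level it prefers, for example `LOG.debug(PartitionLogMessages.describe(partitionSet, 100))`.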



##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -537,7 +538,8 @@ public HoodieTableMetaClient getMetadataMetaClient() {
   }
 
   public Map<String, String> stats() {
-    return metrics.map(m -> m.getStats(true, metadataMetaClient, this)).orElse(new HashMap<>());
+    Set<String> allMetadataPartitionPaths = Arrays.stream(MetadataPartitionType.values()).map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    return metrics.map(m -> m.getStats(true, metadataMetaClient, this, allMetadataPartitionPaths)).orElse(new HashMap<>());

Review Comment:
   HoodieMetadataMetrics.getStats(boolean detailed, HoodieTableMetaClient metaClient, HoodieTableMetadata metadata) 
   
   reloads the timeline. 
   Can we move the reload outside, into the caller, so that we don't reload the timeline for every MDT partition's stats?
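
The suggestion can be pictured with a small stand-alone sketch. It does not use Hudi classes: `ReloadableTimeline`, `PartitionStatsCollector`, and `sizeLookup` are hypothetical stand-ins, and the reload counter exists only to make the cost visible. The point is that the expensive reload happens once, before the per-partition loop, rather than inside it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical stand-in for a timeline whose reload is expensive. */
final class ReloadableTimeline {
  int reloadCount = 0;

  void reload() {
    reloadCount++; // a real reload would re-read the timeline from storage
  }
}

/** Hypothetical collector: reloads once, then gathers a size for every partition. */
final class PartitionStatsCollector {
  private PartitionStatsCollector() {
  }

  static Map<String, Long> collect(ReloadableTimeline timeline,
                                   List<String> partitions,
                                   ToLongFunction<String> sizeLookup) {
    timeline.reload(); // hoisted out of the loop: one reload for all partitions
    Map<String, Long> stats = new LinkedHashMap<>();
    for (String partition : partitions) {
      // Per-partition work relies on the already-reloaded timeline.
      stats.put(partition, sizeLookup.applyAsLong(partition));
    }
    return stats;
  }
}
```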



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -176,7 +176,7 @@ private void initMetadataReader() {
     }
 
     try {
-      this.metadata = new HoodieBackedTableMetadata(engineContext, dataWriteConfig.getMetadataConfig(), dataWriteConfig.getBasePath());
+      this.metadata = new HoodieBackedTableMetadata(engineContext, dataWriteConfig.getMetadataConfig(), dataWriteConfig.getBasePath(), true);

Review Comment:
   Is the rationale that the metadata writer itself is short-lived, used just for committing one instant, and so it is safe to enable reuse here?
   Do we even expect to see any improvement here, since this is meant for just one write to the MDT?





[GitHub] [hudi] danny0405 commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1252501322


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -91,8 +108,9 @@ public static HoodieWriteConfig createMetadataWriteConfig(
         .withCleanConfig(HoodieCleanConfig.newBuilder()
             .withAsyncClean(DEFAULT_METADATA_ASYNC_CLEAN)
             .withAutoClean(false)
-            .withCleanerParallelism(parallelism)
-            .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS)
+            .withCleanerParallelism(defaultParallelism)
+            .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS)
+            .retainFileVersions(2)

Review Comment:
   Even if Uber has been running this for 6+ months, it does not mean the config works well for OSS: while we were migrating the Uber patches, many fixes and other nuances were introduced. I would suggest we move this change to the next release to keep the existing MDT workflow stable.





[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1631780663

   ## CI report:
   
   * 2c07b3e13de51845aad4e280c5fb07688f103d4a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18500) 
   




[GitHub] [hudi] prashantwason commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1260359877


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -176,7 +176,7 @@ private void initMetadataReader() {
     }
 
     try {
-      this.metadata = new HoodieBackedTableMetadata(engineContext, dataWriteConfig.getMetadataConfig(), dataWriteConfig.getBasePath());
+      this.metadata = new HoodieBackedTableMetadata(engineContext, dataWriteConfig.getMetadataConfig(), dataWriteConfig.getBasePath(), true);

Review Comment:
   When initializing additional indexes (after the files partition is already initialized), the file listing is taken from the HoodieBackedTableMetadata itself. In this case, it is better to have reuse enabled so we don't repeat the listing each time an additional index is initialized.
   
   This is an optimization for the case when the FILES index exists and we are initializing one or more additional indexes.
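
The effect of the reuse flag can be approximated with a generic memoization sketch. This is not how HoodieBackedTableMetadata implements reuse; `CachingLister`, `filesIn`, and the `listCalls` counter are hypothetical and exist only to show that, with reuse on, a repeated request for the same partition does not trigger another listing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical wrapper (not a Hudi class): optionally caches listing results per partition. */
final class CachingLister {
  private final Function<String, List<String>> lister;
  private final boolean reuse;
  private final Map<String, List<String>> cache = new HashMap<>();
  int listCalls = 0; // counts how often the underlying listing actually runs

  CachingLister(Function<String, List<String>> lister, boolean reuse) {
    this.lister = lister;
    this.reuse = reuse;
  }

  List<String> filesIn(String partition) {
    if (!reuse) {
      // Without reuse, every request performs a fresh listing.
      listCalls++;
      return lister.apply(partition);
    }
    // With reuse, the listing runs only on the first request for a partition.
    return cache.computeIfAbsent(partition, p -> {
      listCalls++;
      return lister.apply(p);
    });
  }
}
```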





[GitHub] [hudi] prashantwason commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1631610969

   @danny0405 @nsivabalan I think the cleaning strategy change for MDT is a bugfix because of the following enhancements:
   1. The initial commit on the MDT will create HFiles
   2. Rollbacks now actually roll back the MDT instead of adding a -f1, -f2 deltacommit
   
   If we use KEEP_LATEST_COMMITS, then a wrong setting here would probably keep only a single HFile, and that will limit the rollback. We cannot roll back the MDT beyond the last HFile as we will lose data.
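
To make the version-based retention idea concrete, here is a deliberately simplified sketch. It is not Hudi's cleaner and ignores timelines, file slices, and pending operations; `FileVersionRetention` and `retainLatest` are hypothetical names. It only shows the rule "keep the newest N versions of a file group", with N = 2 matching the `retainFileVersions(2)` call in the diff.

```java
import java.util.List;

/** Hypothetical illustration (not Hudi's cleaner): keep the newest N versions of one file group. */
final class FileVersionRetention {
  private FileVersionRetention() {
  }

  /**
   * Returns the versions that survive cleaning.
   *
   * @param versionsOldestFirst versions of a single file group, ordered oldest to newest
   * @param versionsToRetain    how many of the newest versions to keep (at least 1)
   */
  static <T> List<T> retainLatest(List<T> versionsOldestFirst, int versionsToRetain) {
    if (versionsToRetain < 1) {
      throw new IllegalArgumentException("versionsToRetain must be at least 1");
    }
    int from = Math.max(0, versionsOldestFirst.size() - versionsToRetain);
    return versionsOldestFirst.subList(from, versionsOldestFirst.size());
  }
}
```

With two versions retained, the previous base file is still available when the newest one has to be rolled back, which is the property the comment above is concerned with.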




[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1644797640

   ## CI report:
   
   * 9fc729dfc27c694113e9bfc07c0f1c5ccd78c82b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18694) 
   * d5c62b9be73ae69b2b21eb7cb3da26b8bc95f670 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18746) 
   




[GitHub] [hudi] danny0405 commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1248766813


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -65,7 +66,23 @@ public class HoodieMetadataWriteUtils {
   public static HoodieWriteConfig createMetadataWriteConfig(
       HoodieWriteConfig writeConfig, HoodieFailedWritesCleaningPolicy failedWritesCleaningPolicy) {
     String tableName = writeConfig.getTableName() + METADATA_TABLE_NAME_SUFFIX;
-    int parallelism = writeConfig.getMetadataInsertParallelism();
+
+    // MDT writes are always prepped. Hence, insert and upsert shuffle parallelism are not important to be configured. Same for delete
+    // parallelism as deletes are not used.
+    // The finalize, cleaner and rollback tasks will operate on each fileGroup so their parallelism should be as large as the total file groups.
+    // But it's not possible to accurately get the file group count here so keeping these values large enough. This parallelism would
+    // any ways be limited by the executor counts.
+    final int defaultParallelism = 512;
+
+    // File groups in each partition are fixed at creation time and we do not want them to be split into multiple files
+    // ever. Hence, we use a very large basefile size in metadata table. The actual size of the HFiles created will
+    // eventually depend on the number of file groups selected for each partition (See estimateFileGroupCount function)
+    final long maxHFileSizeBytes = 10 * 1024 * 1024 * 1024L; // 10GB

Review Comment:
   Can we define them as static constants instead?
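
A sketch of what that could look like, using the two values visible in the diff (512 and 10 GB). The class and constant names are hypothetical, not taken from the PR. Writing the first factor as `10L` makes the whole product a `long` computation from the start, so it does not depend on where the `L` suffix sits.

```java
/** Hypothetical holder for the defaults shown in the diff; names are illustrative only. */
final class MetadataWriteDefaults {
  private MetadataWriteDefaults() {
  }

  // Parallelism used for finalize, cleaner and rollback tasks on the metadata table.
  static final int DEFAULT_PARALLELISM = 512;

  // Very large base file size so file groups are never split: 10 GB.
  static final long MAX_HFILE_SIZE_BYTES = 10L * 1024 * 1024 * 1024;
}
```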





[GitHub] [hudi] danny0405 commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1248765118


##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -341,7 +341,11 @@ private void ensurePartitionsLoadedCorrectly(List<String> partitionList) {
         long beginTs = System.currentTimeMillis();
         // Not loaded yet
         try {
-          LOG.info("Building file system view for partitions " + partitionSet);
+          if (partitionSet.size() < 100) {
+            LOG.info("Building file system view for partitions: " + partitionSet);

Review Comment:
   Can we just switch to LOG.debug? Is the logging really useful?





[GitHub] [hudi] prashantwason commented on a diff in pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1260355443


##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -341,7 +341,11 @@ private void ensurePartitionsLoadedCorrectly(List<String> partitionList) {
         long beginTs = System.currentTimeMillis();
         // Not loaded yet
         try {
-          LOG.info("Building file system view for partitions " + partitionSet);
+          if (partitionSet.size() < 100) {
+            LOG.info("Building file system view for partitions: " + partitionSet);

Review Comment:
   Converted to a debug log.





[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1632403632

   ## CI report:
   
   * 16ae34ec0e91811bae11a980749f5b77d048adba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18519) 
   




[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1614965807

   ## CI report:
   
   * eb56e1be9ea831362a61adccec2ec2826c86d6a7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18240) 
   




[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1614956663

   ## CI report:
   
   * eb56e1be9ea831362a61adccec2ec2826c86d6a7 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1645058361

   ## CI report:
   
   * ee2ba88f2032f599e9180dae5410772cbc19614d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18750) 
   

