Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/26 15:37:25 UTC

[GitHub] [hudi] codope opened a new pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

codope opened a new pull request #4693:
URL: https://github.com/apache/hudi/pull/4693


   ## What is the purpose of the pull request
   
   This PR is stacked on top of #4523 . Specifically, it makes the following changes:
   - An index planner in `ScheduleIndexActionExecutor`.
   - An index plan executor in `RunIndexActionExecutor`.
   - A new `index` API in `HoodieTableMetadataWriter` (see the sketch below).
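   
   For orientation, here is a minimal, hedged sketch of how the new API might be driven end to end. The `index(HoodieEngineContext, List<HoodieIndexPartitionInfo>)` signature matches the diff reviewed later in this thread; the driver class, method name, and package imports are illustrative assumptions, not the actual executor code.
   
   ```java
   import java.util.List;
   
   import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
   import org.apache.hudi.common.engine.HoodieEngineContext;
   import org.apache.hudi.metadata.HoodieTableMetadataWriter;
   
   // Illustrative driver only (assumed names): the intended flow is
   // ScheduleIndexActionExecutor persisting an index plan, then
   // RunIndexActionExecutor handing the plan's partition infos to the writer.
   class AsyncIndexDriverSketch {
     void runIndex(HoodieEngineContext context,
                   HoodieTableMetadataWriter metadataWriter,
                   List<HoodieIndexPartitionInfo> partitionInfos) {
       // Each partition info names a metadata partition and the data-table
       // instant up to which it should be indexed (getMetadataPartitionPath()
       // and getIndexUptoInstant() in the diff below).
       metadataWriter.index(context, partitionInfos);
     }
   }
   ```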
   
   ## Brief change log
   
     - Add an index planner in `ScheduleIndexActionExecutor`.
     - Add an index plan executor in `RunIndexActionExecutor`.
     - Add an `index` API to `HoodieTableMetadataWriter`.
   
   ## Verify this pull request
   
   *(Please pick one of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end verification.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
 - [ ] For large changes, please consider breaking them into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1022325469


   ## CI report:
   
   * 238b128260cab3ad11c8e00bd20871b45e112c83 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066499338


   ## CI report:
   
   * e6e3e1612928fb0892d071ec4c3a26e31ce1ff76 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840) 
   * 4a036d809018043ed0d99adccbe0efdfd920284a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1068721225


   ## CI report:
   
   * a9f8c1316b55b72c57d18fbe8d0c8103948a30bc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930) 
   * 0d6ad6e1d8767d66b15b31bb06d1318fb08e582c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1073760182


   ## CI report:
   
   * 1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077538114


   ## CI report:
   
   * 1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124) 
   * e58990e296aa5125807a4b96269fa7a06c885e69 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282) 
   * 32cfdbf4524384a7fb8220be6e822dc510cf173b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836580003



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -663,20 +711,82 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    if (!dataWriteConfig.isMetadataTableEnabled()) {
+      return;
+    }
+    Set<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private Set<String> getMetadataPartitionsToUpdate() {
+    // fetch partitions to update from table config
+    Set<String> partitionsToUpdate = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    partitionsToUpdate.addAll(Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    if (!partitionsToUpdate.isEmpty()) {
+      return partitionsToUpdate;
     }
+    // fallback to update files partition only if table config returned no partitions
+    partitionsToUpdate.add(MetadataPartitionType.FILES.getPartitionPath());
+    return partitionsToUpdate;
+  }
+
+  @Override
+  public void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    if (indexPartitionInfos.isEmpty()) {
+      LOG.warn("No partition to index in the plan");
+      return;
+    }
+    String indexUptoInstantTime = indexPartitionInfos.get(0).getIndexUptoInstant();
+    indexPartitionInfos.forEach(indexPartitionInfo -> {
+      String relativePartitionPath = indexPartitionInfo.getMetadataPartitionPath();
+      LOG.info(String.format("Creating a new metadata index for partition '%s' under path %s upto instant %s",
+          relativePartitionPath, metadataWriteConfig.getBasePath(), indexUptoInstantTime));
+      try {
+        // filegroup should have already been initialized while scheduling index for this partition
+        if (!dataMetaClient.getFs().exists(new Path(metadataWriteConfig.getBasePath(), relativePartitionPath))) {
+          throw new HoodieIndexException(String.format("File group not initialized for metadata partition: %s, indexUptoInstant: %s. Looks like index scheduling failed!",
+              relativePartitionPath, indexUptoInstantTime));
+        }
+      } catch (IOException e) {
+        throw new HoodieIndexException(String.format("Unable to check whether file group is initialized for metadata partition: %s, indexUptoInstant: %s",
+            relativePartitionPath, indexUptoInstantTime));
+      }
+
+      // return early and populate enabledPartitionTypes correctly (check in initialCommit)
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(relativePartitionPath.toUpperCase(Locale.ROOT));
+      if (!enabledPartitionTypes.contains(partitionType)) {
+        throw new HoodieIndexException(String.format("Indexing for metadata partition: %s is not enabled", partitionType));
+      }
+    });
+    // before initial commit update table config
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), indexPartitionInfos.stream()

Review comment:
       done.
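
For readers skimming the diff above, here is a standalone, hedged rendering of the `getMetadataPartitionsToUpdate` parsing pattern: union the completed and inflight index lists stored as comma-separated values in table config, falling back to the files partition when both are empty. The class name and the bare "files" literal are illustrative stand-ins; the real code reads the values via `HoodieTableConfig` and uses `MetadataPartitionType.FILES.getPartitionPath()`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Standalone sketch (assumed class name) of the comma-separated parsing in
// getMetadataPartitionsToUpdate() from the diff above.
final class MetadataPartitionParsingSketch {
  static Set<String> partitionsToUpdate(String completedCsv, String inflightCsv) {
    Set<String> partitions = new HashSet<>();
    for (String csv : Arrays.asList(completedCsv, inflightCsv)) {
      partitions.addAll(Arrays.stream(csv.split(","))
          .map(String::trim)
          .filter(s -> !s.isEmpty())
          .collect(Collectors.toSet()));
    }
    if (partitions.isEmpty()) {
      // Fall back when table config lists nothing; stand-in for
      // MetadataPartitionType.FILES.getPartitionPath().
      partitions.add("files");
    }
    return partitions;
  }
}
```

For example, `partitionsToUpdate("files, column_stats", "")` yields `{files, column_stats}`, while `partitionsToUpdate("", "")` falls back to `{files}`.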







[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1078957363


   ## CI report:
   
   * d02e0c2ca65038f88ae753484dcb2642ef789f27 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334) 
   * 69071c6306ce336076aa6daa4337276990572ee4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>








[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1081066904


   ## CI report:
   
   * 522a18caff448bcc9b127372d4526ee8f168f085 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452) 
   * ee361b1bf6b9b68e11f84f2af76625b847669ed2 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1081174957


   ## CI report:
   
   * ee361b1bf6b9b68e11f84f2af76625b847669ed2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1033224161


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7920cb15d99cd92ea2a3e6bd515249eb63040772 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025615583


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c5c563ffa6625d610c9c6bd252457129ce5ccddc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1029256806


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c5c563ffa6625d610c9c6bd252457129ce5ccddc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619) 
   * 06c6dd9db383efa291c999d5f0140e5d2493eeaf UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1081873988


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ee361b1bf6b9b68e11f84f2af76625b847669ed2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471) 
   * be08ba499bb88d8a00f20695b360336853be708e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r839714252



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.CLEAN_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN_OR_EQUALS;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEXING_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.RESTORE_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.ROLLBACK_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadata.getMetadataTableBasePath;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Reads the index plan and executes it.
+ * It also reconciles updates that landed on the data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+  private static final int TIMELINE_RELOAD_INTERVAL_MILLIS = 5000;
+
+  // tracks the latest instant on the data timeline that has been indexed in the metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // the assumption is that only one indexer executes at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = validateAndGetIndexInstant();
+    // read HoodieIndexPlan
+    HoodieIndexPlan indexPlan;
+    try {
+      indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+    } catch (IOException e) {
+      throw new HoodieIndexException("Failed to read the index plan for instant: " + indexInstant);
+    }
+    List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+    try {
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // ensure the metadata partitions for the requested indexes are not already available (or inflight)
+      HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+      Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+      indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+      Set<String> requestedPartitions = indexPartitionInfos.stream()
+          .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet());
+      requestedPartitions.retainAll(indexesInflightOrCompleted);
+      if (!requestedPartitions.isEmpty()) {
+        throw new HoodieIndexException(String.format("Following partitions already exist or inflight: %s", requestedPartitions));
+      }
+
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build the index up to the base instant generated by the plan; we do the catchup later
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      LOG.info("Starting Index Building with base instant: " + indexUptoInstant);
+      metadataWriter.buildMetadataPartitions(context, indexPartitionInfos);
+
+      // get remaining instants to catchup
+      List<HoodieInstant> instantsToCatchup = getInstantsToCatchup(indexUptoInstant);
+      LOG.info("Total remaining instants to index: " + instantsToCatchup.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      catchupWithInflightWriters(metadataWriter, instantsToCatchup, metadataMetaClient, metadataCompletedTimestamps);
+      // save index commit metadata and update table config
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      updateTableConfigAndTimeline(indexInstant, finalIndexPartitionInfos, indexCommitMetadata);
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      // abort gracefully
+      abort(indexInstant, indexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet()));
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private void abort(HoodieInstant indexInstant, Set<String> requestedPartitions) {
+    Set<String> inflightPartitions = getInflightMetadataPartitions(table.getMetaClient().getTableConfig());
+    Set<String> completedPartitions = getCompletedMetadataPartitions(table.getMetaClient().getTableConfig());
+    // delete metadata partition
+    requestedPartitions.forEach(partition -> {
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(partition.toUpperCase(Locale.ROOT));
+      if (metadataPartitionExists(table.getMetaClient().getBasePath(), context, partitionType)) {
+        deleteMetadataPartition(table.getMetaClient().getBasePath(), context, partitionType);
+      }
+      inflightPartitions.remove(partition);
+      completedPartitions.remove(partition);
+    });
+    // update table config
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS_INFLIGHT.key(), String.join(",", inflightPartitions));
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS.key(), String.join(",", completedPartitions));
+    HoodieTableConfig.update(table.getMetaClient().getFs(), new Path(table.getMetaClient().getMetaPath()), table.getMetaClient().getTableConfig().getProps());
+    // delete inflight instant
+    table.getMetaClient().reloadActiveTimeline().deleteInstantFileIfExists(HoodieTimeline.getIndexInflightInstant(indexInstant.getTimestamp()));
+  }
+
+  private List<HoodieInstant> getInstantsToCatchup(String indexUptoInstant) {
+    // only the write timeline was considered while scheduling the index, which gives us the indexUpto instant;
+    // here we also consider other valid actions to pick the catchupStart instant
+    Set<String> validActions = CollectionUtils.createSet(CLEAN_ACTION, RESTORE_ACTION, ROLLBACK_ACTION);

Review comment:
       > should we do it such that any non-write actions are picked up?
   
   Only savepoint remains, and I deliberately left it out: a savepoint does not alter the file groups in any way (it only marks them so that the cleaner skips them), so there is nothing for the indexer to reconcile.
   
     > Also why not have scheduling consider non write actions?
     
   Yes, that's the way to go.
   We do consider non-write actions to determine the catchup start instant. Going back to the table we discussed in https://github.com/apache/hudi/pull/4693/files#r837817961, we need both the indexUpto and the catchupStart instants. I plan to write them into the index plan rather than pass them as parameters, and to revamp the index plan schema so that the API exposes minimal arguments and the plan is the source of truth, as we discussed. Tracking this in HUDI-3755.
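
   For illustration only, a minimal, self-contained sketch of that catchup-start selection follows. `InstantStub` and the action strings here are hypothetical stand-ins rather than Hudi's actual classes; the point is that catchupStart is the earliest instant after indexUpto whose action can alter file groups, whether or not it has completed.

   ```java
   import java.util.Arrays;
   import java.util.Comparator;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Optional;
   import java.util.Set;

   public class CatchupStartSketch {

     // hypothetical stand-in for org.apache.hudi.common.table.timeline.HoodieInstant
     static class InstantStub {
       final String timestamp;
       final String action;
       final boolean completed;

       InstantStub(String timestamp, String action, boolean completed) {
         this.timestamp = timestamp;
         this.action = action;
         this.completed = completed;
       }
     }

     // actions that can change file groups and hence must be reconciled by the indexer;
     // savepoint is deliberately absent because it never alters file groups
     static final Set<String> RELEVANT_ACTIONS = new HashSet<>(
         Arrays.asList("commit", "deltacommit", "clean", "restore", "rollback"));

     // catchupStart = earliest relevant instant strictly after indexUptoInstant,
     // completed or not (inflight instants must also be waited on during catchup)
     static Optional<String> catchupStart(List<InstantStub> timeline, String indexUptoInstant) {
       return timeline.stream()
           .filter(i -> RELEVANT_ACTIONS.contains(i.action))
           .filter(i -> i.timestamp.compareTo(indexUptoInstant) > 0)
           .map(i -> i.timestamp)
           .min(Comparator.naturalOrder());
     }

     public static void main(String[] args) {
       List<InstantStub> timeline = Arrays.asList(
           new InstantStub("001", "deltacommit", true),
           new InstantStub("002", "clean", true),
           new InstantStub("003", "savepoint", true),  // ignored: never alters file groups
           new InstantStub("004", "rollback", false)); // inflight, still relevant
       System.out.println(catchupStart(timeline, "001")); // prints Optional[002]
     }
   }
   ```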




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083579784


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570",
       "triggerID" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a3ee4cd75320e578235cea4490ed7470bb721ea5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570) 
   * 18b9acd3320e68ee6688ea4eec693676350a9e15 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1065111216


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7920cb15d99cd92ea2a3e6bd515249eb63040772 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801) 
   * e6e3e1612928fb0892d071ec4c3a26e31ce1ff76 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835783252



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/bloom/BloomFilter.java
##########
@@ -30,6 +34,13 @@
    */
   void add(String key);
 
+  /**
+   * Add a list of secondary keys to the {@link BloomFilter}.
+   *
+   * @param keys list of secondary keys to add to the {@link BloomFilter}
+   */
+  void add(@Nonnull List<String> keys);

Review comment:
       Synced up f2f. It looks like we are passing in the list of secondary key columns here.
   We might need to rethink this fully: from what I understand, we need to initialize one bloom filter per secondary key and add the respective column values to that key's bloom filter, as sketched below.
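
   To make that concrete, a minimal sketch of "one bloom filter per secondary key" could look like the following. It uses Guava's `BloomFilter` purely as a stand-in for Hudi's own `BloomFilter` interface; the class and method names are illustrative and not part of this PR.

   ```java
   import com.google.common.hash.BloomFilter;
   import com.google.common.hash.Funnels;

   import java.nio.charset.StandardCharsets;
   import java.util.Arrays;
   import java.util.HashMap;
   import java.util.Map;

   public class SecondaryKeyBloomSketch {

     // one bloom filter per secondary key column, keyed by column name
     private final Map<String, BloomFilter<CharSequence>> bloomsByColumn = new HashMap<>();

     // lazily create the per-column filter and add this column's value to it
     public void add(String column, String value) {
       bloomsByColumn
           .computeIfAbsent(column, c ->
               BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 100_000, 0.01))
           .put(value);
     }

     public boolean mightContain(String column, String value) {
       BloomFilter<CharSequence> bloom = bloomsByColumn.get(column);
       return bloom != null && bloom.mightContain(value);
     }

     public static void main(String[] args) {
       SecondaryKeyBloomSketch index = new SecondaryKeyBloomSketch();
       for (String city : Arrays.asList("chennai", "sf", "nyc")) {
         index.add("city", city);
       }
       index.add("driver_id", "driver-123");
       System.out.println(index.mightContain("city", "sf"));      // true
       System.out.println(index.mightContain("driver_id", "sf")); // almost certainly false
     }
   }
   ```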




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836041910



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
##########
@@ -112,6 +114,25 @@ public HoodieDefaultTimeline getWriteTimeline() {
     return new HoodieDefaultTimeline(instants.stream().filter(s -> validActions.contains(s.getAction())), details);
   }
 
+  @Override
+  public HoodieDefaultTimeline getContiguousCompletedWriteTimeline() {

Review comment:
       Yes, this could work. Will add a unit test; a sketch of the expected behavior is below.
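
   A unit test would essentially pin down the "no holes" semantics. Here is a minimal sketch of the expected behavior, with a hypothetical `InstantStub` standing in for `HoodieInstant`: completed instants are kept only up to the first pending one, so a later completed instant after a gap is excluded.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;

   public class ContiguousTimelineSketch {

     // hypothetical stand-in for HoodieInstant: just (timestamp, completed?)
     static class InstantStub {
       final String timestamp;
       final boolean completed;

       InstantStub(String timestamp, boolean completed) {
         this.timestamp = timestamp;
         this.completed = completed;
       }
     }

     // keep completed instants only up to the first pending one, so the returned
     // timeline has no "holes" the indexer could silently skip over
     static List<String> contiguousCompleted(List<InstantStub> orderedTimeline) {
       List<String> result = new ArrayList<>();
       for (InstantStub instant : orderedTimeline) {
         if (!instant.completed) {
           break; // everything after the first pending instant is excluded
         }
         result.add(instant.timestamp);
       }
       return result;
     }

     public static void main(String[] args) {
       List<InstantStub> timeline = Arrays.asList(
           new InstantStub("001", true),
           new InstantStub("002", true),
           new InstantStub("003", false), // inflight
           new InstantStub("004", true)); // completed, but after a hole
       System.out.println(contiguousCompleted(timeline)); // prints [001, 002]
     }
   }
   ```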




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835776948



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -659,20 +691,100 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();

Review comment:
       I see it in the three-to-four upgrade handler, but what if someone enables it a few commits after the upgrade?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836580660



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -663,20 +711,82 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    if (!dataWriteConfig.isMetadataTableEnabled()) {
+      return;
+    }
+    Set<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private Set<String> getMetadataPartitionsToUpdate() {
+    // fetch partitions to update from table config
+    Set<String> partitionsToUpdate = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))

Review comment:
       Oops, missed this one! Will address it in a subsequent commit; keeping this thread open for now.
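
   For context, one subtle pitfall on that line (possibly what was missed) is that `"".split(",")` returns a single-element array containing the empty string, so the empty-config case needs explicit trimming and filtering. A small self-contained sketch (names are illustrative, not from the PR):

   ```java
   import java.util.Collections;
   import java.util.Set;
   import java.util.stream.Collectors;
   import java.util.stream.Stream;

   public class ConfigSplitSketch {

     // parse a comma-separated config value defensively: trim entries and drop
     // empties, so an unset or blank config yields an empty set rather than {""}
     static Set<String> parsePartitions(String configValue) {
       if (configValue == null) {
         return Collections.emptySet();
       }
       return Stream.of(configValue.split(","))
           .map(String::trim)
           .filter(s -> !s.isEmpty())
           .collect(Collectors.toSet());
     }

     public static void main(String[] args) {
       System.out.println("".split(",").length);                   // 1 -- the pitfall: [""]
       System.out.println(parsePartitions(""));                    // []
       System.out.println(parsePartitions("files, column_stats")); // [files, column_stats] (order may vary)
     }
   }
   ```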




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1078955314


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d02e0c2ca65038f88ae753484dcb2642ef789f27 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334) 
   * 69071c6306ce336076aa6daa4337276990572ee4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835770502



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ * 1. Fetch last completed instant on data timeline.
+ * 2. Write the index plan to the <instant>.index.requested.
+ * 3. Initialize filegroups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!EnumSet.allOf(MetadataPartitionType.class).containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+    // make sure that it is idempotent, check with previously pending index operations.
+    Set<String> indexesInflightOrCompleted = Stream.of(table.getMetaClient().getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    indexesInflightOrCompleted.addAll(Stream.of(table.getMetaClient().getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    Set<String> requestedPartitions = partitionsToIndex.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.retainAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {
+      LOG.error("Following partitions already exist or inflight: " + requestedPartitions);
+      return Option.empty();
+    }
+    // get last completed instant
+    Option<HoodieInstant> indexUptoInstant = table.getActiveTimeline().getContiguousCompletedWriteTimeline().lastInstant();
+    if (indexUptoInstant.isPresent()) {
+      final HoodieInstant indexInstant = HoodieTimeline.getIndexRequestedInstant(instantTime);
+      // for each partitionToIndex add that time to the plan
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = partitionsToIndex.stream()
+          .map(p -> new HoodieIndexPartitionInfo(LATEST_INDEX_PLAN_VERSION, p.getPartitionPath(), indexUptoInstant.get().getTimestamp()))
+          .collect(Collectors.toList());
+      HoodieIndexPlan indexPlan = new HoodieIndexPlan(LATEST_INDEX_PLAN_VERSION, indexPartitionInfos);
+      try {
+        table.getActiveTimeline().saveToPendingIndexCommit(indexInstant, TimelineMetadataUtils.serializeIndexPlan(indexPlan));
+      } catch (IOException e) {
+        LOG.error("Error while saving index requested file", e);
+        throw new HoodieIOException(e.getMessage(), e);
+      }
+      table.getMetaClient().reloadActiveTimeline();
+
+      // start initializing filegroups
+      // 1. get metadata writer
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to initialize filegroups for indexing for instant: %s", instantTime)));
+      // 2. take a lock --> begin tx (data table)
+      try {
+        this.txnManager.beginTransaction(Option.of(indexInstant), Option.empty());
+        // 3. initialize filegroups as per plan for the enabled partition types
+        metadataWriter.scheduleIndex(table.getMetaClient(), partitionsToIndex, indexInstant.getTimestamp());

Review comment:
       oh yeah, my bad.. sure, we need to do this before writing to the timeline.. in both schedule and run index, writing to the timeline is the last thing. don't know how i missed it. thanks for pointing out.
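   
   To make the ordering concrete, a minimal sketch of the reordered flow (names as in the diff above; the endTransaction counterpart is an assumption, not confirmed by this diff):
   
   ```java
   // sketch: initialize file groups first; write <instant>.index.requested last,
   // so a crash during initialization leaves no dangling plan on the timeline
   try {
     txnManager.beginTransaction(Option.of(indexInstant), Option.empty());
     metadataWriter.scheduleIndex(table.getMetaClient(), partitionsToIndex, indexInstant.getTimestamp());
     table.getActiveTimeline().saveToPendingIndexCommit(indexInstant, TimelineMetadataUtils.serializeIndexPlan(indexPlan));
   } catch (IOException e) {
     throw new HoodieIOException(e.getMessage(), e);
   } finally {
     txnManager.endTransaction(Option.of(indexInstant)); // assumed counterpart of beginTransaction
   }
   ```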




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835778402



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleandExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);

Review comment:
       let's also think through what happens if a user wants to initialize the entire MDT via HoodieIndexer, i.e. they are not bringing up any regular writers, but first bring up HoodieIndexer, wait for everything to be built out, and then start the regular writers. from what I see, it's taken care of, but do verify it once. 
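   
   For the record, a hypothetical bootstrap flow for that scenario, reusing the Config fields and helpers from this diff (the start(int) entry point and FILES as an accepted --index-types value are assumptions, not confirmed here):
   
   ```java
   // hypothetical: build the entire MDT via the indexer before any regular writer starts
   HoodieIndexer.Config cfg = new HoodieIndexer.Config();
   cfg.basePath = "/tmp/hudi_trips_cow";
   cfg.tableName = "hudi_trips_cow";
   cfg.runningMode = "scheduleAndExecute";
   cfg.indexTypes = "FILES,COLUMN_STATS,BLOOM_FILTERS"; // assumption: FILES accepted here
   cfg.parallelism = 1;
   cfg.sparkMemory = "1g";
   JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, "local[2]", cfg.sparkMemory);
   new HoodieIndexer(jsc, cfg).start(cfg.retry); // hypothetical entry point
   // only after this returns successfully would the regular writers be brought up
   ```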




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1080670921


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 522a18caff448bcc9b127372d4526ee8f168f085 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835763700



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -663,20 +711,82 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    if (!dataWriteConfig.isMetadataTableEnabled()) {
+      return;
+    }
+    Set<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private Set<String> getMetadataPartitionsToUpdate() {
+    // fetch partitions to update from table config
+    Set<String> partitionsToUpdate = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    partitionsToUpdate.addAll(Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    if (!partitionsToUpdate.isEmpty()) {
+      return partitionsToUpdate;
     }
+    // fallback to update files partition only if table config returned no partitions
+    partitionsToUpdate.add(MetadataPartitionType.FILES.getPartitionPath());
+    return partitionsToUpdate;
+  }
+
+  @Override
+  public void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    if (indexPartitionInfos.isEmpty()) {
+      LOG.warn("No partition to index in the plan");
+      return;
+    }
+    String indexUptoInstantTime = indexPartitionInfos.get(0).getIndexUptoInstant();
+    indexPartitionInfos.forEach(indexPartitionInfo -> {
+      String relativePartitionPath = indexPartitionInfo.getMetadataPartitionPath();
+      LOG.info(String.format("Creating a new metadata index for partition '%s' under path %s upto instant %s",
+          relativePartitionPath, metadataWriteConfig.getBasePath(), indexUptoInstantTime));
+      try {
+        // filegroup should have already been initialized while scheduling index for this partition
+        if (!dataMetaClient.getFs().exists(new Path(metadataWriteConfig.getBasePath(), relativePartitionPath))) {
+          throw new HoodieIndexException(String.format("File group not initialized for metadata partition: %s, indexUptoInstant: %s. Looks like index scheduling failed!",
+              relativePartitionPath, indexUptoInstantTime));
+        }
+      } catch (IOException e) {
+        throw new HoodieIndexException(String.format("Unable to check whether file group is initialized for metadata partition: %s, indexUptoInstant: %s",
+            relativePartitionPath, indexUptoInstantTime), e);
+      }
+
+      // return early and populate enabledPartitionTypes correctly (check in initialCommit)
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(relativePartitionPath.toUpperCase(Locale.ROOT));
+      if (!enabledPartitionTypes.contains(partitionType)) {
+        throw new HoodieIndexException(String.format("Indexing for metadata partition: %s is not enabled", partitionType));
+      }
+    });
+    // before initial commit update table config
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), indexPartitionInfos.stream()

Review comment:
       hmm.. good point, we should append.
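   
   A sketch of what the append could look like, reusing the accessors already in this diff:
   
   ```java
   // merge the newly inflight partitions into the existing table config value instead of overwriting it
   Set<String> inflightIndexes = Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
       .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
   indexPartitionInfos.forEach(info -> inflightIndexes.add(info.getMetadataPartitionPath()));
   dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(),
       String.join(",", inflightIndexes));
   ```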




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835213738



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -588,10 +609,87 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
-      commit(engineContext.parallelize(records, 1), MetadataPartitionType.FILES.partitionPath(), instantTime, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
+        commit(engineContext.parallelize(records, 1), p, instantTime, canTriggerTableService);
+      }
+    });
+  }
+
+  private List<String> getMetadataPartitionsToUpdate() {
+    // find last (pending or) completed index instant and get partitions (to be) written
+    Option<HoodieInstant> lastIndexingInstant = dataMetaClient.getActiveTimeline()
+        .getTimelineOfActions(CollectionUtils.createImmutableSet(HoodieTimeline.INDEX_ACTION)).lastInstant();
+    if (lastIndexingInstant.isPresent()) {
+      try {
+        // TODO: handle inflight instant, if it is inflight then read from requested file.
+        HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(
+            dataMetaClient.getActiveTimeline().readIndexPlanAsBytes(lastIndexingInstant.get()).get());
+        return indexPlan.getIndexPartitionInfos().stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toList());
+      } catch (IOException e) {
+        LOG.warn("Could not read index plan. Falling back to FileSystem.exists() check.");
+        return getExistingMetadataPartitions();

Review comment:
       now the source of truth will be table config, which will have both inflight and completed partitions
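   
   i.e., roughly the following (helper names as they appear elsewhere in this PR):
   
   ```java
   // derive the partitions to update from table config (inflight + completed),
   // falling back to FILES when the config has no entries yet
   Set<String> partitionsToUpdate = getCompletedMetadataPartitions(dataMetaClient.getTableConfig());
   partitionsToUpdate.addAll(getInflightMetadataPartitions(dataMetaClient.getTableConfig()));
   if (partitionsToUpdate.isEmpty()) {
     partitionsToUpdate.add(MetadataPartitionType.FILES.getPartitionPath());
   }
   ```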




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r838136779



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -511,24 +523,42 @@ private boolean initializeFromFilesystem(HoodieTableMetaClient dataMetaClient,
 
     initializeMetaClient(dataWriteConfig.getMetadataConfig().populateMetaFields());
     initTableMetadata();
-    initializeEnabledFileGroups(dataMetaClient, createInstantTime);
+    // if async metadata indexing is enabled,
+    // then only initialize files partition as other partitions will be built using HoodieIndexer
+    List<MetadataPartitionType> enabledPartitionTypes =  new ArrayList<>();
+    if (dataWriteConfig.isMetadataAsyncIndex()) {

Review comment:
       I did keep it that way before, i.e. you could build the files partition as well using the indexer. However, there was a suggestion earlier that the files partition should always be indexed inline https://github.com/apache/hudi/pull/4693#discussion_r824509601 (due to its critical nature).
   So I pivoted to keeping files inline and only allowing the other partitions async.
   
   A secondary point was that the files partition takes much less time, while the other partitions could take a lot of time to build for large tables, so keep them disabled by default on the inline path.
   
   imo, from a user standpoint, the files partition can also be indexed async.
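   
   Expressed as write configs, that split would look roughly like this (a sketch; key names are taken from the sample indexer.properties earlier in this PR plus hoodie.metadata.enable):
   
   ```properties
   # files partition: built inline as part of regular writes
   hoodie.metadata.enable=true
   # other partitions: deferred to HoodieIndexer
   hoodie.metadata.index.async=true
   hoodie.metadata.index.column.stats.enable=true
   ```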




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r839800852



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ * 1. Fetch last completed instant on data timeline.
+ * 2. Write the index plan to the <instant>.index.requested.
+ * 3. Initialize file groups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionIndexTypes;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionIndexTypes) {
+    super(context, config, table, instantTime);
+    this.partitionIndexTypes = partitionIndexTypes;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    validateBeforeScheduling();
+    // make sure that it is idempotent, check with previously pending index operations.
+    HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+    Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+    indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+    Set<String> requestedPartitions = partitionIndexTypes.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.removeAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {
+      LOG.warn(String.format("Following partitions already exist or inflight: %s. Going to index only these partitions: %s",
+          indexesInflightOrCompleted, requestedPartitions));
+    }
+    List<MetadataPartitionType> finalPartitionsToIndex = partitionIndexTypes.stream()

Review comment:
       > need to pass List<MetadataPartitionType> instead of List<String>
   
   will change when we get to secondary indexes.. it should be List<String>, i.e. a list of partition paths
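   
   until then, a small bridge in the spirit of what the executors already do (see the MetadataPartitionType.valueOf usage elsewhere in this PR) could map between the two; a hypothetical sketch:
   
   ```java
   // hypothetical: map metadata partition paths back to enum types until List<String> is plumbed through
   static List<MetadataPartitionType> toPartitionTypes(List<String> metadataPartitionPaths) {
     return metadataPartitionPaths.stream()
         .map(p -> MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)))
         .collect(Collectors.toList());
   }
   ```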




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r839136167



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/ThreeToFourUpgradeHandler.java
##########
@@ -35,7 +40,12 @@
   @Override
   public Map<ConfigProperty, String> upgrade(HoodieWriteConfig config, HoodieEngineContext context, String instantTime, SupportsUpgradeDowngrade upgradeDowngradeHelper) {
     Map<ConfigProperty, String> tablePropsToAdd = new Hashtable<>();
-    tablePropsToAdd.put(HoodieTableConfig.TABLE_CHECKSUM, String.valueOf(HoodieTableConfig.generateChecksum(config.getProps())));
+    tablePropsToAdd.put(TABLE_CHECKSUM, String.valueOf(HoodieTableConfig.generateChecksum(config.getProps())));
+    // if metadata is enabled and the files partition exists, then update TABLE_METADATA_PARTITIONS
+    // the schema for the files partition is the same between the two versions
+    if (config.isMetadataTableEnabled() && metadataPartitionExists(config.getBasePath(), context, MetadataPartitionType.FILES)) {
+      tablePropsToAdd.put(TABLE_METADATA_PARTITIONS, MetadataPartitionType.FILES.getPartitionPath());
+    }

Review comment:
       Hi @codope Just thinking: when the user sets the current version to 4, there is no need for upgrade/downgrade. Then how can we update the `TABLE_METADATA_PARTITIONS` property here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r839541590



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -645,12 +681,34 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropMetadataPartitions(List<MetadataPartitionType> metadataPartitions) throws IOException {
+    Set<String> completedIndexes = getCompletedMetadataPartitions(dataMetaClient.getTableConfig());
+    Set<String> inflightIndexes = getInflightMetadataPartitions(dataMetaClient.getTableConfig());
+
+    for (MetadataPartitionType partitionType : metadataPartitions) {
+      String partitionPath = partitionType.getPartitionPath();
+      // first update table config
+      if (inflightIndexes.contains(partitionPath)) {
+        inflightIndexes.remove(partitionPath);
+        dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS_INFLIGHT.key(), String.join(",", inflightIndexes));
+      } else if (completedIndexes.contains(partitionPath)) {
+        completedIndexes.remove(partitionPath);
+        dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS.key(), String.join(",", completedIndexes));
+      }
+      HoodieTableConfig.update(dataMetaClient.getFs(), new Path(dataMetaClient.getMetaPath()), dataMetaClient.getTableConfig().getProps());
+      LOG.warn("Deleting Metadata Table partitions: " + partitionPath);
+      dataMetaClient.getFs().delete(new Path(metadataWriteConfig.getBasePath(), partitionPath), true);

Review comment:
       What happens if the delete fails midway before finishing? Is there a follow-on to use DELETE_PARTITION instead? Even there we could have that operation fail midway, and we need some mechanism to reconcile/retry the next time we try to build that partition.
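   
   One possible reconcile mechanism, sketched under the assumption that table config stays the source of truth for completed indexes:
   
   ```java
   // before (re)building a metadata partition, clean up residue from a mid-way delete/build failure:
   // if the directory exists but table config does not list the index as completed, it is stale
   Path partitionPath = new Path(metadataWriteConfig.getBasePath(), partitionType.getPartitionPath());
   if (dataMetaClient.getFs().exists(partitionPath)
       && !getCompletedMetadataPartitions(dataMetaClient.getTableConfig()).contains(partitionType.getPartitionPath())) {
     dataMetaClient.getFs().delete(partitionPath, true); // idempotent: safe to repeat on the next retry
   }
   ```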

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ * 1. Fetch last completed instant on data timeline.
+ * 2. Write the index plan to the <instant>.index.requested.
+ * 3. Initialize file groups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionIndexTypes;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionIndexTypes) {
+    super(context, config, table, instantTime);
+    this.partitionIndexTypes = partitionIndexTypes;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    validateBeforeScheduling();
+    // make sure that it is idempotent, check with previously pending index operations.
+    HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+    Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+    indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+    Set<String> requestedPartitions = partitionIndexTypes.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.removeAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {

Review comment:
       return if empty?
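   
   i.e., something like:
   
   ```java
   // suggested early return: everything requested already exists or is inflight, nothing to plan
   if (requestedPartitions.isEmpty()) {
     LOG.warn("All requested index partitions already exist or are inflight: " + indexesInflightOrCompleted);
     return Option.empty();
   }
   ```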

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.CLEAN_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN_OR_EQUALS;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEXING_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.RESTORE_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.ROLLBACK_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadata.getMetadataTableBasePath;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+  private static final int TIMELINE_RELOAD_INTERVAL_MILLIS = 5000;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = validateAndGetIndexInstant();
+    // read HoodieIndexPlan
+    HoodieIndexPlan indexPlan;
+    try {
+      indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+    } catch (IOException e) {
+      throw new HoodieIndexException("Failed to read the index plan for instant: " + indexInstant);
+    }
+    List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+    try {
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // ensure the metadata partitions for the requested indexes are not already available (or inflight)
+      HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+      Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+      indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+      Set<String> requestedPartitions = indexPartitionInfos.stream()
+          .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet());
+      requestedPartitions.retainAll(indexesInflightOrCompleted);
+      if (!requestedPartitions.isEmpty()) {
+        throw new HoodieIndexException(String.format("Following partitions already exist or inflight: %s", requestedPartitions));
+      }
+
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build index upto base instant as generated by the plan, we will be doing catchup later
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      LOG.info("Starting Index Building with base instant: " + indexUptoInstant);
+      metadataWriter.buildMetadataPartitions(context, indexPartitionInfos);
+
+      // get remaining instants to catchup
+      List<HoodieInstant> instantsToCatchup = getInstantsToCatchup(indexUptoInstant);
+      LOG.info("Total remaining instants to index: " + instantsToCatchup.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      catchupWithInflightWriters(metadataWriter, instantsToCatchup, metadataMetaClient, metadataCompletedTimestamps);
+      // save index commit metadata and update table config
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      updateTableConfigAndTimeline(indexInstant, finalIndexPartitionInfos, indexCommitMetadata);
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      // abort gracefully
+      abort(indexInstant, indexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet()));
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private void abort(HoodieInstant indexInstant, Set<String> requestedPartitions) {
+    Set<String> inflightPartitions = getInflightMetadataPartitions(table.getMetaClient().getTableConfig());
+    Set<String> completedPartitions = getCompletedMetadataPartitions(table.getMetaClient().getTableConfig());
+    // delete metadata partition
+    requestedPartitions.forEach(partition -> {

Review comment:
       Think about what happens if this fails midway 

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.CLEAN_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN_OR_EQUALS;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEXING_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.RESTORE_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.ROLLBACK_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadata.getMetadataTableBasePath;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+  private static final int TIMELINE_RELOAD_INTERVAL_MILLIS = 5000;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = validateAndGetIndexInstant();
+    // read HoodieIndexPlan
+    HoodieIndexPlan indexPlan;
+    try {
+      indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+    } catch (IOException e) {
+      throw new HoodieIndexException("Failed to read the index plan for instant: " + indexInstant);
+    }
+    List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+    try {
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // ensure the metadata partitions for the requested indexes are not already available (or inflight)
+      HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+      Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+      indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+      Set<String> requestedPartitions = indexPartitionInfos.stream()
+          .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet());
+      requestedPartitions.retainAll(indexesInflightOrCompleted);
+      if (!requestedPartitions.isEmpty()) {
+        throw new HoodieIndexException(String.format("Following partitions already exist or inflight: %s", requestedPartitions));
+      }
+
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build index upto base instant as generated by the plan, we will be doing catchup later
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      LOG.info("Starting Index Building with base instant: " + indexUptoInstant);
+      metadataWriter.buildMetadataPartitions(context, indexPartitionInfos);
+
+      // get remaining instants to catchup
+      List<HoodieInstant> instantsToCatchup = getInstantsToCatchup(indexUptoInstant);
+      LOG.info("Total remaining instants to index: " + instantsToCatchup.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      catchupWithInflightWriters(metadataWriter, instantsToCatchup, metadataMetaClient, metadataCompletedTimestamps);
+      // save index commit metadata and update table config
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      updateTableConfigAndTimeline(indexInstant, finalIndexPartitionInfos, indexCommitMetadata);
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      // abort gracefully
+      abort(indexInstant, indexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet()));
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private void abort(HoodieInstant indexInstant, Set<String> requestedPartitions) {
+    Set<String> inflightPartitions = getInflightMetadataPartitions(table.getMetaClient().getTableConfig());
+    Set<String> completedPartitions = getCompletedMetadataPartitions(table.getMetaClient().getTableConfig());
+    // delete metadata partition
+    requestedPartitions.forEach(partition -> {
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(partition.toUpperCase(Locale.ROOT));
+      if (metadataPartitionExists(table.getMetaClient().getBasePath(), context, partitionType)) {
+        deleteMetadataPartition(table.getMetaClient().getBasePath(), context, partitionType);
+      }
+      inflightPartitions.remove(partition);
+      completedPartitions.remove(partition);
+    });
+    // update table config
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS_INFLIGHT.key(), String.join(",", inflightPartitions));
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS.key(), String.join(",", completedPartitions));
+    HoodieTableConfig.update(table.getMetaClient().getFs(), new Path(table.getMetaClient().getMetaPath()), table.getMetaClient().getTableConfig().getProps());
+    // delete inflight instant
+    table.getMetaClient().reloadActiveTimeline().deleteInstantFileIfExists(HoodieTimeline.getIndexInflightInstant(indexInstant.getTimestamp()));
+  }
+
+  private List<HoodieInstant> getInstantsToCatchup(String indexUptoInstant) {
+    // since only write timeline was considered while scheduling index, which gives us the indexUpto instant
+    // here we consider other valid actions to pick catchupStart instant
+    Set<String> validActions = CollectionUtils.createSet(CLEAN_ACTION, RESTORE_ACTION, ROLLBACK_ACTION);

Review comment:
       Should we do it such that any non-write actions are picked up?

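A sketch of that suggestion, assuming the standard `HoodieTimeline` action constants and its `filter` API: instead of enumerating clean/restore/rollback, exclude the write actions so any future non-write action is picked up automatically.

```java
// Sketch: treat every non-write action as a candidate for the catchup start,
// rather than maintaining a hard-coded set of valid actions.
Set<String> writeActions = CollectionUtils.createSet(
    HoodieTimeline.COMMIT_ACTION, HoodieTimeline.DELTA_COMMIT_ACTION, HoodieTimeline.REPLACE_COMMIT_ACTION);
HoodieInstant catchupStartInstant = table.getMetaClient().reloadActiveTimeline()
    .filter(instant -> !writeActions.contains(instant.getAction()))
    .filterInflightsAndRequested()
    .findInstantsBefore(indexUptoInstant)
    .firstInstant().orElseGet(() -> null);
```
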
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.CLEAN_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN_OR_EQUALS;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEXING_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.RESTORE_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.ROLLBACK_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadata.getMetadataTableBasePath;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+  private static final int TIMELINE_RELOAD_INTERVAL_MILLIS = 5000;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = validateAndGetIndexInstant();
+    // read HoodieIndexPlan
+    HoodieIndexPlan indexPlan;
+    try {
+      indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+    } catch (IOException e) {
+      throw new HoodieIndexException("Failed to read the index plan for instant: " + indexInstant);
+    }
+    List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+    try {
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // ensure the metadata partitions for the requested indexes are not already available (or inflight)
+      HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+      Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+      indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+      Set<String> requestedPartitions = indexPartitionInfos.stream()
+          .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet());
+      requestedPartitions.retainAll(indexesInflightOrCompleted);
+      if (!requestedPartitions.isEmpty()) {
+        throw new HoodieIndexException(String.format("Following partitions already exist or inflight: %s", requestedPartitions));
+      }
+
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build index upto base instant as generated by the plan, we will be doing catchup later
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      LOG.info("Starting Index Building with base instant: " + indexUptoInstant);
+      metadataWriter.buildMetadataPartitions(context, indexPartitionInfos);
+
+      // get remaining instants to catchup
+      List<HoodieInstant> instantsToCatchup = getInstantsToCatchup(indexUptoInstant);
+      LOG.info("Total remaining instants to index: " + instantsToCatchup.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      catchupWithInflightWriters(metadataWriter, instantsToCatchup, metadataMetaClient, metadataCompletedTimestamps);
+      // save index commit metadata and update table config
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      updateTableConfigAndTimeline(indexInstant, finalIndexPartitionInfos, indexCommitMetadata);
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      // abort gracefully
+      abort(indexInstant, indexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet()));
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private void abort(HoodieInstant indexInstant, Set<String> requestedPartitions) {
+    Set<String> inflightPartitions = getInflightMetadataPartitions(table.getMetaClient().getTableConfig());
+    Set<String> completedPartitions = getCompletedMetadataPartitions(table.getMetaClient().getTableConfig());
+    // delete metadata partition
+    requestedPartitions.forEach(partition -> {
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(partition.toUpperCase(Locale.ROOT));
+      if (metadataPartitionExists(table.getMetaClient().getBasePath(), context, partitionType)) {
+        deleteMetadataPartition(table.getMetaClient().getBasePath(), context, partitionType);
+      }
+      inflightPartitions.remove(partition);
+      completedPartitions.remove(partition);
+    });
+    // update table config
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS_INFLIGHT.key(), String.join(",", inflightPartitions));
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS.key(), String.join(",", completedPartitions));
+    HoodieTableConfig.update(table.getMetaClient().getFs(), new Path(table.getMetaClient().getMetaPath()), table.getMetaClient().getTableConfig().getProps());
+    // delete inflight instant
+    table.getMetaClient().reloadActiveTimeline().deleteInstantFileIfExists(HoodieTimeline.getIndexInflightInstant(indexInstant.getTimestamp()));
+  }
+
+  private List<HoodieInstant> getInstantsToCatchup(String indexUptoInstant) {
+    // since only write timeline was considered while scheduling index, which gives us the indexUpto instant
+    // here we consider other valid actions to pick catchupStart instant
+    Set<String> validActions = CollectionUtils.createSet(CLEAN_ACTION, RESTORE_ACTION, ROLLBACK_ACTION);

Review comment:
       Also, why not have scheduling consider non-write actions?

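On the scheduling side, a sketch of the same idea, assuming the scheduler currently derives `indexUptoInstant` from the write timeline only; taking the last completed instant across all actions would make the executor's separate catchup-start scan unnecessary.

```java
// Sketch: let the plan's indexUptoInstant cover every completed action,
// not just the filtered write timeline.
Option<HoodieInstant> indexUptoInstant = table.getMetaClient().getActiveTimeline()
    .filterCompletedInstants()
    .lastInstant();
```
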
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.CLEAN_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN_OR_EQUALS;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEXING_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.RESTORE_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.ROLLBACK_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadata.getMetadataTableBasePath;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+  private static final int TIMELINE_RELOAD_INTERVAL_MILLIS = 5000;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = validateAndGetIndexInstant();
+    // read HoodieIndexPlan
+    HoodieIndexPlan indexPlan;
+    try {
+      indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+    } catch (IOException e) {
+      throw new HoodieIndexException("Failed to read the index plan for instant: " + indexInstant);
+    }
+    List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+    try {
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // ensure the metadata partitions for the requested indexes are not already available (or inflight)
+      HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+      Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+      indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));

Review comment:
       Can this code move to a helper and be shared with the ScheduleIndexActionExecutor?

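A sketch of the suggested helper; the name and placement (e.g. in `HoodieTableMetadataUtil`) are illustrative, but both executors repeat exactly this lookup.

```java
// Sketch: one shared helper for the "already inflight or completed" check
// used by both ScheduleIndexActionExecutor and RunIndexActionExecutor.
public static Set<String> getInflightAndCompletedMetadataPartitions(HoodieTableConfig tableConfig) {
  Set<String> inflightAndCompleted = getInflightMetadataPartitions(tableConfig);
  inflightAndCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
  return inflightAndCompleted;
}
```
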
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ * 1. Fetch last completed instant on data timeline.
+ * 2. Write the index plan to the <instant>.index.requested.
+ * 3. Initialize file groups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionIndexTypes;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionIndexTypes) {
+    super(context, config, table, instantTime);
+    this.partitionIndexTypes = partitionIndexTypes;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    validateBeforeScheduling();
+    // make sure that it is idempotent, check with previously pending index operations.
+    HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+    Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+    indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+    Set<String> requestedPartitions = partitionIndexTypes.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.removeAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {
+      LOG.warn(String.format("Following partitions already exist or inflight: %s. Going to index only these partitions: %s",
+          indexesInflightOrCompleted, requestedPartitions));
+    }
+    List<MetadataPartitionType> finalPartitionsToIndex = partitionIndexTypes.stream()

Review comment:
       Why can't we just use requestedPartitions?

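A sketch of what using `requestedPartitions` directly could look like; since it already holds the partition paths that survived the dedupe, the types can be recovered the same way `abort()` does in `RunIndexActionExecutor` (mapping a partition path to the enum name is an assumption that holds for the current partition types).

```java
// Sketch: build the final list from the filtered set instead of
// re-filtering partitionIndexTypes.
List<MetadataPartitionType> finalPartitionsToIndex = requestedPartitions.stream()
    .map(path -> MetadataPartitionType.valueOf(path.toUpperCase(Locale.ROOT)))
    .collect(Collectors.toList());
```
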
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.CLEAN_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN_OR_EQUALS;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEXING_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.RESTORE_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.ROLLBACK_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadata.getMetadataTableBasePath;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+  private static final int TIMELINE_RELOAD_INTERVAL_MILLIS = 5000;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = validateAndGetIndexInstant();
+    // read HoodieIndexPlan
+    HoodieIndexPlan indexPlan;
+    try {
+      indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+    } catch (IOException e) {
+      throw new HoodieIndexException("Failed to read the index plan for instant: " + indexInstant);
+    }
+    List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+    try {
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // ensure the metadata partitions for the requested indexes are not already available (or inflight)
+      HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+      Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+      indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+      Set<String> requestedPartitions = indexPartitionInfos.stream()
+          .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet());
+      requestedPartitions.retainAll(indexesInflightOrCompleted);
+      if (!requestedPartitions.isEmpty()) {
+        throw new HoodieIndexException(String.format("Following partitions already exist or inflight: %s", requestedPartitions));
+      }
+
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build index upto base instant as generated by the plan, we will be doing catchup later
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      LOG.info("Starting Index Building with base instant: " + indexUptoInstant);
+      metadataWriter.buildMetadataPartitions(context, indexPartitionInfos);
+
+      // get remaining instants to catchup
+      List<HoodieInstant> instantsToCatchup = getInstantsToCatchup(indexUptoInstant);
+      LOG.info("Total remaining instants to index: " + instantsToCatchup.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;

Review comment:
       rename: currentCaughtupInstant

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ * 1. Fetch last completed instant on data timeline.
+ * 2. Write the index plan to the <instant>.index.requested.
+ * 3. Initialize file groups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionIndexTypes;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionIndexTypes) {
+    super(context, config, table, instantTime);
+    this.partitionIndexTypes = partitionIndexTypes;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    validateBeforeScheduling();
+    // make sure that it is idempotent, check with previously pending index operations.
+    HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+    Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+    indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+    Set<String> requestedPartitions = partitionIndexTypes.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.removeAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {

Review comment:
       If requestedPartitions is empty, then return early?

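A sketch of that early return, assuming an empty set means every requested index already exists or is being built.

```java
// Sketch: nothing left to schedule, so skip writing a plan entirely.
if (requestedPartitions.isEmpty()) {
  LOG.warn("All requested index partitions are already inflight or completed: " + indexesInflightOrCompleted);
  return Option.empty();
}
```
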
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.CLEAN_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN_OR_EQUALS;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEXING_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.RESTORE_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.ROLLBACK_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadata.getMetadataTableBasePath;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+  private static final int TIMELINE_RELOAD_INTERVAL_MILLIS = 5000;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = validateAndGetIndexInstant();
+    // read HoodieIndexPlan
+    HoodieIndexPlan indexPlan;
+    try {
+      indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+    } catch (IOException e) {
+      throw new HoodieIndexException("Failed to read the index plan for instant: " + indexInstant);
+    }
+    List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+    try {
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // ensure the metadata partitions for the requested indexes are not already available (or inflight)
+      HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+      Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+      indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+      Set<String> requestedPartitions = indexPartitionInfos.stream()
+          .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet());
+      requestedPartitions.retainAll(indexesInflightOrCompleted);
+      if (!requestedPartitions.isEmpty()) {
+        throw new HoodieIndexException(String.format("Following partitions already exist or inflight: %s", requestedPartitions));
+      }
+
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build index upto base instant as generated by the plan, we will be doing catchup later
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      LOG.info("Starting Index Building with base instant: " + indexUptoInstant);
+      metadataWriter.buildMetadataPartitions(context, indexPartitionInfos);
+
+      // get remaining instants to catchup
+      List<HoodieInstant> instantsToCatchup = getInstantsToCatchup(indexUptoInstant);
+      LOG.info("Total remaining instants to index: " + instantsToCatchup.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      catchupWithInflightWriters(metadataWriter, instantsToCatchup, metadataMetaClient, metadataCompletedTimestamps);
+      // save index commit metadata and update table config
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      updateTableConfigAndTimeline(indexInstant, finalIndexPartitionInfos, indexCommitMetadata);
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      // abort gracefully
+      abort(indexInstant, indexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet()));
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private void abort(HoodieInstant indexInstant, Set<String> requestedPartitions) {
+    Set<String> inflightPartitions = getInflightMetadataPartitions(table.getMetaClient().getTableConfig());
+    Set<String> completedPartitions = getCompletedMetadataPartitions(table.getMetaClient().getTableConfig());
+    // delete metadata partition
+    requestedPartitions.forEach(partition -> {
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(partition.toUpperCase(Locale.ROOT));
+      if (metadataPartitionExists(table.getMetaClient().getBasePath(), context, partitionType)) {
+        deleteMetadataPartition(table.getMetaClient().getBasePath(), context, partitionType);
+      }
+      inflightPartitions.remove(partition);
+      completedPartitions.remove(partition);
+    });
+    // update table config
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS_INFLIGHT.key(), String.join(",", inflightPartitions));
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS.key(), String.join(",", completedPartitions));
+    HoodieTableConfig.update(table.getMetaClient().getFs(), new Path(table.getMetaClient().getMetaPath()), table.getMetaClient().getTableConfig().getProps());
+    // delete inflight instant
+    table.getMetaClient().reloadActiveTimeline().deleteInstantFileIfExists(HoodieTimeline.getIndexInflightInstant(indexInstant.getTimestamp()));
+  }
+
+  private List<HoodieInstant> getInstantsToCatchup(String indexUptoInstant) {
+    // only the write timeline was considered while scheduling the index, which gives us the indexUpto instant;
+    // here we also consider other valid actions to pick the catchupStart instant
+    Set<String> validActions = CollectionUtils.createSet(CLEAN_ACTION, RESTORE_ACTION, ROLLBACK_ACTION);
+    HoodieInstant catchupStartInstant = table.getMetaClient().reloadActiveTimeline()
+        .getTimelineOfActions(validActions)
+        .filterInflightsAndRequested()
+        .findInstantsBefore(indexUptoInstant)
+        .firstInstant().orElseGet(() -> null);
+    // get all instants since the plan completed (both from active timeline and archived timeline)
+    List<HoodieInstant> instantsToIndex;
+    if (catchupStartInstant != null) {
+      instantsToIndex = getRemainingArchivedAndActiveInstantsSince(catchupStartInstant.getTimestamp(), table.getMetaClient());
+    } else {
+      instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+    }
+    return instantsToIndex;
+  }
+
+  private HoodieInstant validateAndGetIndexInstant() {
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    return table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+  }
+
+  private void updateTableConfigAndTimeline(HoodieInstant indexInstant,
+                                            List<HoodieIndexPartitionInfo> finalIndexPartitionInfos,
+                                            HoodieIndexCommitMetadata indexCommitMetadata) throws IOException {
+    try {
+      // update the table config and timeline in a lock as there could be another indexer running
+      txnManager.beginTransaction();
+      updateMetadataPartitionsTableConfig(table.getMetaClient(),
+          finalIndexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet()));
+      table.getActiveTimeline().saveAsComplete(
+          new HoodieInstant(true, INDEXING_ACTION, indexInstant.getTimestamp()),
+          TimelineMetadataUtils.serializeIndexCommitMetadata(indexCommitMetadata));
+    } finally {
+      txnManager.endTransaction();
+    }
+  }
+
+  private void catchupWithInflightWriters(HoodieTableMetadataWriter metadataWriter, List<HoodieInstant> instantsToIndex,
+                                          HoodieTableMetaClient metadataMetaClient, Set<String> metadataCompletedTimestamps) {
+    ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+    Future<?> indexingCatchupTaskFuture = executorService.submit(
+        new IndexingCatchupTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient(), metadataMetaClient));
+    try {
+      LOG.info("Starting index catchup task");
+      indexingCatchupTaskFuture.get(config.getIndexingCheckTimeoutSeconds(), TimeUnit.SECONDS);
+    } catch (Exception e) {
+      indexingCatchupTaskFuture.cancel(true);
+      throw new HoodieIndexException(String.format("Index catchup failed. Current indexed instant = %s. Aborting!", currentIndexedInstant), e);
+    } finally {
+      executorService.shutdownNow();
+    }
+  }
+
+  private static List<HoodieInstant> getRemainingArchivedAndActiveInstantsSince(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> remainingInstantsToIndex = metaClient.getArchivedTimeline().getInstants()
+        .filter(i -> HoodieTimeline.compareTimestamps(i.getTimestamp(), GREATER_THAN_OR_EQUALS, instant))
+        .filter(i -> !INDEXING_ACTION.equals(i.getAction()))
+        .collect(Collectors.toList());
+    remainingInstantsToIndex.addAll(metaClient.getActiveTimeline().findInstantsAfter(instant).getInstants()
+        .filter(i -> HoodieTimeline.compareTimestamps(i.getTimestamp(), GREATER_THAN_OR_EQUALS, instant))
+        .filter(i -> !INDEXING_ACTION.equals(i.getAction()))
+        .collect(Collectors.toList()));
+    return remainingInstantsToIndex;
+  }
+
+  private static List<HoodieInstant> getCompletedArchivedAndActiveInstantsAfter(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> completedInstants = metaClient.getArchivedTimeline().filterCompletedInstants().findInstantsAfter(instant)
+        .getInstants().filter(i -> !INDEXING_ACTION.equals(i.getAction())).collect(Collectors.toList());
+    completedInstants.addAll(metaClient.reloadActiveTimeline().filterCompletedInstants().findInstantsAfter(instant)
+        .getInstants().filter(i -> !INDEXING_ACTION.equals(i.getAction())).collect(Collectors.toList()));
+    return completedInstants;
+  }
+
+  private void updateMetadataPartitionsTableConfig(HoodieTableMetaClient metaClient, Set<String> metadataPartitions) {
+    // remove from inflight and update completed indexes
+    Set<String> inflightPartitions = getInflightMetadataPartitions(metaClient.getTableConfig());
+    Set<String> completedPartitions = getCompletedMetadataPartitions(metaClient.getTableConfig());
+    inflightPartitions.removeAll(metadataPartitions);
+    completedPartitions.addAll(metadataPartitions);
+    // update table config
+    metaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS_INFLIGHT.key(), String.join(",", inflightPartitions));
+    metaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS.key(), String.join(",", completedPartitions));
+    HoodieTableConfig.update(metaClient.getFs(), new Path(metaClient.getMetaPath()), metaClient.getTableConfig().getProps());
+  }
+
+  /**
+   * Indexing check runs for instants that completed after the base instant (in the index plan).
+   * It will check whether these later instants have logged updates to the metadata table or not.
+   * If not, then it will do the update itself. If a later instant is inflight, it will wait until it is completed or the task times out.
+   */
+  class IndexingCatchupTask implements Runnable {
+
+    private final HoodieTableMetadataWriter metadataWriter;
+    private final List<HoodieInstant> instantsToIndex;
+    private final Set<String> metadataCompletedInstants;
+    private final HoodieTableMetaClient metaClient;
+    private final HoodieTableMetaClient metadataMetaClient;
+
+    IndexingCatchupTask(HoodieTableMetadataWriter metadataWriter,
+                        List<HoodieInstant> instantsToIndex,
+                        Set<String> metadataCompletedInstants,
+                        HoodieTableMetaClient metaClient,
+                        HoodieTableMetaClient metadataMetaClient) {
+      this.metadataWriter = metadataWriter;
+      this.instantsToIndex = instantsToIndex;
+      this.metadataCompletedInstants = metadataCompletedInstants;
+      this.metaClient = metaClient;
+      this.metadataMetaClient = metadataMetaClient;
+    }
+
+    @Override
+    public void run() {
+      for (HoodieInstant instant : instantsToIndex) {
+        // metadata index already updated for this instant
+        if (!metadataCompletedInstants.isEmpty() && metadataCompletedInstants.contains(instant.getTimestamp())) {
+          currentIndexedInstant = instant.getTimestamp();
+          continue;
+        }
+        while (!instant.isCompleted()) {
+          try {
+            LOG.warn("instant not completed, reloading timeline " + instant);
+            // reload timeline and fetch instant details again wait until timeout
+            String instantTime = instant.getTimestamp();
+            Option<HoodieInstant> currentInstant = metaClient.reloadActiveTimeline()
+                .filterCompletedInstants().filter(i -> i.getTimestamp().equals(instantTime)).firstInstant();
+            instant = currentInstant.orElse(instant);
+            // sleep so that the timeline is not reloaded too frequently
+            Thread.sleep(TIMELINE_RELOAD_INTERVAL_MILLIS);
+          } catch (InterruptedException e) {
+            throw new HoodieIndexException(String.format("Thread interrupted while running indexing check for instant: %s", instant), e);
+          }
+        }
+        // if the instant completed, ensure that there was a metadata commit; else update the metadata for this completed instant
+        if (COMPLETED.equals(instant.getState())) {
+          String instantTime = instant.getTimestamp();
+          Option<HoodieInstant> metadataInstant = metadataMetaClient.reloadActiveTimeline()
+              .filterCompletedInstants().filter(i -> i.getTimestamp().equals(instantTime)).firstInstant();
+          if (metadataInstant.isPresent()) {
+            currentIndexedInstant = instantTime;
+            continue;
+          }
+          try {
+            // we need to take a lock here as an inflight writer could also try to update the timeline
+            txnManager.beginTransaction(Option.of(instant), Option.empty());
+            LOG.info("Updating metadata table for instant: " + instant);
+            switch (instant.getAction()) {

Review comment:
       Isn't there a top-level method in the metadata writer to handle the different instant types? We can reuse that, or move this code there.
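
For reference, a minimal sketch of the kind of top-level dispatch the comment asks about: one entry point on the metadata writer that switches on the instant action. The method name applyInstantToMetadata and the exact update(...) overloads are assumptions for illustration, not a confirmed HoodieTableMetadataWriter API.

```java
// Hypothetical sketch: a single method on the metadata writer that dispatches on the
// instant action, so callers such as the indexing catch-up need not repeat the switch.
default void applyInstantToMetadata(HoodieTableMetaClient metaClient, HoodieInstant instant) throws IOException {
  HoodieActiveTimeline timeline = metaClient.getActiveTimeline();
  switch (instant.getAction()) {
    case HoodieTimeline.COMMIT_ACTION:
    case HoodieTimeline.DELTA_COMMIT_ACTION:
      // regular writes (and compaction commits) carry HoodieCommitMetadata
      update(HoodieCommitMetadata.fromBytes(
          timeline.getInstantDetails(instant).get(), HoodieCommitMetadata.class), instant.getTimestamp());
      break;
    case HoodieTimeline.CLEAN_ACTION:
      update(CleanerUtils.getCleanerMetadata(metaClient, instant), instant.getTimestamp());
      break;
    case HoodieTimeline.ROLLBACK_ACTION:
      update(TimelineMetadataUtils.deserializeHoodieRollbackMetadata(
          timeline.getInstantDetails(instant).get()), instant.getTimestamp());
      break;
    case HoodieTimeline.RESTORE_ACTION:
      update(TimelineMetadataUtils.deserializeHoodieRestoreMetadata(
          timeline.getInstantDetails(instant).get()), instant.getTimestamp());
      break;
    default:
      throw new IllegalStateException("Unexpected action " + instant.getAction() + " for instant " + instant);
  }
}
```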

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.CLEAN_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN_OR_EQUALS;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEXING_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.RESTORE_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.ROLLBACK_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadata.getMetadataTableBasePath;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Reads the index plan and executes it.
+ * It also reconciles updates on the data timeline that happened while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+  private static final int TIMELINE_RELOAD_INTERVAL_MILLIS = 5000;
+
+  // we use this to track the latest instant in the data timeline that has been indexed in the metadata table
+  // this needs to be volatile as it can be updated in the IndexingCatchupTask spawned by this executor
+  // the assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = validateAndGetIndexInstant();
+    // read HoodieIndexPlan
+    HoodieIndexPlan indexPlan;
+    try {
+      indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+    } catch (IOException e) {
+      throw new HoodieIndexException("Failed to read the index plan for instant: " + indexInstant);
+    }
+    List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+    try {
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // ensure the metadata partitions for the requested indexes are not already available (or inflight)
+      HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+      Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+      indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+      Set<String> requestedPartitions = indexPartitionInfos.stream()
+          .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet());
+      requestedPartitions.retainAll(indexesInflightOrCompleted);
+      if (!requestedPartitions.isEmpty()) {
+        throw new HoodieIndexException(String.format("Following partitions already exist or inflight: %s", requestedPartitions));
+      }
+
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build the index up to the base instant as generated by the plan; remaining instants are handled by the catch-up below
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      LOG.info("Starting index building with base instant: " + indexUptoInstant);
+      metadataWriter.buildMetadataPartitions(context, indexPartitionInfos);
+
+      // get remaining instants to catchup
+      List<HoodieInstant> instantsToCatchup = getInstantsToCatchup(indexUptoInstant);
+      LOG.info("Total remaining instants to index: " + instantsToCatchup.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      catchupWithInflightWriters(metadataWriter, instantsToCatchup, metadataMetaClient, metadataCompletedTimestamps);
+      // save index commit metadata and update table config
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      updateTableConfigAndTimeline(indexInstant, finalIndexPartitionInfos, indexCommitMetadata);
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      // abort gracefully
+      abort(indexInstant, indexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet()));
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private void abort(HoodieInstant indexInstant, Set<String> requestedPartitions) {
+    Set<String> inflightPartitions = getInflightMetadataPartitions(table.getMetaClient().getTableConfig());
+    Set<String> completedPartitions = getCompletedMetadataPartitions(table.getMetaClient().getTableConfig());
+    // delete metadata partition
+    requestedPartitions.forEach(partition -> {
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(partition.toUpperCase(Locale.ROOT));
+      if (metadataPartitionExists(table.getMetaClient().getBasePath(), context, partitionType)) {
+        deleteMetadataPartition(table.getMetaClient().getBasePath(), context, partitionType);
+      }
+      inflightPartitions.remove(partition);
+      completedPartitions.remove(partition);
+    });
+    // update table config
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS_INFLIGHT.key(), String.join(",", inflightPartitions));
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS.key(), String.join(",", completedPartitions));
+    HoodieTableConfig.update(table.getMetaClient().getFs(), new Path(table.getMetaClient().getMetaPath()), table.getMetaClient().getTableConfig().getProps());
+    // delete inflight instant
+    table.getMetaClient().reloadActiveTimeline().deleteInstantFileIfExists(HoodieTimeline.getIndexInflightInstant(indexInstant.getTimestamp()));
+  }
+
+  private List<HoodieInstant> getInstantsToCatchup(String indexUptoInstant) {
+    // only the write timeline was considered while scheduling the index, which gives us the indexUpto instant;
+    // here we also consider other valid actions to pick the catchupStart instant
+    Set<String> validActions = CollectionUtils.createSet(CLEAN_ACTION, RESTORE_ACTION, ROLLBACK_ACTION);
+    HoodieInstant catchupStartInstant = table.getMetaClient().reloadActiveTimeline()

Review comment:
       Use Option?
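
For illustration, a minimal sketch of the Option-chaining variant the comment suggests, which avoids the intermediate null and the follow-up null check (assuming org.apache.hudi.common.util.Option exposes map/orElse as used below):

```java
// sketch: pick the catch-up start timestamp via Option chaining; fall back to
// indexUptoInstant itself when no pending clean/restore/rollback exists before it
String catchupStartTimestamp = table.getMetaClient().reloadActiveTimeline()
    .getTimelineOfActions(validActions)
    .filterInflightsAndRequested()
    .findInstantsBefore(indexUptoInstant)
    .firstInstant()
    .map(HoodieInstant::getTimestamp)
    .orElse(indexUptoInstant);
return getRemainingArchivedAndActiveInstantsSince(catchupStartTimestamp, table.getMetaClient());
```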

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.CLEAN_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN_OR_EQUALS;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEXING_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.RESTORE_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.ROLLBACK_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadata.getMetadataTableBasePath;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Reads the index plan and executes it.
+ * It also reconciles updates on the data timeline that happened while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+  private static final int TIMELINE_RELOAD_INTERVAL_MILLIS = 5000;
+
+  // we use this to track the latest instant in the data timeline that has been indexed in the metadata table
+  // this needs to be volatile as it can be updated in the IndexingCatchupTask spawned by this executor
+  // the assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = validateAndGetIndexInstant();
+    // read HoodieIndexPlan
+    HoodieIndexPlan indexPlan;
+    try {
+      indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+    } catch (IOException e) {
+      throw new HoodieIndexException("Failed to read the index plan for instant: " + indexInstant);
+    }
+    List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+    try {
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // ensure the metadata partitions for the requested indexes are not already available (or inflight)
+      HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+      Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+      indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+      Set<String> requestedPartitions = indexPartitionInfos.stream()
+          .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet());
+      requestedPartitions.retainAll(indexesInflightOrCompleted);
+      if (!requestedPartitions.isEmpty()) {
+        throw new HoodieIndexException(String.format("Following partitions already exist or inflight: %s", requestedPartitions));
+      }
+
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build the index up to the base instant as generated by the plan; remaining instants are handled by the catch-up below
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      LOG.info("Starting index building with base instant: " + indexUptoInstant);
+      metadataWriter.buildMetadataPartitions(context, indexPartitionInfos);
+
+      // get remaining instants to catchup
+      List<HoodieInstant> instantsToCatchup = getInstantsToCatchup(indexUptoInstant);
+      LOG.info("Total remaining instants to index: " + instantsToCatchup.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      catchupWithInflightWriters(metadataWriter, instantsToCatchup, metadataMetaClient, metadataCompletedTimestamps);
+      // save index commit metadata and update table config
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      updateTableConfigAndTimeline(indexInstant, finalIndexPartitionInfos, indexCommitMetadata);
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      // abort gracefully
+      abort(indexInstant, indexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet()));
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private void abort(HoodieInstant indexInstant, Set<String> requestedPartitions) {
+    Set<String> inflightPartitions = getInflightMetadataPartitions(table.getMetaClient().getTableConfig());
+    Set<String> completedPartitions = getCompletedMetadataPartitions(table.getMetaClient().getTableConfig());
+    // delete metadata partition
+    requestedPartitions.forEach(partition -> {
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(partition.toUpperCase(Locale.ROOT));
+      if (metadataPartitionExists(table.getMetaClient().getBasePath(), context, partitionType)) {
+        deleteMetadataPartition(table.getMetaClient().getBasePath(), context, partitionType);
+      }
+      inflightPartitions.remove(partition);
+      completedPartitions.remove(partition);
+    });
+    // update table config
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS_INFLIGHT.key(), String.join(",", inflightPartitions));
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS.key(), String.join(",", completedPartitions));
+    HoodieTableConfig.update(table.getMetaClient().getFs(), new Path(table.getMetaClient().getMetaPath()), table.getMetaClient().getTableConfig().getProps());
+    // delete inflight instant
+    table.getMetaClient().reloadActiveTimeline().deleteInstantFileIfExists(HoodieTimeline.getIndexInflightInstant(indexInstant.getTimestamp()));
+  }
+
+  private List<HoodieInstant> getInstantsToCatchup(String indexUptoInstant) {
+    // only the write timeline was considered while scheduling the index, which gives us the indexUpto instant;
+    // here we also consider other valid actions to pick the catchupStart instant
+    Set<String> validActions = CollectionUtils.createSet(CLEAN_ACTION, RESTORE_ACTION, ROLLBACK_ACTION);
+    HoodieInstant catchupStartInstant = table.getMetaClient().reloadActiveTimeline()
+        .getTimelineOfActions(validActions)
+        .filterInflightsAndRequested()
+        .findInstantsBefore(indexUptoInstant)
+        .firstInstant().orElseGet(() -> null);
+    // get all instants since the plan completed (both from active timeline and archived timeline)
+    List<HoodieInstant> instantsToIndex;
+    if (catchupStartInstant != null) {
+      instantsToIndex = getRemainingArchivedAndActiveInstantsSince(catchupStartInstant.getTimestamp(), table.getMetaClient());
+    } else {
+      instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+    }
+    return instantsToIndex;
+  }
+
+  private HoodieInstant validateAndGetIndexInstant() {
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    return table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+  }
+
+  private void updateTableConfigAndTimeline(HoodieInstant indexInstant,
+                                            List<HoodieIndexPartitionInfo> finalIndexPartitionInfos,
+                                            HoodieIndexCommitMetadata indexCommitMetadata) throws IOException {
+    try {
+      // update the table config and timeline in a lock as there could be another indexer running
+      txnManager.beginTransaction();
+      updateMetadataPartitionsTableConfig(table.getMetaClient(),
+          finalIndexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet()));
+      table.getActiveTimeline().saveAsComplete(
+          new HoodieInstant(true, INDEXING_ACTION, indexInstant.getTimestamp()),
+          TimelineMetadataUtils.serializeIndexCommitMetadata(indexCommitMetadata));
+    } finally {
+      txnManager.endTransaction();
+    }
+  }
+
+  private void catchupWithInflightWriters(HoodieTableMetadataWriter metadataWriter, List<HoodieInstant> instantsToIndex,
+                                          HoodieTableMetaClient metadataMetaClient, Set<String> metadataCompletedTimestamps) {
+    ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+    Future<?> indexingCatchupTaskFuture = executorService.submit(
+        new IndexingCatchupTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient(), metadataMetaClient));
+    try {
+      LOG.info("Starting index catchup task");
+      indexingCatchupTaskFuture.get(config.getIndexingCheckTimeoutSeconds(), TimeUnit.SECONDS);
+    } catch (Exception e) {
+      indexingCatchupTaskFuture.cancel(true);
+      throw new HoodieIndexException(String.format("Index catchup failed. Current indexed instant = %s. Aborting!", currentIndexedInstant), e);
+    } finally {
+      executorService.shutdownNow();
+    }
+  }
+
+  private static List<HoodieInstant> getRemainingArchivedAndActiveInstantsSince(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> remainingInstantsToIndex = metaClient.getArchivedTimeline().getInstants()
+        .filter(i -> HoodieTimeline.compareTimestamps(i.getTimestamp(), GREATER_THAN_OR_EQUALS, instant))
+        .filter(i -> !INDEXING_ACTION.equals(i.getAction()))
+        .collect(Collectors.toList());
+    remainingInstantsToIndex.addAll(metaClient.getActiveTimeline().findInstantsAfter(instant).getInstants()
+        .filter(i -> HoodieTimeline.compareTimestamps(i.getTimestamp(), GREATER_THAN_OR_EQUALS, instant))
+        .filter(i -> !INDEXING_ACTION.equals(i.getAction()))
+        .collect(Collectors.toList()));
+    return remainingInstantsToIndex;
+  }
+
+  private static List<HoodieInstant> getCompletedArchivedAndActiveInstantsAfter(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> completedInstants = metaClient.getArchivedTimeline().filterCompletedInstants().findInstantsAfter(instant)
+        .getInstants().filter(i -> !INDEXING_ACTION.equals(i.getAction())).collect(Collectors.toList());
+    completedInstants.addAll(metaClient.reloadActiveTimeline().filterCompletedInstants().findInstantsAfter(instant)
+        .getInstants().filter(i -> !INDEXING_ACTION.equals(i.getAction())).collect(Collectors.toList()));
+    return completedInstants;
+  }
+
+  private void updateMetadataPartitionsTableConfig(HoodieTableMetaClient metaClient, Set<String> metadataPartitions) {
+    // remove from inflight and update completed indexes
+    Set<String> inflightPartitions = getInflightMetadataPartitions(metaClient.getTableConfig());
+    Set<String> completedPartitions = getCompletedMetadataPartitions(metaClient.getTableConfig());
+    inflightPartitions.removeAll(metadataPartitions);
+    completedPartitions.addAll(metadataPartitions);
+    // update table config
+    metaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS_INFLIGHT.key(), String.join(",", inflightPartitions));
+    metaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS.key(), String.join(",", completedPartitions));
+    HoodieTableConfig.update(metaClient.getFs(), new Path(metaClient.getMetaPath()), metaClient.getTableConfig().getProps());
+  }
+
+  /**
+   * Indexing check runs for instants that completed after the base instant (in the index plan).
+   * It will check whether these later instants have logged updates to the metadata table or not.
+   * If not, then it will do the update itself. If a later instant is inflight, it will wait until it is completed or the task times out.
+   */
+  class IndexingCatchupTask implements Runnable {
+
+    private final HoodieTableMetadataWriter metadataWriter;
+    private final List<HoodieInstant> instantsToIndex;
+    private final Set<String> metadataCompletedInstants;
+    private final HoodieTableMetaClient metaClient;
+    private final HoodieTableMetaClient metadataMetaClient;
+
+    IndexingCatchupTask(HoodieTableMetadataWriter metadataWriter,
+                        List<HoodieInstant> instantsToIndex,
+                        Set<String> metadataCompletedInstants,
+                        HoodieTableMetaClient metaClient,
+                        HoodieTableMetaClient metadataMetaClient) {
+      this.metadataWriter = metadataWriter;
+      this.instantsToIndex = instantsToIndex;
+      this.metadataCompletedInstants = metadataCompletedInstants;
+      this.metaClient = metaClient;
+      this.metadataMetaClient = metadataMetaClient;
+    }
+
+    @Override
+    public void run() {
+      for (HoodieInstant instant : instantsToIndex) {
+        // metadata index already updated for this instant
+        if (!metadataCompletedInstants.isEmpty() && metadataCompletedInstants.contains(instant.getTimestamp())) {
+          currentIndexedInstant = instant.getTimestamp();
+          continue;
+        }
+        while (!instant.isCompleted()) {
+          try {
+            LOG.warn("instant not completed, reloading timeline " + instant);
+            // reload timeline and fetch instant details again wait until timeout
+            String instantTime = instant.getTimestamp();
+            Option<HoodieInstant> currentInstant = metaClient.reloadActiveTimeline()
+                .filterCompletedInstants().filter(i -> i.getTimestamp().equals(instantTime)).firstInstant();
+            instant = currentInstant.orElse(instant);
+            // sleep so that the timeline is not reloaded too frequently
+            Thread.sleep(TIMELINE_RELOAD_INTERVAL_MILLIS);
+          } catch (InterruptedException e) {
+            throw new HoodieIndexException(String.format("Thread interrupted while running indexing check for instant: %s", instant), e);
+          }
+        }
+        // if the instant completed, ensure that there was a metadata commit; else update the metadata for this completed instant

Review comment:
       So this is just to handle any race where the inflight writer misses logging this instant to the metadata table?
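
For context, the guarded step under discussion reduced to a sketch. reconcileCompletedInstant is a hypothetical helper name, and the single update(...) call stands in for the per-action dispatch in the diff; both are assumptions for illustration.

```java
// Hypothetical sketch of the check-then-update step: the catch-up only writes to the
// metadata table when the completed data-timeline instant was never logged there, and it
// does so under the table lock so it cannot race a late update from the inflight writer.
private void reconcileCompletedInstant(HoodieInstant instant) throws IOException {
  String instantTime = instant.getTimestamp();
  Option<HoodieInstant> metadataInstant = metadataMetaClient.reloadActiveTimeline()
      .filterCompletedInstants()
      .filter(i -> i.getTimestamp().equals(instantTime))
      .firstInstant();
  if (metadataInstant.isPresent()) {
    return; // the writer already committed this instant to the metadata table
  }
  try {
    txnManager.beginTransaction(Option.of(instant), Option.empty());
    HoodieCommitMetadata commitMetadata = HoodieCommitMetadata.fromBytes(
        metaClient.getActiveTimeline().getInstantDetails(instant).get(), HoodieCommitMetadata.class);
    metadataWriter.update(commitMetadata, instantTime); // assumed overload; real code dispatches on instant.getAction()
  } finally {
    txnManager.endTransaction();
  }
}
```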




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083582891


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570",
       "triggerID" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575",
       "triggerID" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a3ee4cd75320e578235cea4490ed7470bb721ea5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570) 
   * 18b9acd3320e68ee6688ea4eec693676350a9e15 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1084873773


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570",
       "triggerID" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575",
       "triggerID" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fc9ac46f36a4df8d9d590845b9848d48af1f7cae",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7582",
       "triggerID" : "fc9ac46f36a4df8d9d590845b9848d48af1f7cae",
       "triggerType" : "PUSH"
     }, {
       "hash" : "01120c1b4a0dacaec5f3b968ac421f5faa0bc1b9",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "01120c1b4a0dacaec5f3b968ac421f5faa0bc1b9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fc9ac46f36a4df8d9d590845b9848d48af1f7cae Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7582) 
   * 01120c1b4a0dacaec5f3b968ac421f5faa0bc1b9 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>
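
   For readers skimming these bot comments: the hidden "Meta data" block and the rendered report appear to be related by a simple rule — the bot keeps every build it has seen in "metaDataEntries" and surfaces only the entries not marked DELETED. The sketch below captures that inferred rule; BuildEntry and render are illustrative names inferred from this thread, not hudi-bot's actual implementation.

   import java.util.List;
   import java.util.stream.Collectors;

   // Hypothetical stand-in for one entry of the bot's "metaDataEntries" list.
   record BuildEntry(String hash, String status, String url) {}

   public class CiReportSketch {

     // Drops DELETED entries and formats the rest the way the reports above do.
     static String render(List<BuildEntry> entries) {
       return entries.stream()
           .filter(e -> !"DELETED".equals(e.status()))
           .map(e -> "UNKNOWN".equals(e.status())
               ? String.format("* %s UNKNOWN", e.hash())
               : String.format("* %s Azure: [%s](%s)", e.hash(), e.status(), e.url()))
           .collect(Collectors.joining("\n"));
     }

     public static void main(String[] args) {
       List<BuildEntry> history = List.of(
           new BuildEntry("238b1282", "DELETED", "https://dev.azure.com/...?buildId=5533"),
           new BuildEntry("fc9ac46f", "SUCCESS", "https://dev.azure.com/...?buildId=7582"),
           new BuildEntry("01120c1b", "UNKNOWN", "TBD"));
       System.out.println(render(history));
       // prints only the SUCCESS and UNKNOWN lines, matching the report above
     }
   }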


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083940266


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570",
       "triggerID" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575",
       "triggerID" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fc9ac46f36a4df8d9d590845b9848d48af1f7cae",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7582",
       "triggerID" : "fc9ac46f36a4df8d9d590845b9848d48af1f7cae",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fc9ac46f36a4df8d9d590845b9848d48af1f7cae Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7582) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066499338


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e6e3e1612928fb0892d071ec4c3a26e31ce1ff76 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840) 
   * 4a036d809018043ed0d99adccbe0efdfd920284a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077540636


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e58990e296aa5125807a4b96269fa7a06c885e69 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282) 
   * 32cfdbf4524384a7fb8220be6e822dc510cf173b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077538114


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124) 
   * e58990e296aa5125807a4b96269fa7a06c885e69 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282) 
   * 32cfdbf4524384a7fb8220be6e822dc510cf173b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1071069583


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 680a99a669d9e2c2e81465efe8e491812e6c3012 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1073676898


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 680a99a669d9e2c2e81465efe8e491812e6c3012 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038) 
   * 1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1022322110


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 238b128260cab3ad11c8e00bd20871b45e112c83 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1073679460


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 680a99a669d9e2c2e81465efe8e491812e6c3012 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038) 
   * 1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1068721225


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a9f8c1316b55b72c57d18fbe8d0c8103948a30bc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930) 
   * 0d6ad6e1d8767d66b15b31bb06d1318fb08e582c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077725510


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ca6f4c73d40497413fd38b6edd7fbf1de9b50cac Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296) 
   * c9295eeaffb5e804ee6c636b8617f754af1492d8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835754540



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -621,8 +635,14 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
 
     LOG.info(String.format("Creating %d file groups for partition %s with base fileId %s at instant time %s",
         fileGroupCount, metadataPartition.getPartitionPath(), metadataPartition.getFileIdPrefix(), instantTime));
+    HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+    List<FileSlice> fileSlices = HoodieTableMetadataUtil.getPartitionLatestFileSlices(metadataMetaClient, Option.ofNullable(fsView), metadataPartition.getPartitionPath());
     for (int i = 0; i < fileGroupCount; ++i) {
       final String fileGroupFileId = String.format("%s%04d", metadataPartition.getFileIdPrefix(), i);
+      // if a writer or async indexer had already initialized the filegroup then continue
+      if (!fileSlices.isEmpty() && fileSlices.stream().anyMatch(fileSlice -> fileGroupFileId.equals(fileSlice.getFileGroupId().getFileId()))) {

Review comment:
       With initialization happening while scheduling the index, we should not get into this case anymore.
   > Ideally initializeFileGroups should be called just once per MDT partition, right? Or am I missing something.
   
   That's correct.
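   
   For context, a minimal sketch of the idempotency guard under discussion, using the names from the diff above (the rest of the method body is omitted):
   
       // Sketch only: skip file groups that a writer or the async indexer
       // has already initialized. fileSlices comes from
       // HoodieTableMetadataUtil.getPartitionLatestFileSlices(...) as in the diff.
       for (int i = 0; i < fileGroupCount; ++i) {
         final String fileGroupFileId = String.format("%s%04d", metadataPartition.getFileIdPrefix(), i);
         boolean alreadyInitialized = fileSlices.stream()
             .anyMatch(fileSlice -> fileGroupFileId.equals(fileSlice.getFileGroupId().getFileId()));
         if (alreadyInitialized) {
           continue; // keeps initializeFileGroups effectively once per MDT partition
         }
         // ... create the empty base file for this file group (omitted) ...
       }
   
   Note that Stream.anyMatch already returns false on an empty list, so the !fileSlices.isEmpty() pre-check in the diff is purely defensive.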




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835776905



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -659,20 +691,100 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();

Review comment:
       Synced up offline, but can you show me where we update the files partition in the hoodie table config?
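   
   For illustration, a plausible sketch of how getMetadataPartitionsToUpdate could read that state back out of the table config. This is an assumption, not the PR's actual implementation; splitCommaSeparated is a hypothetical helper mirroring the string handling in the dropIndex diff quoted later in this thread:
   
       // Hypothetical: derive partitions to update from the completed/inflight
       // index lists tracked in the table config.
       private List<String> getMetadataPartitionsToUpdate() {
         HoodieTableConfig tableConfig = dataMetaClient.getTableConfig();
         Set<String> partitions = new HashSet<>();
         partitions.addAll(splitCommaSeparated(tableConfig.getCompletedMetadataIndexes()));
         partitions.addAll(splitCommaSeparated(tableConfig.getInflightMetadataIndexes()));
         return new ArrayList<>(partitions);
       }
   
       // Hypothetical helper: trim entries and drop empties.
       private static List<String> splitCommaSeparated(String csv) {
         return Stream.of(csv.split(","))
             .map(String::trim)
             .filter(s -> !s.isEmpty())
             .collect(Collectors.toList());
       }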




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077609304


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 32cfdbf4524384a7fb8220be6e822dc510cf173b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835763190



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -645,12 +669,36 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropIndex(List<MetadataPartitionType> indexesToDrop) throws IOException {
+    Set<String> completedIndexes = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    Set<String> inflightIndexes = Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    for (MetadataPartitionType partitionType : indexesToDrop) {
+      String partitionPath = partitionType.getPartitionPath();
+      if (inflightIndexes.contains(partitionPath)) {
+        LOG.error("Metadata indexing in progress: " + partitionPath);

Review comment:
       Yes, we can, but I was leaning more towards letting the indexer succeed so that the user can only drop an index from a clean state. Are you thinking about the scenario where index building is taking a long time and the user wants to simply abort it?
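   
   A minimal sketch of the clean-state guard being discussed, using the names from the diff above. Aborting an in-flight indexer, as suggested, would additionally need to cancel the pending indexing instant, which is not shown here:
   
       for (MetadataPartitionType partitionType : indexesToDrop) {
         String partitionPath = partitionType.getPartitionPath();
         if (inflightIndexes.contains(partitionPath)) {
           // only indexes in a clean (completed) state may be dropped
           LOG.error("Metadata indexing in progress, cannot drop: " + partitionPath);
           continue;
         }
         // ... drop the metadata table partition (omitted) ...
       }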




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1078597374


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c9295eeaffb5e804ee6c636b8617f754af1492d8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298) 
   * d02e0c2ca65038f88ae753484dcb2642ef789f27 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1078814461


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d02e0c2ca65038f88ae753484dcb2642ef789f27 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077504732


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124) 
   * e58990e296aa5125807a4b96269fa7a06c885e69 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077556653


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e58990e296aa5125807a4b96269fa7a06c885e69 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282) 
   * 32cfdbf4524384a7fb8220be6e822dc510cf173b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836575594



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleandExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);

Review comment:
       Validated this by (a sketch of the schedule/execute invocations follows below):
   1. writing 2 commits with metadata disabled.
   2. scheduling and building the index.
   3. doing another upsert with metadata disabled.
   4. scheduling the index.
   5. doing another upsert with metadata disabled.
   6. running the index (it also does catch-up).
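   
   A sketch of the schedule/execute legs (steps 4 and 6) expressed as invocations of the quoted tool's main(). It assumes a reachable Spark master and an indexer.properties like the javadoc sample; "<scheduled-instant>" is a placeholder for the instant created by the schedule run, not a real value:
   
       String[] scheduleArgs = {
           "--props", "/path/to/indexer.properties",
           "--mode", "schedule",                  // step 4: generate the indexing plan only
           "--base-path", "/tmp/hudi_trips_cow",
           "--table-name", "hudi_trips_cow",
           "--index-types", "COLUMN_STATS",
           "--parallelism", "1",
           "--spark-memory", "1g"
       };
       HoodieIndexer.main(scheduleArgs);
   
       String[] executeArgs = {
           "--props", "/path/to/indexer.properties",
           "--mode", "execute",                   // step 6: execute the plan; also runs catch-up
           "--instant-time", "<scheduled-instant>",
           "--base-path", "/tmp/hudi_trips_cow",
           "--table-name", "hudi_trips_cow",
           "--index-types", "COLUMN_STATS",
           "--parallelism", "1",
           "--spark-memory", "1g"
       };
       HoodieIndexer.main(executeArgs);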




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1081066904


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 522a18caff448bcc9b127372d4526ee8f168f085 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452) 
   * ee361b1bf6b9b68e11f84f2af76625b847669ed2 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1080549439


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 69071c6306ce336076aa6daa4337276990572ee4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368) 
   * 522a18caff448bcc9b127372d4526ee8f168f085 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836766366



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -659,20 +691,100 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private List<String> getMetadataPartitionsToUpdate() {
+    // find last (pending or) completed index instant and get partitions (to be) written
+    Option<HoodieInstant> lastIndexingInstant = dataMetaClient.getActiveTimeline()
+        .getTimelineOfActions(CollectionUtils.createImmutableSet(HoodieTimeline.INDEX_ACTION)).lastInstant();
+    if (lastIndexingInstant.isPresent()) {
+      try {
+        // TODO: handle inflight instant, if it is inflight then read from requested file.
+        HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(
+            dataMetaClient.getActiveTimeline().readIndexPlanAsBytes(lastIndexingInstant.get()).get());
+        return indexPlan.getIndexPartitionInfos().stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toList());
+      } catch (IOException e) {
+        LOG.warn("Could not read index plan. Falling back to FileSystem.exists() check.");
+        return getExistingMetadataPartitions();
+      }
     }
+    // TODO: return only enabled partitions
+    return MetadataPartitionType.allPaths();

Review comment:
       This logic has been updated; we no longer return all partitions.
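
   For readers following the thread, a minimal sketch of that updated shape, assuming a hypothetical isMetadataPartitionEnabled(...) helper (the exact Hudi config accessor may differ):

       import java.util.List;
       import java.util.stream.Collectors;
       import org.apache.hudi.common.config.HoodieMetadataConfig;
       import org.apache.hudi.metadata.MetadataPartitionType;

       // Sketch only: return just the enabled metadata partition paths instead of
       // MetadataPartitionType.allPaths(); isMetadataPartitionEnabled is assumed.
       private List<String> getEnabledPartitionPaths(HoodieMetadataConfig config) {
         return MetadataPartitionType.allPaths().stream()
             .filter(path -> isMetadataPartitionEnabled(config, path))
             .collect(Collectors.toList());
       }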




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836766955



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -659,20 +691,100 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);

Review comment:
       Yes. The logic has been updated to initialize file groups while scheduling.
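
   To make the scheduling-time flow concrete, a hedged sketch (scheduleIndex and initializeFileGroups come from this PR's diff; the surrounding wiring is illustrative, not the final implementation):

       // Illustrative: initialize the file groups for each partition in the index
       // plan at scheduling time, so the regular write path does not have to.
       Option<HoodieIndexPlan> indexPlan = table.scheduleIndex(context, instantTime, partitionsToIndex);
       indexPlan.ifPresent(plan -> plan.getIndexPartitionInfos().forEach(info -> {
         try {
           initializeFileGroups(dataMetaClient,
               MetadataPartitionType.valueOf(info.getMetadataPartitionPath().toUpperCase(Locale.ROOT)),
               instantTime, 1); // single file group, as in the diff
         } catch (IOException e) {
           throw new HoodieIndexException("Unable to initialize file groups while scheduling index", e);
         }
       }));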




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1022409891


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 238b128260cab3ad11c8e00bd20871b45e112c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1029259900


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c5c563ffa6625d610c9c6bd252457129ce5ccddc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619) 
   * 06c6dd9db383efa291c999d5f0140e5d2493eeaf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1065109086


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7920cb15d99cd92ea2a3e6bd515249eb63040772 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801) 
   * e6e3e1612928fb0892d071ec4c3a26e31ce1ff76 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] prashantwason commented on a change in pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
prashantwason commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r824571450



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -118,18 +124,18 @@
   /**
    * Hudi backed table metadata writer.
    *
-   * @param hadoopConf               - Hadoop configuration to use for the metadata writer
-   * @param writeConfig              - Writer config
-   * @param engineContext            - Engine context
-   * @param actionMetadata           - Optional action metadata to help decide bootstrap operations
-   * @param <T>                      - Action metadata types extending Avro generated SpecificRecordBase
+   * @param hadoopConf - Hadoop configuration to use for the metadata writer
+   * @param writeConfig - Writer config
+   * @param engineContext - Engine context
+   * @param actionMetadata - Optional action metadata to help decide bootstrap operations
+   * @param <T> - Action metadata types extending Avro generated SpecificRecordBase
    * @param inflightInstantTimestamp - Timestamp of any instant in progress
    */
   protected <T extends SpecificRecordBase> HoodieBackedTableMetadataWriter(Configuration hadoopConf,
-                                                                           HoodieWriteConfig writeConfig,
-                                                                           HoodieEngineContext engineContext,
-                                                                           Option<T> actionMetadata,
-                                                                           Option<String> inflightInstantTimestamp) {
+      HoodieWriteConfig writeConfig,

Review comment:
       +1 as we lose commit history of these lines and it bloats the diffs.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066897806


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6a577410251d17a1f2b9e782ded4908fec9977a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] manojpec commented on a change in pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
manojpec commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r799116647



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -621,8 +635,14 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
 
     LOG.info(String.format("Creating %d file groups for partition %s with base fileId %s at instant time %s",
         fileGroupCount, metadataPartition.getPartitionPath(), metadataPartition.getFileIdPrefix(), instantTime));
+    HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+    List<FileSlice> fileSlices = HoodieTableMetadataUtil.getPartitionLatestFileSlices(metadataMetaClient, Option.ofNullable(fsView), metadataPartition.getPartitionPath());
     for (int i = 0; i < fileGroupCount; ++i) {
       final String fileGroupFileId = String.format("%s%04d", metadataPartition.getFileIdPrefix(), i);
+      // if a writer or async indexer had already initialized the filegroup then continue
+      if (!fileSlices.isEmpty() && fileSlices.stream().anyMatch(fileSlice -> fileGroupFileId.equals(fileSlice.getFileGroupId().getFileId()))) {

Review comment:
       We should not get into this case. Can you please explain how this can happen? It's either all file groups are initialized or none are. This block gives the impression that the partition can be partially initialized and that the incomplete file groups are initialized here, which shouldn't happen, right?
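
   (For context, the guard under discussion condenses to the predicate below; shown only to frame the question.)

       // Condensed from the diff above: a file group counts as already initialized
       // when some latest file slice carries its fileId.
       private static boolean isFileGroupInitialized(List<FileSlice> fileSlices, String fileGroupFileId) {
         return fileSlices.stream()
             .anyMatch(slice -> fileGroupFileId.equals(slice.getFileGroupId().getFileId()));
       }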

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -383,6 +391,12 @@ public void initTableMetadata() {
     }
 
     if (!exists) {
+      if (metadataWriteConfig.isMetadataAsyncIndex()) {
+        // with async metadata indexing enabled, there can be inflight writers
+        // TODO: schedule indexing only for enabled partition types

Review comment:
       There will be more merge conflicts with https://github.com/apache/hudi/pull/4746. Better to rebase sooner rather than later. Also, that PR takes care of initialization for all enabled partitions.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -588,10 +609,87 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
-      commit(engineContext.parallelize(records, 1), MetadataPartitionType.FILES.partitionPath(), instantTime, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
+        commit(engineContext.parallelize(records, 1), p, instantTime, canTriggerTableService);
+      }
+    });
+  }
+
+  private List<String> getMetadataPartitionsToUpdate() {
+    // find last (pending or) completed index instant and get partitions (to be) written
+    Option<HoodieInstant> lastIndexingInstant = dataMetaClient.getActiveTimeline()
+        .getTimelineOfActions(CollectionUtils.createImmutableSet(HoodieTimeline.INDEX_ACTION)).lastInstant();
+    if (lastIndexingInstant.isPresent()) {
+      try {
+        // TODO: handle inflight instant, if it is inflight then read from requested file.
+        HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(
+            dataMetaClient.getActiveTimeline().readIndexPlanAsBytes(lastIndexingInstant.get()).get());
+        return indexPlan.getIndexPartitionInfos().stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toList());
+      } catch (IOException e) {
+        LOG.warn("Could not read index plan. Falling back to FileSystem.exists() check.");
+        return getExistingMetadataPartitions();
+      }
     }
+    // TODO: return only enabled partitions
+    return MetadataPartitionType.all();
+  }
+
+  private List<String> getExistingMetadataPartitions() {
+    return MetadataPartitionType.all().stream()
+        .filter(p -> {
+          try {
+            // TODO: avoid fs.exists() check
+            return metadataMetaClient.getFs().exists(FSUtils.getPartitionPath(metadataWriteConfig.getBasePath(), p));
+          } catch (IOException e) {
+            return false;
+          }
+        })
+        .collect(Collectors.toList());
+  }
+
+  @Override
+  public void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    indexPartitionInfos.forEach(indexPartitionInfo -> {
+      String indexUptoInstantTime = indexPartitionInfo.getIndexUptoInstant();
+      String relativePartitionPath = indexPartitionInfo.getMetadataPartitionPath();
+      LOG.info(String.format("Creating a new metadata index for partition '%s' under path %s upto instant %s",
+          relativePartitionPath, metadataWriteConfig.getBasePath(), indexUptoInstantTime));
+      try {
+        HoodieTableMetaClient.withPropertyBuilder()
+            .setTableType(HoodieTableType.MERGE_ON_READ)
+            .setTableName(tableName)
+            .setArchiveLogFolder(ARCHIVELOG_FOLDER.defaultValue())
+            .setPayloadClassName(HoodieMetadataPayload.class.getName())
+            .setBaseFileFormat(HoodieFileFormat.HFILE.toString())
+            .setRecordKeyFields(RECORD_KEY_FIELD_NAME)
+            .setPopulateMetaFields(dataWriteConfig.getMetadataConfig().populateMetaFields())
+            .setKeyGeneratorClassProp(HoodieTableMetadataKeyGenerator.class.getCanonicalName())
+            .initTable(hadoopConf.get(), metadataWriteConfig.getBasePath());
+        initTableMetadata();
+        initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(relativePartitionPath.toUpperCase(Locale.ROOT)), indexUptoInstantTime, 1);
+      } catch (IOException e) {
+        throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, indexUptoInstant: %s",
+            relativePartitionPath, indexUptoInstantTime));
+      }
+
+      // List all partitions in the basePath of the containing dataset
+      LOG.info("Initializing metadata table by using file listings in " + dataWriteConfig.getBasePath());
+      engineContext.setJobStatus(this.getClass().getSimpleName(), "MetadataIndex: initializing metadata table by listing files and partitions");
+      List<DirectoryInfo> dirInfoList = listAllPartitions(dataMetaClient);
+
+      // During bootstrap, the list of files to be committed can be huge. So creating a HoodieCommitMetadata out of these
+      // large number of files and calling the existing update(HoodieCommitMetadata) function does not scale well.
+      // Hence, we have a special commit just for the bootstrap scenario.
+      bootstrapCommit(dirInfoList, indexUptoInstantTime, relativePartitionPath);

Review comment:
       Please take a look at https://github.com/apache/hudi/pull/4746 for this

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java
##########
@@ -480,6 +482,25 @@ public abstract HoodieRollbackMetadata rollback(HoodieEngineContext context,
                                                   boolean deleteInstants,
                                                   boolean skipLocking);
 
+  /**
+   * Schedules Indexing for the table to the given instant.
+   *
+   * @param context HoodieEngineContext
+   * @param indexInstantTime Instant time for scheduling index action.
+   * @param partitionsToIndex List of {@link MetadataPartitionType#partitionPath()} that should be indexed.
+   * @return HoodieIndexPlan containing metadata partitions and instant upto which they should be indexed.
+   */
+  public abstract Option<HoodieIndexPlan> scheduleIndex(HoodieEngineContext context, String indexInstantTime, List<String> partitionsToIndex);
+
+  /**
+   * Execute requested index action.
+   *
+   * @param context HoodieEngineContext
+   * @param indexInstantTime Instant time for which index action was scheduled.
+   * @return HoodieIndexCommitMetadata containing write stats for each metadata partition.
+   */
+  public abstract Option<HoodieIndexCommitMetadata> index(HoodieEngineContext context, String indexInstantTime);

Review comment:
       nit: To pair well with scheduleIndex() and avoid confusion, how about runIndex() or something similar?
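
   If renamed as suggested (runIndex is hypothetical here), the pair would read:

       // scheduleIndex(...) stays as-is; only the execution method is renamed.
       public abstract Option<HoodieIndexPlan> scheduleIndex(HoodieEngineContext context, String indexInstantTime, List<String> partitionsToIndex);
       public abstract Option<HoodieIndexCommitMetadata> runIndex(HoodieEngineContext context, String indexInstantTime);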

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -652,20 +672,99 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();

Review comment:
       This is already a map of all enabled partitions to their records to be committed, and we are doing this in the outer loop for each enabled partition, which will lead to duplicates. Instead, convertMetadataFunction should be doing the getMetadataPartitionsToUpdate(), right?
   
   Doing multiple commits to the metadata table will bring in all-new cases to consider. We can avoid that and keep the current model of a single commit with every partition's HoodieData<>.
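
   A sketch of that single-commit shape, mirroring the pre-change processAndCommit removed in this diff:

       // One metadata-table commit carrying the full partition -> records map,
       // instead of one commit per partition.
       if (enabled && metadata != null) {
         Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap =
             convertMetadataFunction.convertMetadata(); // already keyed by enabled partition
         commit(instantTime, partitionRecordsMap, canTriggerTableService);
       }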

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -118,18 +124,18 @@
   /**
    * Hudi backed table metadata writer.
    *
-   * @param hadoopConf               - Hadoop configuration to use for the metadata writer
-   * @param writeConfig              - Writer config
-   * @param engineContext            - Engine context
-   * @param actionMetadata           - Optional action metadata to help decide bootstrap operations
-   * @param <T>                      - Action metadata types extending Avro generated SpecificRecordBase
+   * @param hadoopConf - Hadoop configuration to use for the metadata writer
+   * @param writeConfig - Writer config
+   * @param engineContext - Engine context
+   * @param actionMetadata - Optional action metadata to help decide bootstrap operations
+   * @param <T> - Action metadata types extending Avro generated SpecificRecordBase
    * @param inflightInstantTimestamp - Timestamp of any instant in progress
    */
   protected <T extends SpecificRecordBase> HoodieBackedTableMetadataWriter(Configuration hadoopConf,
-                                                                           HoodieWriteConfig writeConfig,
-                                                                           HoodieEngineContext engineContext,
-                                                                           Option<T> actionMetadata,
-                                                                           Option<String> inflightInstantTimestamp) {
+      HoodieWriteConfig writeConfig,

Review comment:
       nit: Can we avoid reformatting these lines when there are no functional changes to them? Is format-on-save applying Hudi style changes?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025426843


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 238b128260cab3ad11c8e00bd20871b45e112c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533) 
   * ca12a7818b2a799fb57ee04376dfcb14d628cdb2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025537756


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ca12a7818b2a799fb57ee04376dfcb14d628cdb2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618) 
   * c5c563ffa6625d610c9c6bd252457129ce5ccddc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066666446


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4a036d809018043ed0d99adccbe0efdfd920284a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066934524


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6a577410251d17a1f2b9e782ded4908fec9977a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925) 
   * 5c1c7e91b5f530907cda50135fef8286ee8a8e38 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1068757919


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d6ad6e1d8767d66b15b31bb06d1318fb08e582c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1071069583


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 680a99a669d9e2c2e81465efe8e491812e6c3012 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1073676898


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 680a99a669d9e2c2e81465efe8e491812e6c3012 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038) 
   * 1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1078955314


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d02e0c2ca65038f88ae753484dcb2642ef789f27 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334) 
   * 69071c6306ce336076aa6daa4337276990572ee4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835773830



##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metadata/SparkHoodieBackedTableMetadataWriter.java
##########
@@ -121,6 +121,15 @@ protected void initRegistry() {
     }
   }
 
+  @Override
+  protected void scheduleIndex(List<String> partitions) {
+    ValidationUtils.checkState(metadataMetaClient != null, "Metadata table is not fully initialized yet.");

Review comment:
       It doesn't matter whether the user specifies the files partition as one of the options for the tool. The files partition is always enabled as long as the metadata table is enabled, and its initialization happens with the initialization of the metadata writer, before indexing (or even scheduling) of any other partition can begin. Initialization of the file groups for the other partitions happens within a lock.
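       A minimal sketch of that ordering, under the assumptions described above (class, method, and field names here are hypothetical and do not mirror the actual HoodieBackedTableMetadataWriter API):

           import java.util.concurrent.locks.ReentrantLock;

           // Hypothetical sketch of the initialization ordering described above.
           class MetadataInitSketch {
             enum MetadataPartitionType { FILES, BLOOM_FILTERS, COLUMN_STATS }

             private final ReentrantLock lock = new ReentrantLock();

             void initializeWriter() {
               // FILES is always enabled with the metadata table and is bootstrapped
               // as part of metadata writer initialization, before any other
               // partition can be indexed or even scheduled.
               bootstrapPartition(MetadataPartitionType.FILES);
             }

             void indexOtherPartition(MetadataPartitionType type) {
               // file group initialization for non-FILES partitions happens within a lock
               lock.lock();
               try {
                 bootstrapPartition(type);
               } finally {
                 lock.unlock();
               }
             }

             private void bootstrapPartition(MetadataPartitionType type) {
               // placeholder for file group creation in the metadata table partition
             }
           }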




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077698651


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ca6f4c73d40497413fd38b6edd7fbf1de9b50cac Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835785855



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.REQUESTED.equals(indexInstant.getState()),
+        String.format("Index instant %s should be in requested state", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();

Review comment:
       One thing (most conservative) we could do is guard MDT compaction while any partition is being built out. But the major thing to consider here is: if an async index process crashes, the inflight index will stay forever, so we would need a heartbeat timeout or something similar, which complicates things.
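       As a rough illustration of both ideas (the compaction guard and the heartbeat expiry), here is a hedged sketch; all names below are made up for the example and are not Hudi APIs:

           import java.util.List;

           // Hypothetical sketch: guard MDT compaction while an index is being
           // built, and treat inflight index instants with expired heartbeats
           // as crashed so they do not block compaction forever.
           class CompactionGuardSketch {
             static final long HEARTBEAT_TIMEOUT_MS = 5 * 60 * 1000L; // assumed timeout

             static final class PendingIndex {
               final String instantTime;
               final long lastHeartbeatMs;
               PendingIndex(String instantTime, long lastHeartbeatMs) {
                 this.instantTime = instantTime;
                 this.lastHeartbeatMs = lastHeartbeatMs;
               }
             }

             boolean canScheduleMetadataCompaction(List<PendingIndex> pendingIndexes, long nowMs) {
               for (PendingIndex pending : pendingIndexes) {
                 if (nowMs - pending.lastHeartbeatMs <= HEARTBEAT_TIMEOUT_MS) {
                   // a live indexer is still building a partition: hold off compaction
                   return false;
                 }
                 // heartbeat expired: the async indexer likely crashed; its inflight
                 // instant is a candidate for cleanup rather than a blocker
               }
               return true;
             }
           }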




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835764478



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {

Review comment:
       No, I'll add it as a follow-up task: https://issues.apache.org/jira/browse/HUDI-3727




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077609304


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 32cfdbf4524384a7fb8220be6e822dc510cf173b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835779326



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -392,6 +398,12 @@ public void initTableMetadata() {
     }
 
     if (!exists) {
+      if (metadataWriteConfig.isMetadataAsyncIndex()) {

Review comment:
       The likely fix is that getEnabledPartitionTypes should not include the FILES partition.
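       A small sketch of that suggestion (method shape and enum values are illustrative, not the actual Hudi signatures):

           import java.util.EnumSet;
           import java.util.Set;

           // Illustrative sketch of the suggested fix.
           class EnabledPartitionsSketch {
             enum MetadataPartitionType { FILES, BLOOM_FILTERS, COLUMN_STATS }

             Set<MetadataPartitionType> getEnabledPartitionTypes(Set<MetadataPartitionType> configured,
                                                                 boolean asyncIndexEnabled) {
               Set<MetadataPartitionType> enabled = EnumSet.noneOf(MetadataPartitionType.class);
               enabled.addAll(configured);
               if (asyncIndexEnabled) {
                 // FILES is bootstrapped with the metadata writer itself, so it should
                 // not be handed to the async indexer as a partition to build
                 enabled.remove(MetadataPartitionType.FILES);
               }
               return enabled;
             }
           }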




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836027951



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+
+      // get all instants since the plan completed (both from active timeline and archived timeline)
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      table.getMetaClient().reloadActiveTimeline();
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      List<HoodieInstant> instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(
+          new IndexingCheckTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient()));
+      try {
+        postRequestIndexingTaskFuture.get(config.getIndexingCheckTimeout(), TimeUnit.SECONDS);
+      } catch (TimeoutException | InterruptedException | ExecutionException e) {
+        postRequestIndexingTaskFuture.cancel(true);

Review comment:
       Missed throwing the exception; done now.
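       For reference, a hedged sketch of what the fixed block could look like (RuntimeException stands in for org.apache.hudi.exception.HoodieIndexException, and the timeout plumbing is simplified):

           import java.util.concurrent.ExecutionException;
           import java.util.concurrent.ExecutorService;
           import java.util.concurrent.Executors;
           import java.util.concurrent.Future;
           import java.util.concurrent.TimeUnit;
           import java.util.concurrent.TimeoutException;

           // Sketch only: cancel the task on failure AND rethrow, so the index
           // action fails loudly instead of swallowing the error.
           class IndexingCheckSketch {
             void runWithTimeout(Runnable indexingCheckTask, long timeoutSecs, String instantTime) {
               ExecutorService executorService = Executors.newFixedThreadPool(1);
               Future<?> future = executorService.submit(indexingCheckTask);
               try {
                 future.get(timeoutSecs, TimeUnit.SECONDS);
               } catch (TimeoutException | InterruptedException | ExecutionException e) {
                 future.cancel(true);
                 throw new RuntimeException("Post-request indexing check failed for instant " + instantTime, e);
               } finally {
                 executorService.shutdownNow();
               }
             }
           }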




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1079375166


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 69071c6306ce336076aa6daa4337276990572ee4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077706166


   ## CI report:
   
   * ca6f4c73d40497413fd38b6edd7fbf1de9b50cac Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296) 
   * c9295eeaffb5e804ee6c636b8617f754af1492d8 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077698651


   ## CI report:
   
   * ca6f4c73d40497413fd38b6edd7fbf1de9b50cac Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836034276



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/bloom/BloomFilter.java
##########
@@ -30,6 +34,13 @@
    */
   void add(String key);
 
+  /**
+   * Add a list of secondary keys to the {@link BloomFilter}.
+   *
+   * @param keys list of secondary keys to add to the {@link BloomFilter}
+   */
+  void add(@Nonnull List<String> keys);

Review comment:
       Going to remove it and create a separate patch. Reopened https://issues.apache.org/jira/browse/HUDI-3368 to track this.
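   For context, a minimal sketch of what the bulk add could look like if it lands in that separate patch. Writing it as a default method is purely an assumption for illustration; the actual HUDI-3368 patch may batch the underlying bit updates instead:

   ```java
   // Hypothetical default implementation (illustration only, not the HUDI-3368 patch).
   default void add(@Nonnull List<String> keys) {
     // Reuse the existing single-key add(String) so every key is hashed the same way.
     keys.forEach(this::add);
   }
   ```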




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836587474



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes it.
+ * It also reconciles updates that landed on the data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.REQUESTED.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<HoodieInstant> metadataCompletedTimeline = metadataMetaClient.getActiveTimeline()
+          .getCommitsTimeline().filterCompletedInstants().getInstants().collect(Collectors.toSet());
+      List<HoodieInstant> finalRemainingInstantsToIndex = remainingInstantsToIndex.map(

Review comment:
       So now, we're checking against the completed instants in the MDT timeline (see the getCompletedArchivedAndActiveInstantsAfter method). Only if an instant on the data timeline is yet to be indexed and not yet present in the MDT do we wait until it gets completed (reloading the timeline periodically until timeout).
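   A rough sketch of that wait-and-reload loop. The class and method names, the poll interval, and the exception handling below are illustrative assumptions rather than the exact shape used in the patch:

   ```java
   import org.apache.hudi.common.table.HoodieTableMetaClient;
   import org.apache.hudi.exception.HoodieIndexException;

   /**
    * Sketch only: poll the metadata table (MDT) timeline until the given data-timeline
    * instant shows up there as completed, or give up after a timeout.
    */
   final class IndexCatchupSketch {
     static void awaitInstantCompletion(HoodieTableMetaClient metadataMetaClient, String instantTime, long timeoutMs) {
       long deadline = System.currentTimeMillis() + timeoutMs;
       while (System.currentTimeMillis() < deadline) {
         boolean completed = metadataMetaClient.reloadActiveTimeline()
             .filterCompletedInstants()
             .getInstants()
             .anyMatch(instant -> instant.getTimestamp().equals(instantTime));
         if (completed) {
           return;
         }
         try {
           Thread.sleep(10_000L); // assumed poll interval; the patch may read this from config
         } catch (InterruptedException e) {
           Thread.currentThread().interrupt();
           throw new HoodieIndexException("Interrupted while waiting for instant " + instantTime, e);
         }
       }
       throw new HoodieIndexException("Timed out waiting for instant " + instantTime + " on the metadata table");
     }
   }
   ```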




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r838165685



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <ol>
+ *   <li>Fetch the last completed instant on the data timeline.</li>
+ *   <li>Write the index plan to &lt;instant&gt;.index.requested.</li>
+ *   <li>Initialize file groups for the enabled partition types within a transaction.</li>
+ * </ol>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!EnumSet.allOf(MetadataPartitionType.class).containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+    // make sure that it is idempotent; check against previously inflight or completed index operations.
+    Set<String> indexesInflightOrCompleted = Stream.of(table.getMetaClient().getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    indexesInflightOrCompleted.addAll(Stream.of(table.getMetaClient().getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    Set<String> requestedPartitions = partitionsToIndex.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.removeAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {

Review comment:
       > how do we handle the scenario where we fail after updating the tableConfig/hoodie.props, but before writing the requested indexing to the timeline.
   
   Case 1: scheduling index
   No table config is updated while scheduling. What could happen here is, for example, that the column_stats partition was initialized and the executor failed just before writing the requested indexing instant to the timeline. When scheduling is re-triggered, it will begin with a new instant and redo the whole thing. So, this isn't truly idempotent. I need to add a check to skip partitions whose initialization was already complete.

   Case 2: building index
   The "completed" partitions table config gets updated only after index catch-up, once the partition is fully built out. If the table config gets updated but the executor failed just before writing the completed indexing instant to the timeline, the regular writers will see that the partition is available for updates, and there will be an inflight indexing instant left in the timeline forever. When the indexer is re-triggered, it will fail again because it will see that there is already an inflight indexing instant with the same timestamp.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r838970577



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -538,15 +540,14 @@ private boolean initializeFromFilesystem(HoodieTableMetaClient dataMetaClient,
     // of these large number of files and calling the existing update(HoodieCommitMetadata) function does not scale
     // well. Hence, we have a special commit just for the initialization scenario.
     initialCommit(createInstantTime, enabledPartitionTypes);
-    updateCompletedIndexesInTableConfig(enabledPartitionTypes);
+    updateInitializedPartitionsInTableConfig(enabledPartitionTypes);
     return true;
   }
 
-  private void updateCompletedIndexesInTableConfig(List<MetadataPartitionType> partitionTypes) {
-    Set<String> completedIndexes = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
-        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
-    completedIndexes.addAll(partitionTypes.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toList()));
-    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), String.join(",", completedIndexes));
+  private void updateInitializedPartitionsInTableConfig(List<MetadataPartitionType> partitionTypes) {
+    Set<String> completedIndexes = getCompletedMetadataPartitions(dataMetaClient.getTableConfig());

Review comment:
       completedPartitions

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java
##########
@@ -55,7 +55,7 @@
   String COMPACTION_ACTION = "compaction";
   String REQUESTED_EXTENSION = ".requested";
   String RESTORE_ACTION = "restore";
-  String INDEX_ACTION = "index";
+  String INDEX_ACTION = "indexing";

Review comment:
       INDEXING_ACTION




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1081873988


   ## CI report:
   
   * ee361b1bf6b9b68e11f84f2af76625b847669ed2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471) 
   * be08ba499bb88d8a00f20695b360336853be708e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083582891


   ## CI report:
   
   * a3ee4cd75320e578235cea4490ed7470bb721ea5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570) 
   * 18b9acd3320e68ee6688ea4eec693676350a9e15 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1085038949


   ## CI report:
   
   * 01120c1b4a0dacaec5f3b968ac421f5faa0bc1b9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7652) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1081869988


   ## CI report:
   
   * ee361b1bf6b9b68e11f84f2af76625b847669ed2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471) 
   * be08ba499bb88d8a00f20695b360336853be708e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1029330925


   ## CI report:
   
   * 06c6dd9db383efa291c999d5f0140e5d2493eeaf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1032499400


   ## CI report:
   
   * 06c6dd9db383efa291c999d5f0140e5d2493eeaf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709) 
   * 7920cb15d99cd92ea2a3e6bd515249eb63040772 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1029259900


   ## CI report:
   
   * c5c563ffa6625d610c9c6bd252457129ce5ccddc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619) 
   * 06c6dd9db383efa291c999d5f0140e5d2493eeaf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025425611


   ## CI report:
   
   * 238b128260cab3ad11c8e00bd20871b45e112c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533) 
   * ca12a7818b2a799fb57ee04376dfcb14d628cdb2 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1022322110


   ## CI report:
   
   * 238b128260cab3ad11c8e00bd20871b45e112c83 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835755588



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java
##########
@@ -120,7 +120,7 @@ public HoodieBloomIndex(HoodieWriteConfig config, BaseHoodieBloomIndexHelper blo
     // Step 2: Load all involved files as <Partition, filename> pairs
     List<Pair<String, BloomIndexFileInfo>> fileInfoList;
     if (config.getBloomIndexPruneByRanges()) {
-      fileInfoList = (config.getMetadataConfig().isColumnStatsIndexEnabled()
+      fileInfoList = (config.isMetadataColumnStatsIndexEnabled()

Review comment:
       yes, and vice versa is also possible. If we check for both partitions, we may not be able to use colstats, e.g. if bloom_filters is disabled but column_stats is enabled, we still need to `lookupIndex` and want to prune by column ranges. Checking for both partitions won't help in that case.
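
   To make that decision concrete, a minimal self-contained sketch (names here are hypothetical, not the PR's code): whether range pruning can use the metadata index should depend on the column_stats config alone, not on bloom_filters.

```java
import java.util.function.Supplier;

// Illustrative sketch only: choose the source of column ranges for
// bloom-index pruning. bloom_filters may be disabled while column_stats is
// enabled, so the metadata-index path is gated on column_stats alone.
final class RangeSourceChooser {
  static <T> T chooseRanges(boolean pruneByRanges,
                            boolean columnStatsIndexEnabled,
                            Supplier<T> fromMetaIndex,    // read ranges from the column_stats partition
                            Supplier<T> fromFileFooters,  // fall back to base file footers
                            Supplier<T> withoutRanges) {  // no range pruning requested
    if (!pruneByRanges) {
      return withoutRanges.get();
    }
    return columnStatsIndexEnabled ? fromMetaIndex.get() : fromFileFooters.get();
  }
}
```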







[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1080546882


   ## CI report:
   
   * 69071c6306ce336076aa6daa4337276990572ee4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368) 
   * 522a18caff448bcc9b127372d4526ee8f168f085 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835782128



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleandExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  public int start(int retry) {
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        case DROP_INDEX: {
+          LOG.info("Running Mode: [" + DROP_INDEX + "];");
+          return dropIndex(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  @TestOnly
+  public Option<String> doSchedule() throws Exception {
+    return this.scheduleIndexing(jsc);
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      return doSchedule(client);
+    }
+  }
+
+  private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
+    List<String> partitionsToIndex = Arrays.asList(cfg.indexTypes.split(","));
+    List<MetadataPartitionType> partitionTypes = partitionsToIndex.stream()
+        .map(p -> MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)))
+        .collect(Collectors.toList());
+    if (cfg.indexInstantTime != null) {
+      client.scheduleClusteringAtInstant(cfg.indexInstantTime, Option.empty());

Review comment:
       this should be `scheduleIndexing`
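
   That is, the scheduling path should go through the new indexing API rather than the clustering one. A hedged sketch of the suggested change inside `HoodieIndexer` (assuming the `scheduleIndexing(List<MetadataPartitionType>)` API this PR introduces on the write client; not the merged code):

```java
// Hedged sketch of the fix being suggested: schedule an *indexing* plan,
// not a clustering plan. Parsing logic mirrors the quoted doSchedule(...).
private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
  List<MetadataPartitionType> partitionTypes = Arrays.stream(cfg.indexTypes.split(","))
      .map(p -> MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)))
      .collect(Collectors.toList());
  return client.scheduleIndexing(partitionTypes);  // was: scheduleClusteringAtInstant(...)
}
```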







[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835764267



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.INFLIGHT.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);

Review comment:
       not needed here.. a lock is already taken while initializing the file groups; after that, the indexer can run concurrently with other writers, except for the indexing check at the end.
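
   To spell out the concurrency model being described, a hedged sketch (the `TransactionManager` calls are illustrative, not the PR's exact code):

```java
// Hedged sketch of the pattern described above: only the one-time file-group
// initialization for the new index partition runs under the table lock; the
// index build itself proceeds concurrently with regular writers, with a final
// catch-up/check to reconcile writes that landed while indexing was running.
void initThenIndex(TransactionManager txnManager,
                   Runnable initializeFileGroups,
                   Runnable runIndexing) {
  txnManager.beginTransaction(Option.empty(), Option.empty());  // lock: init only
  try {
    initializeFileGroups.run();   // create metadata-table file groups for the index
  } finally {
    txnManager.endTransaction();  // release before the (long) index build
  }
  runIndexing.run();              // no lock held; concurrent with other writers
}
```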







[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835762079



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -645,12 +669,36 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropIndex(List<MetadataPartitionType> indexesToDrop) throws IOException {
+    Set<String> completedIndexes = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    Set<String> inflightIndexes = Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    for (MetadataPartitionType partitionType : indexesToDrop) {
+      String partitionPath = partitionType.getPartitionPath();
+      if (inflightIndexes.contains(partitionPath)) {
+        LOG.error("Metadata indexing in progress: " + partitionPath);
+        return;
+      }
+      LOG.warn("Deleting Metadata Table partitions: " + partitionPath);
+      dataMetaClient.getFs().delete(new Path(metadataWriteConfig.getBasePath(), partitionPath), true);
+      completedIndexes.remove(partitionPath);
+    }
+    // update table config
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), String.join(",", completedIndexes));

Review comment:
       > should we not first update the table config and then delete the partitions
   
   Yes, good catch! I did fix this; not sure if I missed it while rebasing.
   
   > Other writers who are holding on to an in-memory table property are not going to get an updated value if we update here.
   
   Your idea is good, but waiting for a minute only reduces the probability of failure.
   Also note that the index is being dropped within a lock, and I think dropping an index is not something a user would do very frequently.
   
   To support fully concurrent writes: MySQL, for instance, drops an index lazily, i.e. it simply marks the current index as deleted and physically deletes it later, once no other writer is referencing the index. We can do something similar. Tracking here: https://issues.apache.org/jira/browse/HUDI-3718
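   
   For clarity, a hedged sketch of the reordered flow we are converging on (names follow the quoted `dropIndex`, but this is illustrative rather than the merged code): persist the table-config change first so readers stop trusting the index, and only then delete the partition files, with both steps under the metadata lock.

```java
// Illustrative sketch of the corrected ordering discussed above.
for (MetadataPartitionType partitionType : indexesToDrop) {
  completedIndexes.remove(partitionType.getPartitionPath());
}
// 1) Persist the updated table config (the source of truth other readers
//    consult) before touching any files; the update(...) call is assumed to
//    rewrite hoodie.properties.
dataMetaClient.getTableConfig().setValue(
    HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(),
    String.join(",", completedIndexes));
HoodieTableConfig.update(dataMetaClient.getFs(),
    new Path(dataMetaClient.getMetaPath()),
    dataMetaClient.getTableConfig().getProps());
// 2) Only then physically delete the index partitions from the metadata table.
for (MetadataPartitionType partitionType : indexesToDrop) {
  dataMetaClient.getFs().delete(
      new Path(metadataWriteConfig.getBasePath(), partitionType.getPartitionPath()),
      true);
}
```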







[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836572771



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;

Review comment:
       Done. Check the `getRequestedPartitionTypes` method.
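
   For reference, a hedged sketch of such a helper, reconstructed from the parsing logic in the quoted `doSchedule` (the merged implementation may differ):

```java
// Parses the comma-separated --index-types argument into metadata partition
// types, trimming whitespace and skipping empty entries.
private List<MetadataPartitionType> getRequestedPartitionTypes(String indexTypes) {
  return Arrays.stream(indexTypes.split(","))
      .map(String::trim)
      .filter(s -> !s.isEmpty())
      .map(p -> MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)))
      .collect(Collectors.toList());
}
```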







[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077501985


   ## CI report:
   
   * 1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124) 
   * e58990e296aa5125807a4b96269fa7a06c885e69 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1084873773


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570",
       "triggerID" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575",
       "triggerID" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fc9ac46f36a4df8d9d590845b9848d48af1f7cae",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7582",
       "triggerID" : "fc9ac46f36a4df8d9d590845b9848d48af1f7cae",
       "triggerType" : "PUSH"
     }, {
       "hash" : "01120c1b4a0dacaec5f3b968ac421f5faa0bc1b9",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "01120c1b4a0dacaec5f3b968ac421f5faa0bc1b9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fc9ac46f36a4df8d9d590845b9848d48af1f7cae Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7582) 
   * 01120c1b4a0dacaec5f3b968ac421f5faa0bc1b9 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835779585



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <ol>
+ * <li>Fetch the last completed instant on the data timeline.</li>
+ * <li>Write the index plan to <instant>.index.requested.</li>
+ * <li>Initialize file groups for the enabled partition types within a transaction.</li>
+ * </ol>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!EnumSet.allOf(MetadataPartitionType.class).containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+    // make sure that it is idempotent, check with previously pending index operations.
+    Set<String> indexesInflightOrCompleted = Stream.of(table.getMetaClient().getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    indexesInflightOrCompleted.addAll(Stream.of(table.getMetaClient().getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    Set<String> requestedPartitions = partitionsToIndex.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.retainAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {
+      LOG.error("Following partitions already exist or inflight: " + requestedPartitions);
+      return Option.empty();
+    }
+    // get last completed instant
+    Option<HoodieInstant> indexUptoInstant = table.getActiveTimeline().getContiguousCompletedWriteTimeline().lastInstant();
+    if (indexUptoInstant.isPresent()) {
+      final HoodieInstant indexInstant = HoodieTimeline.getIndexRequestedInstant(instantTime);
+      // for each partitionToIndex add that time to the plan
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = partitionsToIndex.stream()
+          .map(p -> new HoodieIndexPartitionInfo(LATEST_INDEX_PLAN_VERSION, p.getPartitionPath(), indexUptoInstant.get().getTimestamp()))
+          .collect(Collectors.toList());
+      HoodieIndexPlan indexPlan = new HoodieIndexPlan(LATEST_INDEX_PLAN_VERSION, indexPartitionInfos);
+      try {
+        table.getActiveTimeline().saveToPendingIndexCommit(indexInstant, TimelineMetadataUtils.serializeIndexPlan(indexPlan));
+      } catch (IOException e) {
+        LOG.error("Error while saving index requested file", e);
+        throw new HoodieIOException(e.getMessage(), e);
+      }
+      table.getMetaClient().reloadActiveTimeline();
+
+      // start initializing filegroups
+      // 1. get metadata writer
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)

Review comment:
       please add a Javadoc here noting that, in case the FILES partition itself was not initialized before (i.e. metadata was never enabled), this will initialize it synchronously
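
   For illustration, one possible wording of the requested note (a sketch, not the PR's actual text), to sit above the metadata-writer lookup in `execute()`:

   ```java
   // NOTE: If the FILES partition itself was not initialized before (i.e. the
   // metadata table was never enabled on this table), fetching the metadata
   // writer at this point will initialize it synchronously; only the requested
   // index partitions are then built asynchronously.
   ```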




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836008617



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -392,6 +398,12 @@ public void initTableMetadata() {
     }
 
     if (!exists) {
+      if (metadataWriteConfig.isMetadataAsyncIndex()) {

Review comment:
       Will remove this conditional; it is not needed here.
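
   A minimal sketch of the simplified shape once the conditional is dropped (the bootstrap call shown is hypothetical; only the removed check comes from the diff above):

   ```java
   if (!exists) {
     // With the isMetadataAsyncIndex() check removed, bootstrap runs
     // unconditionally whenever the metadata table does not exist yet.
     bootstrapFromFilesystem(engineContext, dataMetaClient);  // hypothetical call
   }
   ```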




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r838119416



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.common.util.StringUtils.isNullOrEmpty;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleandExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  public int start(int retry) {
+    // indexing should be done only if metadata is enabled
+    if (!props.getBoolean(HoodieMetadataConfig.ENABLE.key())) {
+      LOG.error(String.format("Metadata is not enabled. Please set %s to true.", HoodieMetadataConfig.ENABLE.key()));
+      return -1;
+    }
+
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        case DROP_INDEX: {
+          LOG.info("Running Mode: [" + DROP_INDEX + "];");
+          return dropIndex(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  @TestOnly
+  public Option<String> doSchedule() throws Exception {
+    return this.scheduleIndexing(jsc);
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      return doSchedule(client);
+    }
+  }
+
+  private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
+    List<MetadataPartitionType> partitionTypes = getRequestedPartitionTypes(cfg.indexTypes);
+    Option<String> indexingInstant = client.scheduleIndexing(partitionTypes);
+    if (!indexingInstant.isPresent()) {
+      LOG.error("Scheduling of index action did not return any instant.");
+    }
+    return indexingInstant;
+  }
+
+  private int runIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      if (isNullOrEmpty(cfg.indexInstantTime)) {
+        // Instant time is not specified
+        // Find the earliest scheduled indexing instant for execution
+        Option<HoodieInstant> earliestPendingIndexInstant = metaClient.getActiveTimeline()
+            .filterPendingIndexTimeline()
+            .filter(i -> !(i.isCompleted() || INFLIGHT.equals(i.getState())))
+            .firstInstant();
+        if (earliestPendingIndexInstant.isPresent()) {
+          cfg.indexInstantTime = earliestPendingIndexInstant.get().getTimestamp();
+          LOG.info("Found the earliest scheduled indexing instant which will be executed: "
+              + cfg.indexInstantTime);
+        } else {
+          throw new HoodieIndexException("There is no scheduled indexing in the table.");
+        }
+      }
+      return handleResponse(client.index(cfg.indexInstantTime)) ? 0 : 1;
+    }
+  }
+
+  private int scheduleAndRunIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      Option<String> indexingInstantTime = doSchedule(client);
+      if (indexingInstantTime.isPresent()) {
+        return handleResponse(client.index(indexingInstantTime.get())) ? 0 : 1;
+      } else {
+        return -1;
+      }
+    }
+  }
+
+  private int dropIndex(JavaSparkContext jsc) throws Exception {
+    List<MetadataPartitionType> partitionTypes = getRequestedPartitionTypes(cfg.indexTypes);

Review comment:
       Makes sense.
   Currently, the bloom_filters partition is built only for the record key, so we have just one partition and this works. However, I get your meta point. As suggested earlier, I think it would be better to refactor the indexing APIs so that they ask only for what users would be expected to provide, and construct the configs, partition paths, etc. internally.
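
   To make that concrete, a hypothetical sketch of such an API (all names below are illustrative, not from this PR): the caller names only the index types, and the metadata partition paths remain an internal detail that the API derives.

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.Locale;
   import java.util.stream.Collectors;

   import org.apache.hudi.metadata.MetadataPartitionType;

   public class IndexPartitionResolver {
     // Resolve user-facing index types (e.g. "BLOOM_FILTERS,COLUMN_STATS") into
     // the metadata partitions they cover. Today each type maps to exactly one
     // partition (bloom filters are built only for the record key), so the
     // mapping is 1:1; a refactor could fan one type out to several partitions
     // here without changing what the user passes in.
     public static List<String> resolvePartitionPaths(String indexTypesCsv) {
       return Arrays.stream(indexTypesCsv.split(","))
           .map(s -> MetadataPartitionType.valueOf(s.trim().toUpperCase(Locale.ROOT)))
           .map(MetadataPartitionType::getPartitionPath)
           .collect(Collectors.toList());
     }
   }
   ```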




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r838114020



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.common.util.StringUtils.isNullOrEmpty;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleandExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  public int start(int retry) {
+    // indexing should be done only if metadata is enabled
+    if (!props.getBoolean(HoodieMetadataConfig.ENABLE.key())) {
+      LOG.error(String.format("Metadata is not enabled. Please set %s to true.", HoodieMetadataConfig.ENABLE.key()));
+      return -1;
+    }
+
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        case DROP_INDEX: {
+          LOG.info("Running Mode: [" + DROP_INDEX + "];");
+          return dropIndex(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  @TestOnly
+  public Option<String> doSchedule() throws Exception {
+    return this.scheduleIndexing(jsc);
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      return doSchedule(client);
+    }
+  }
+
+  private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
+    List<MetadataPartitionType> partitionTypes = getRequestedPartitionTypes(cfg.indexTypes);
+    Option<String> indexingInstant = client.scheduleIndexing(partitionTypes);
+    if (!indexingInstant.isPresent()) {
+      LOG.error("Scheduling of index action did not return any instant.");
+    }
+    return indexingInstant;
+  }
+
+  private int runIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      if (isNullOrEmpty(cfg.indexInstantTime)) {
+        // Instant time is not specified
+        // Find the earliest scheduled indexing instant for execution
+        Option<HoodieInstant> earliestPendingIndexInstant = metaClient.getActiveTimeline()
+            .filterPendingIndexTimeline()
+            .filter(i -> !(i.isCompleted() || INFLIGHT.equals(i.getState())))
+            .firstInstant();
+        if (earliestPendingIndexInstant.isPresent()) {
+          cfg.indexInstantTime = earliestPendingIndexInstant.get().getTimestamp();
+          LOG.info("Found the earliest scheduled indexing instant which will be executed: "
+              + cfg.indexInstantTime);
+        } else {
+          throw new HoodieIndexException("There is no scheduled indexing in the table.");
+        }
+      }
+      return handleResponse(client.index(cfg.indexInstantTime)) ? 0 : 1;
+    }
+  }
+
+  private int scheduleAndRunIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      Option<String> indexingInstantTime = doSchedule(client);
+      if (indexingInstantTime.isPresent()) {
+        return handleResponse(client.index(indexingInstantTime.get())) ? 0 : 1;
+      } else {
+        return -1;

Review comment:
       Yeah, it's the exit code.
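
   For context, a minimal sketch of how that status could be surfaced to the shell (reusing the imports of the file above; the PR's main() only logs the result, so this is illustrative):

   ```java
   public static void main(String[] args) {
     HoodieIndexer.Config cfg = new HoodieIndexer.Config();
     new JCommander(cfg, null, args);
     JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
     int result = new HoodieIndexer(jsc, cfg).start(cfg.retry);
     jsc.stop();
     System.exit(result == 0 ? 0 : 1);  // normalize the -1 failure status to exit code 1
   }
   ```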




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083688168


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570",
       "triggerID" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575",
       "triggerID" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fc9ac46f36a4df8d9d590845b9848d48af1f7cae",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fc9ac46f36a4df8d9d590845b9848d48af1f7cae",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 18b9acd3320e68ee6688ea4eec693676350a9e15 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575) 
   * fc9ac46f36a4df8d9d590845b9848d48af1f7cae UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1081869988


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ee361b1bf6b9b68e11f84f2af76625b847669ed2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471) 
   * be08ba499bb88d8a00f20695b360336853be708e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083358835


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570",
       "triggerID" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 010de76ddd6c0201db746a13a5b04fc5e94125d4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520) 
   * a3ee4cd75320e578235cea4490ed7470bb721ea5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083690942


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570",
       "triggerID" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575",
       "triggerID" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fc9ac46f36a4df8d9d590845b9848d48af1f7cae",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7582",
       "triggerID" : "fc9ac46f36a4df8d9d590845b9848d48af1f7cae",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 18b9acd3320e68ee6688ea4eec693676350a9e15 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575) 
   * fc9ac46f36a4df8d9d590845b9848d48af1f7cae Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7582) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1082033950


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * be08ba499bb88d8a00f20695b360336853be708e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1065111787


   > Good progress on this one. Getting close to being complete.
   
   Thanks @prashantwason for reviewing. I'll address your comments soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1067120688


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a9f8c1316b55b72c57d18fbe8d0c8103948a30bc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1067032662


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6a577410251d17a1f2b9e782ded4908fec9977a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925) 
   * 5c1c7e91b5f530907cda50135fef8286ee8a8e38 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929) 
   * a9f8c1316b55b72c57d18fbe8d0c8103948a30bc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066751880


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4a036d809018043ed0d99adccbe0efdfd920284a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918) 
   * 6a577410251d17a1f2b9e782ded4908fec9977a7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077845502


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c9295eeaffb5e804ee6c636b8617f754af1492d8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1029256806


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c5c563ffa6625d610c9c6bd252457129ce5ccddc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619) 
   * 06c6dd9db383efa291c999d5f0140e5d2493eeaf UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1033224161


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7920cb15d99cd92ea2a3e6bd515249eb63040772 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1070989765


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d6ad6e1d8767d66b15b31bb06d1318fb08e582c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990) 
   * 680a99a669d9e2c2e81465efe8e491812e6c3012 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835757648



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -608,7 +624,7 @@ private void initializeEnabledFileGroups(HoodieTableMetaClient dataMetaClient, S
    * File groups will be named as :
    *    record-index-bucket-0000, .... -> ..., record-index-bucket-0009
    */
-  private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, MetadataPartitionType metadataPartition, String instantTime,
+  public void initializeFileGroups(HoodieTableMetaClient dataMetaClient, MetadataPartitionType metadataPartition, String instantTime,

Review comment:
       Even though each partition could be instantiated at different physical times, the logical times (Hudi instant timestamps) will be the same. Are you asking from the index rescheduling POV? Anyway, I think we should run the same checks as in `initializeIfNeeded` before initializing file groups.
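   A minimal sketch of the kind of guard being suggested, assuming hypothetical helpers `partitionAlreadyExists` and `hasPendingIndexPlanFor` (not actual Hudi APIs) that mirror the checks `initializeIfNeeded` performs:

```java
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.metadata.MetadataPartitionType;

// Sketch only: re-run initializeIfNeeded-style checks before bootstrapping
// file groups, so a rescheduled index plan cannot double-initialize a partition.
class FileGroupInitGuard {

  boolean shouldInitializeFileGroups(HoodieTableMetaClient dataMetaClient, MetadataPartitionType partitionType) {
    // skip when the partition was already bootstrapped by an earlier plan,
    // or when another index plan for the same partition is still pending
    return !partitionAlreadyExists(dataMetaClient, partitionType)
        && !hasPendingIndexPlanFor(dataMetaClient, partitionType);
  }

  private boolean partitionAlreadyExists(HoodieTableMetaClient metaClient, MetadataPartitionType partitionType) {
    // hypothetical: a real check would test existence of the partition path on storage
    return false;
  }

  private boolean hasPendingIndexPlanFor(HoodieTableMetaClient metaClient, MetadataPartitionType partitionType) {
    // hypothetical: a real check would scan the pending index timeline for this partition
    return false;
  }
}
```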




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835776515



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -608,7 +624,7 @@ private void initializeEnabledFileGroups(HoodieTableMetaClient dataMetaClient, S
    * File groups will be named as :
    *    record-index-bucket-0000, .... -> ..., record-index-bucket-0009
    */
-  private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, MetadataPartitionType metadataPartition, String instantTime,
+  public void initializeFileGroups(HoodieTableMetaClient dataMetaClient, MetadataPartitionType metadataPartition, String instantTime,

Review comment:
       I get it.
   What happens if someone triggers the HoodieIndexer even without enabling the MDT for regular writers?
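   One way that failure mode could be surfaced explicitly (a sketch, not current behavior; the `isMetadataTableEnabled()` accessor on the write config is an assumption here):

```java
// Sketch: fail fast if the indexer is launched while regular writers have the
// metadata table turned off, instead of bootstrapping partitions that no
// writer will keep up to date.
if (!config.isMetadataTableEnabled()) {
  throw new HoodieIndexException(
      "Cannot schedule async indexing: hoodie.metadata.enable is false for this table");
}
```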
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1080549439


   ## CI report:
   
   * 69071c6306ce336076aa6daa4337276990572ee4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368) 
   * 522a18caff448bcc9b127372d4526ee8f168f085 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835767529



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+
+      // get all instants since the plan completed (both from active timeline and archived timeline)
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      table.getMetaClient().reloadActiveTimeline();
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      List<HoodieInstant> instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(
+          new IndexingCheckTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient()));
+      try {
+        postRequestIndexingTaskFuture.get(config.getIndexingCheckTimeout(), TimeUnit.SECONDS);
+      } catch (TimeoutException | InterruptedException | ExecutionException e) {
+        postRequestIndexingTaskFuture.cancel(true);
+      } finally {
+        executorService.shutdownNow();
+      }
+      // save index commit metadata and return
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      try {
+        txnManager.beginTransaction();
+        table.getActiveTimeline().saveAsComplete(
+            new HoodieInstant(true, INDEX_ACTION, indexInstant.getTimestamp()),
+            TimelineMetadataUtils.serializeIndexCommitMetadata(indexCommitMetadata));
+      } finally {
+        txnManager.endTransaction();
+      }
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private static List<HoodieInstant> getRemainingArchivedAndActiveInstantsSince(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> remainingInstantsToIndex = metaClient.getArchivedTimeline()
+        .getWriteTimeline()
+        .findInstantsAfter(instant)
+        .getInstants().collect(Collectors.toList());
+    remainingInstantsToIndex.addAll(metaClient.getActiveTimeline().getWriteTimeline().findInstantsAfter(instant).getInstants().collect(Collectors.toList()));
+    return remainingInstantsToIndex;
+  }
+
+  private static List<HoodieInstant> getCompletedArchivedAndActiveInstantsAfter(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> completedInstants = metaClient.getArchivedTimeline()
+        .filterCompletedInstants()
+        .findInstantsAfter(instant)
+        .getInstants().collect(Collectors.toList());
+    completedInstants.addAll(metaClient.getActiveTimeline().filterCompletedInstants().findInstantsAfter(instant).getInstants().collect(Collectors.toList()));
+    return completedInstants;
+  }
+
+  /**
+   * Indexing check runs for instants that completed after the base instant (in the index plan).
+   * It will check if these later instants have logged updates to metadata table or not.
+   * If not, then it will do the update. If a later instant is inflight, it will wait until it is completed or the task times out.
+   */
+  class IndexingCheckTask implements Runnable {
+
+    private final HoodieTableMetadataWriter metadataWriter;
+    private final List<HoodieInstant> instantsToIndex;
+    private final Set<String> metadataCompletedInstants;
+    private final HoodieTableMetaClient metaClient;
+
+    IndexingCheckTask(HoodieTableMetadataWriter metadataWriter,
+                      List<HoodieInstant> instantsToIndex,
+                      Set<String> metadataCompletedInstants,
+                      HoodieTableMetaClient metaClient) {
+      this.metadataWriter = metadataWriter;
+      this.instantsToIndex = instantsToIndex;
+      this.metadataCompletedInstants = metadataCompletedInstants;
+      this.metaClient = metaClient;
+    }
+
+    @Override
+    public void run() {
+      while (!Thread.interrupted()) {
+        for (HoodieInstant instant : instantsToIndex) {
+          // metadata index already updated for this instant
+          if (metadataCompletedInstants.contains(instant.getTimestamp())) {
+            currentIndexedInstant = instant.getTimestamp();
+            continue;
+          }
+          while (!instant.isCompleted()) {
+            // reload timeline and fetch instant details again; wait until the instant completes or the task times out

Review comment:
       #L138




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835785470



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.INFLIGHT.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();

Review comment:
       Also, data table archival depends on MDT compaction. We need to think this through :( 
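   As an illustration of that coupling (a sketch, not the shipped logic): the data-table archiver would cap what it may archive at the latest completed compaction commit on the metadata table timeline.

```java
// Sketch only: compactions surface as commit instants on the MDT timeline,
// so the newest completed commit there bounds data-table archival.
Option<HoodieInstant> latestMdtCompaction = metadataMetaClient.getActiveTimeline()
    .getCommitTimeline()
    .filterCompletedInstants()
    .lastInstant();
// archive only data-table instants strictly older than this timestamp, if present
```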




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836027799



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.INFLIGHT.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();

Review comment:
       For now, I'm just going to guard against triggering any compaction or cleaning (the conservative strategy, as you suggested). Also, see https://issues.apache.org/jira/browse/HUDI-2458, where it's discussed that archival in the data table has to depend on compaction in the MDT. This PR is not going to change that. We need to think it through a bit more, and it will depend on how the behavior changes after HUDI-2458.
   Let's jam on this and take it up in the next release? 
   cc @vinothchandar
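   As an illustration of that conservative guard (a sketch; `filterPendingIndexTimeline()` is the filter introduced in this PR, while the placement inside the scheduling path and the `Option` return are illustrative):

```java
// Sketch: table services bail out while an async index action is still pending.
boolean indexingInProgress = table.getActiveTimeline()
    .filterPendingIndexTimeline()
    .firstInstant()
    .isPresent();
if (indexingInProgress) {
  LOG.info("Async indexing is in progress; skipping compaction/clean scheduling");
  return Option.empty();
}
```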




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836585336



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.INFLIGHT.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<HoodieInstant> metadataCompletedTimeline = metadataMetaClient.getActiveTimeline()
+          .getCommitsTimeline().filterCompletedInstants().getInstants().collect(Collectors.toSet());
+      List<HoodieInstant> finalRemainingInstantsToIndex = remainingInstantsToIndex.map(
+          instant -> new HoodieInstant(HoodieInstant.State.COMPLETED, HoodieTimeline.DELTA_COMMIT_ACTION, instant.getTimestamp())
+      ).filter(instant -> !metadataCompletedTimeline.contains(instant)).collect(Collectors.toList());
+
+      // index all remaining instants with a timeout
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(new PostRequestIndexingTask(metadataWriter, finalRemainingInstantsToIndex));
+      try {
+        // TODO: configure timeout
+        postRequestIndexingTaskFuture.get(60, TimeUnit.SECONDS);

Review comment:
       Right now, it is configured to be 5 minutes by default. I did 10 small Deltastreamer commits (12 columns, 1000 records in each round) using ksql-datagen and it was fine. I understand this could be time-consuming. I'll run a scale test later and try to figure out a better default value.
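   For reference, this is roughly how that timeout bounds the catch-up work in `execute()` (a condensed sketch of the pattern already in this PR; `catchUpTask` stands in for the `IndexingCheckTask` instance built there):

```java
// Condensed sketch: the configured timeout caps how long the executor waits
// for the catch-up task before giving up and sealing the index commit.
ExecutorService pool = Executors.newFixedThreadPool(1);
Future<?> catchUp = pool.submit(catchUpTask);
try {
  // getIndexingCheckTimeout() is in seconds; the default discussed here is 5 minutes
  catchUp.get(config.getIndexingCheckTimeout(), TimeUnit.SECONDS);
} catch (TimeoutException | InterruptedException | ExecutionException e) {
  // stop catching up; the index commit then records the last instant actually indexed
  catchUp.cancel(true);
} finally {
  pool.shutdownNow();
}
```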




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836577335



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
##########
@@ -112,6 +114,25 @@ public HoodieDefaultTimeline getWriteTimeline() {
     return new HoodieDefaultTimeline(instants.stream().filter(s -> validActions.contains(s.getAction())), details);
   }
 
+  @Override
+  public HoodieDefaultTimeline getContiguousCompletedWriteTimeline() {

Review comment:
       Added a UT in TestHoodieActiveTimeline.
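   For readers following along, the semantics under test look roughly like this (a sketch; `timelineOf`, `completed`, and `inflight` are hypothetical helpers, not the actual utilities in `TestHoodieActiveTimeline`):

```java
@Test
public void testContiguousCompletedWriteTimelineStopsAtFirstPendingInstant() {
  // completed 001, 002; inflight 003; completed 004
  HoodieTimeline timeline = timelineOf(
      completed("001"), completed("002"), inflight("003"), completed("004"));

  HoodieTimeline contiguous = timeline.getContiguousCompletedWriteTimeline();

  // 004 must be excluded: 003 before it has not completed yet
  assertEquals(Arrays.asList("001", "002"),
      contiguous.getInstants().map(HoodieInstant::getTimestamp).collect(Collectors.toList()));
}
```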




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083688168


   ## CI report:
   
   * 18b9acd3320e68ee6688ea4eec693676350a9e15 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575) 
   * fc9ac46f36a4df8d9d590845b9848d48af1f7cae UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1082166163


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * be08ba499bb88d8a00f20695b360336853be708e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511) 
   * 010de76ddd6c0201db746a13a5b04fc5e94125d4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1082169589


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * be08ba499bb88d8a00f20695b360336853be708e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511) 
   * 010de76ddd6c0201db746a13a5b04fc5e94125d4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>
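
Comparing the two reports above shows the lifecycle hudi-bot tracks per pushed hash: 010de76 is first recorded as UNKNOWN with url "TBD", then PENDING once Azure assigns a buildId; elsewhere in this thread a hash finishes as SUCCESS or FAILURE and is later marked DELETED when a newer push supersedes it. A small sketch encoding that lifecycle as inferred from this archive alone (the transition table is an assumption, not taken from hudi-bot's source):

```python
# Build-status lifecycle as observed in these archived CI reports:
# UNKNOWN (url "TBD") -> PENDING (build queued) -> SUCCESS | FAILURE,
# with any state collapsing to DELETED once the hash is superseded.
from enum import Enum

class BuildStatus(Enum):
    UNKNOWN = "UNKNOWN"
    PENDING = "PENDING"
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    DELETED = "DELETED"

# Assumed transition table, inferred only from the reports in this thread.
ALLOWED = {
    BuildStatus.UNKNOWN: {BuildStatus.PENDING, BuildStatus.DELETED},
    BuildStatus.PENDING: {BuildStatus.SUCCESS, BuildStatus.FAILURE,
                          BuildStatus.DELETED},
    BuildStatus.SUCCESS: {BuildStatus.DELETED},
    BuildStatus.FAILURE: {BuildStatus.DELETED},
    BuildStatus.DELETED: set(),
}

def advance(current: BuildStatus, new: BuildStatus) -> BuildStatus:
    """Flag any transition this archive never exhibits."""
    if new not in ALLOWED[current]:
        raise ValueError(f"unexpected transition {current.name} -> {new.name}")
    return new
```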



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083533723


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570",
       "triggerID" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a3ee4cd75320e578235cea4490ed7470bb721ea5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077657869


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 32cfdbf4524384a7fb8220be6e822dc510cf173b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283) 
   * ca6f4c73d40497413fd38b6edd7fbf1de9b50cac Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>
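
The FAILURE on 32cfdbf above is exactly the case the bot command addresses: posting `@hudi-bot run azure` as an ordinary PR comment re-runs the last Azure build. A hedged sketch of automating that through GitHub's standard issue-comments REST endpoint (PRs share issue numbering; the function name and token handling are illustrative, not hudi tooling):

```python
# Post the documented `@hudi-bot run azure` command on a PR via the
# GitHub REST API (POST /repos/{owner}/{repo}/issues/{number}/comments).
import os
import requests

def rerun_azure(owner: str, repo: str, pr_number: int) -> None:
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments"
    response = requests.post(
        url,
        headers={"Authorization": f"token {os.environ['GITHUB_TOKEN']}"},
        json={"body": "@hudi-bot run azure"},
    )
    response.raise_for_status()  # surface auth/permission errors early

# Example: rerun_azure("apache", "hudi", 4693)
```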



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077654508


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 32cfdbf4524384a7fb8220be6e822dc510cf173b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283) 
   * ca6f4c73d40497413fd38b6edd7fbf1de9b50cac UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1078814461


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d02e0c2ca65038f88ae753484dcb2642ef789f27 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1079375166


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 69071c6306ce336076aa6daa4337276990572ee4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1073760182


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077501985


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124) 
   * e58990e296aa5125807a4b96269fa7a06c885e69 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025615583


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c5c563ffa6625d610c9c6bd252457129ce5ccddc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1065111216


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7920cb15d99cd92ea2a3e6bd515249eb63040772 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801) 
   * e6e3e1612928fb0892d071ec4c3a26e31ce1ff76 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1065194900


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e6e3e1612928fb0892d071ec4c3a26e31ce1ff76 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066754388


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4a036d809018043ed0d99adccbe0efdfd920284a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918) 
   * 6a577410251d17a1f2b9e782ded4908fec9977a7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1067032662


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6a577410251d17a1f2b9e782ded4908fec9977a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925) 
   * 5c1c7e91b5f530907cda50135fef8286ee8a8e38 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929) 
   * a9f8c1316b55b72c57d18fbe8d0c8103948a30bc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] manojpec commented on a change in pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
manojpec commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r797380164



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -588,10 +609,87 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
-      commit(engineContext.parallelize(records, 1), MetadataPartitionType.FILES.partitionPath(), instantTime, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        List<HoodieRecord> records = convertMetadataFunction.convertMetadata();

Review comment:
       This might work for the FILES partition. But initializing the bloom filter and column stats partitions can produce a huge set of records and blow up the driver memory. I hit this problem as well. We need to make use of the engine context to build this as a HoodieData<HoodieRecord> and then commit the RDD via the commit call.
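
       To make that concrete, here is a rough sketch of the pattern (illustrative only — this is a plain Spark stand-in, not Hudi's HoodieData API; `convertDistributed`, the `String` record type, and the file-path input are all made up for the example): the driver holds only the small list of inputs, while the heavy per-file record extraction runs on the executors.

       ```java
       // Sketch only: stand-in for a HoodieData-based conversion; not Hudi code.
       import java.util.Arrays;
       import java.util.List;
       import org.apache.spark.api.java.JavaRDD;
       import org.apache.spark.api.java.JavaSparkContext;

       class DistributedConvertSketch {
         // Driver keeps only the small list of file paths; the expensive per-file
         // work (e.g. bloom filter or column stats records) happens on executors,
         // so no full List<HoodieRecord> is ever materialized on the driver.
         static JavaRDD<String> convertDistributed(JavaSparkContext jsc, List<String> filePaths) {
           return jsc.parallelize(filePaths)
                     .flatMap(path -> Arrays.asList("record-for-" + path).iterator());
         }
       }
       ```

       The resulting RDD-backed collection can then go through a single commit, which also ties into the one-shot-commit point in the next comment.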

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -588,10 +609,87 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
-      commit(engineContext.parallelize(records, 1), MetadataPartitionType.FILES.partitionPath(), instantTime, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        List<HoodieRecord> records = convertMetadataFunction.convertMetadata();

Review comment:
       Also, calling commit serially for each partition, one after the other, might not be ideal. They are all different partitions under the same table, and we should be able to commit across all of them in one shot via HoodieData. https://github.com/apache/hudi/pull/4352 takes care of this. We can discuss more.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -588,10 +609,87 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
-      commit(engineContext.parallelize(records, 1), MetadataPartitionType.FILES.partitionPath(), instantTime, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
+        commit(engineContext.parallelize(records, 1), p, instantTime, canTriggerTableService);
+      }
+    });
+  }
+
+  private List<String> getMetadataPartitionsToUpdate() {
+    // find last (pending or) completed index instant and get partitions (to be) written
+    Option<HoodieInstant> lastIndexingInstant = dataMetaClient.getActiveTimeline()
+        .getTimelineOfActions(CollectionUtils.createImmutableSet(HoodieTimeline.INDEX_ACTION)).lastInstant();
+    if (lastIndexingInstant.isPresent()) {
+      try {
+        // TODO: handle inflight instant, if it is inflight then read from requested file.
+        HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(
+            dataMetaClient.getActiveTimeline().readIndexPlanAsBytes(lastIndexingInstant.get()).get());
+        return indexPlan.getIndexPartitionInfos().stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toList());
+      } catch (IOException e) {
+        LOG.warn("Could not read index plan. Falling back to FileSystem.exists() check.");
+        return getExistingMetadataPartitions();

Review comment:
       But the FS check is not foolproof. The partition directory might exist well before full initialization, right? Or do we ensure the partition dir is renamed into place (and therefore exists) only after successful initialization?
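
       For reference, the conventional write-then-rename pattern would make the existence check trustworthy — sketched below purely for illustration (this is not what the PR does; the `.inflight` suffix is a made-up convention, and only the Hadoop FileSystem calls are real):

       ```java
       // Illustrative sketch (not Hudi code): write under a temp dir and rename on
       // success, so exists(finalDir) is only true for a fully initialized partition.
       import java.io.IOException;
       import org.apache.hadoop.fs.FileSystem;
       import org.apache.hadoop.fs.Path;

       class PartitionInitSketch {
         static void initializePartition(FileSystem fs, Path finalDir) throws IOException {
           Path inflightDir = new Path(finalDir.getParent(), "." + finalDir.getName() + ".inflight");
           fs.mkdirs(inflightDir);
           // ... write all file groups under inflightDir ...
           // The final directory comes into existence only after full initialization.
           if (!fs.rename(inflightDir, finalDir)) {
             throw new IOException("Failed to finalize metadata partition: " + finalDir);
           }
         }
       }
       ```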

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
##########
@@ -855,6 +855,17 @@ public boolean scheduleCompactionAtInstant(String instantTime, Option<Map<String
     return scheduleTableService(instantTime, extraMetadata, TableServiceType.COMPACT).isPresent();
   }
 
+  public Option<String> scheduleIndexing(List<String> partitions) {

Review comment:
       Should the arg use the metadata table partition enum types, to enforce that only valid/supported partitions can be indexed?
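
       A hedged illustration of what the enum-typed signature buys (the enum below is a self-contained stand-in mirroring the `partitionPath()` usage in the diff, not a faithful copy of Hudi's MetadataPartitionType, and the partition path strings are assumptions):

       ```java
       // Illustrative sketch: an enum-typed argument instead of List<String>.
       import java.util.Arrays;
       import java.util.List;
       import java.util.stream.Collectors;

       enum MetadataPartitionType {
         FILES("files"), BLOOM_FILTERS("bloom_filters"), COLUMN_STATS("column_stats");
         private final String partitionPath;
         MetadataPartitionType(String partitionPath) { this.partitionPath = partitionPath; }
         String partitionPath() { return partitionPath; }
       }

       class ScheduleIndexingSketch {
         // Callers can no longer pass arbitrary strings; only supported
         // partition types compile, so validation moves to the type system.
         static List<String> toPartitionPaths(List<MetadataPartitionType> types) {
           return types.stream()
               .map(MetadataPartitionType::partitionPath)
               .collect(Collectors.toList());
         }

         public static void main(String[] args) {
           System.out.println(toPartitionPaths(Arrays.asList(MetadataPartitionType.FILES)));
         }
       }
       ```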

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -460,6 +475,7 @@ private boolean bootstrapFromFilesystem(HoodieEngineContext engineContext, Hoodi
         .initTable(hadoopConf.get(), metadataWriteConfig.getBasePath());
 
     initTableMetadata();
+    // TODO: make it generic for all enabled partition types

Review comment:
       This is taken care of in https://github.com/apache/hudi/pull/4352.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -588,10 +609,87 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
-      commit(engineContext.parallelize(records, 1), MetadataPartitionType.FILES.partitionPath(), instantTime, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
+        commit(engineContext.parallelize(records, 1), p, instantTime, canTriggerTableService);
+      }
+    });
+  }
+
+  private List<String> getMetadataPartitionsToUpdate() {
+    // find last (pending or) completed index instant and get partitions (to be) written
+    Option<HoodieInstant> lastIndexingInstant = dataMetaClient.getActiveTimeline()
+        .getTimelineOfActions(CollectionUtils.createImmutableSet(HoodieTimeline.INDEX_ACTION)).lastInstant();
+    if (lastIndexingInstant.isPresent()) {
+      try {
+        // TODO: handle inflight instant, if it is inflight then read from requested file.
+        HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(
+            dataMetaClient.getActiveTimeline().readIndexPlanAsBytes(lastIndexingInstant.get()).get());
+        return indexPlan.getIndexPartitionInfos().stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toList());
+      } catch (IOException e) {
+        LOG.warn("Could not read index plan. Falling back to FileSystem.exists() check.");
+        return getExistingMetadataPartitions();
+      }
     }
+    // TODO: return only enabled partitions
+    return MetadataPartitionType.all();
+  }
+
+  private List<String> getExistingMetadataPartitions() {
+    return MetadataPartitionType.all().stream()
+        .filter(p -> {
+          try {
+            // TODO: avoid fs.exists() check
+            return metadataMetaClient.getFs().exists(FSUtils.getPartitionPath(metadataWriteConfig.getBasePath(), p));

Review comment:
       See the comment above. We can return a partition only after it is fully initialized.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1068719900


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a9f8c1316b55b72c57d18fbe8d0c8103948a30bc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930) 
   * 0d6ad6e1d8767d66b15b31bb06d1318fb08e582c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083940266


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570",
       "triggerID" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575",
       "triggerID" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fc9ac46f36a4df8d9d590845b9848d48af1f7cae",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7582",
       "triggerID" : "fc9ac46f36a4df8d9d590845b9848d48af1f7cae",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fc9ac46f36a4df8d9d590845b9848d48af1f7cae Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7582) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083647797


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570",
       "triggerID" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575",
       "triggerID" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 18b9acd3320e68ee6688ea4eec693676350a9e15 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083579784


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570",
       "triggerID" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "18b9acd3320e68ee6688ea4eec693676350a9e15",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a3ee4cd75320e578235cea4490ed7470bb721ea5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570) 
   * 18b9acd3320e68ee6688ea4eec693676350a9e15 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r839822034



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.CLEAN_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN_OR_EQUALS;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEXING_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.RESTORE_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.ROLLBACK_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadata.getMetadataTableBasePath;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Reads the index plan and executes it.
+ * It also reconciles updates that land on the data timeline while indexing is in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+  private static final int TIMELINE_RELOAD_INTERVAL_MILLIS = 5000;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = validateAndGetIndexInstant();
+    // read HoodieIndexPlan
+    HoodieIndexPlan indexPlan;
+    try {
+      indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+    } catch (IOException e) {
+      throw new HoodieIndexException("Failed to read the index plan for instant: " + indexInstant);
+    }
+    List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+    try {
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // ensure the metadata partitions for the requested indexes are not already available (or inflight)
+      HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+      Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+      indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+      Set<String> requestedPartitions = indexPartitionInfos.stream()
+          .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet());
+      requestedPartitions.retainAll(indexesInflightOrCompleted);
+      if (!requestedPartitions.isEmpty()) {
+        throw new HoodieIndexException(String.format("Following partitions already exist or inflight: %s", requestedPartitions));
+      }
+
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build the index up to the base instant generated by the plan; catchup happens later
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      LOG.info("Starting Index Building with base instant: " + indexUptoInstant);
+      metadataWriter.buildMetadataPartitions(context, indexPartitionInfos);
+
+      // get remaining instants to catchup
+      List<HoodieInstant> instantsToCatchup = getInstantsToCatchup(indexUptoInstant);
+      LOG.info("Total remaining instants to index: " + instantsToCatchup.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      catchupWithInflightWriters(metadataWriter, instantsToCatchup, metadataMetaClient, metadataCompletedTimestamps);
+      // save index commit metadata and update table config
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      updateTableConfigAndTimeline(indexInstant, finalIndexPartitionInfos, indexCommitMetadata);
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      // abort gracefully
+      abort(indexInstant, indexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet()));
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private void abort(HoodieInstant indexInstant, Set<String> requestedPartitions) {
+    Set<String> inflightPartitions = getInflightMetadataPartitions(table.getMetaClient().getTableConfig());
+    Set<String> completedPartitions = getCompletedMetadataPartitions(table.getMetaClient().getTableConfig());
+    // delete metadata partition
+    requestedPartitions.forEach(partition -> {

Review comment:
       There are multiple points of failure here. To reduce the blast radius, we will update the table config first, because after this patch we mostly rely on table configs. Additionally, we need more CLI commands to let users recover easily; tracking in HUDI-3753.
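
       For illustration, a minimal sketch of the "config first, filesystem second" abort ordering described above. This is not the actual Hudi abort path: removeFromTableConfig() and deleteMetadataPartition() are hypothetical stand-ins for the real table-config and filesystem operations.

          import java.util.Set;

          class AbortOrderingSketch {
            void abort(Set<String> requestedPartitions) {
              // 1. Drop the partitions from the table config first, so readers and
              //    writers stop trusting the half-built index even if a later step fails.
              for (String partition : requestedPartitions) {
                removeFromTableConfig(partition); // hypothetical hoodie.properties update
              }
              // 2. Only then delete the physical metadata partitions; a crash here
              //    leaves orphan files (recoverable via CLI) rather than a table
              //    config pointing at incomplete index data.
              for (String partition : requestedPartitions) {
                deleteMetadataPartition(partition); // hypothetical filesystem delete
              }
            }

            private void removeFromTableConfig(String partition) {
              // hypothetical: drop the partition from the inflight and completed
              // metadata-partition entries persisted in hoodie.properties
            }

            private void deleteMetadataPartition(String partition) {
              // hypothetical: recursively delete the partition directory under the
              // metadata table base path
            }
          }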




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066897806


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6a577410251d17a1f2b9e782ded4908fec9977a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1070989765


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d6ad6e1d8767d66b15b31bb06d1318fb08e582c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990) 
   * 680a99a669d9e2c2e81465efe8e491812e6c3012 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1067036039


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5c1c7e91b5f530907cda50135fef8286ee8a8e38 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929) 
   * a9f8c1316b55b72c57d18fbe8d0c8103948a30bc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066939072


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6a577410251d17a1f2b9e782ded4908fec9977a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925) 
   * 5c1c7e91b5f530907cda50135fef8286ee8a8e38 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066751880


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4a036d809018043ed0d99adccbe0efdfd920284a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918) 
   * 6a577410251d17a1f2b9e782ded4908fec9977a7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1070989765






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077504732


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124) 
   * e58990e296aa5125807a4b96269fa7a06c885e69 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836028462



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes it.
+ * It also reconciles updates that land on the data timeline while indexing is in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+
+      // get all instants since the plan completed (from both the active and archived timelines)
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      table.getMetaClient().reloadActiveTimeline();
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      List<HoodieInstant> instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(
+          new IndexingCheckTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient()));
+      try {
+        postRequestIndexingTaskFuture.get(config.getIndexingCheckTimeout(), TimeUnit.SECONDS);
+      } catch (TimeoutException | InterruptedException | ExecutionException e) {
+        postRequestIndexingTaskFuture.cancel(true);
+      } finally {
+        executorService.shutdownNow();
+      }
+      // save index commit metadata and return
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      try {
+        txnManager.beginTransaction();
+        table.getActiveTimeline().saveAsComplete(

Review comment:
       The table config is only updated while creating or dropping an index; regular writers only read it and won't update it. However, if a user starts one indexer job for the column_stats partition and a separate job for the bloom_filter partition, then I think locking would be necessary.
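
       For illustration, a minimal sketch of why that lock matters when two indexer jobs finish around the same time: the read-modify-write of the completed-partitions entry must be atomic, or one job's update can be lost. Here a ReentrantLock stands in for the cross-process lock provider behind TransactionManager, and the in-memory set is a hypothetical stand-in for the state kept in hoodie.properties.

          import java.util.HashSet;
          import java.util.Set;
          import java.util.concurrent.locks.ReentrantLock;

          class CompletedPartitionsUpdateSketch {
            private final ReentrantLock txnLock = new ReentrantLock(); // stand-in for TransactionManager
            private Set<String> completedPartitions = new HashSet<>(); // stand-in for hoodie.properties state

            void markIndexCompleted(String partitionPath) {
              txnLock.lock();   // beginTransaction() in the executor above
              try {
                // the read-modify-write must happen under the lock; without it, two
                // indexers could both read the old set and the second write would
                // silently drop the first job's update
                Set<String> updated = new HashSet<>(completedPartitions);
                updated.add(partitionPath);
                completedPartitions = updated; // write back
              } finally {
                txnLock.unlock(); // endTransaction()
              }
            }
          }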




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077845502


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c9295eeaffb5e804ee6c636b8617f754af1492d8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1078598500


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c9295eeaffb5e804ee6c636b8617f754af1492d8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298) 
   * d02e0c2ca65038f88ae753484dcb2642ef789f27 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077654508


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 32cfdbf4524384a7fb8220be6e822dc510cf173b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283) 
   * ca6f4c73d40497413fd38b6edd7fbf1de9b50cac UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077657869


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 32cfdbf4524384a7fb8220be6e822dc510cf173b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283) 
   * ca6f4c73d40497413fd38b6edd7fbf1de9b50cac Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1029330925


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 06c6dd9db383efa291c999d5f0140e5d2493eeaf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1080670921


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 522a18caff448bcc9b127372d4526ee8f168f085 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836580660



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -663,20 +711,82 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    if (!dataWriteConfig.isMetadataTableEnabled()) {
+      return;
+    }
+    Set<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private Set<String> getMetadataPartitionsToUpdate() {
+    // fetch partitions to update from table config
+    Set<String> partitionsToUpdate = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))

Review comment:
       Oops, missed this one! Will add in a subsequent commit.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835722655



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
##########
@@ -915,6 +917,39 @@ public boolean scheduleCompactionAtInstant(String instantTime, Option<Map<String
     return scheduleTableService(instantTime, extraMetadata, TableServiceType.COMPACT).isPresent();
   }
 
+  public Option<String> scheduleIndexing(List<MetadataPartitionType> partitionTypes) {
+    String instantTime = HoodieActiveTimeline.createNewInstantTime();
+    return scheduleIndexingAtInstant(partitionTypes, instantTime) ? Option.of(instantTime) : Option.empty();
+  }
+
+  private boolean scheduleIndexingAtInstant(List<MetadataPartitionType> partitionTypes, String instantTime) throws HoodieIOException {
+    Option<HoodieIndexPlan> indexPlan = createTable(config, hadoopConf, config.isMetadataTableEnabled())
+        .scheduleIndex(context, instantTime, partitionTypes);
+    return indexPlan.isPresent();
+  }
+
+  public Option<HoodieIndexCommitMetadata> index(String indexInstantTime) {

Review comment:
       Java docs for all public APIs.
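       e.g. a rough sketch of what it could look like for `index` (wording is only a suggestion):
   ```
    /**
     * Runs indexing for a previously scheduled indexing instant, building the
     * requested metadata partitions up to that instant.
     *
     * @param indexInstantTime instant time of the scheduled indexing plan.
     * @return commit metadata of the indexing action if it completes, empty otherwise.
     */
    public Option<HoodieIndexCommitMetadata> index(String indexInstantTime) {
   ```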

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -663,20 +711,82 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    if (!dataWriteConfig.isMetadataTableEnabled()) {
+      return;
+    }
+    Set<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private Set<String> getMetadataPartitionsToUpdate() {
+    // fetch partitions to update from table config
+    Set<String> partitionsToUpdate = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))

Review comment:
       Can we expose a single API in tableConfig, e.g. `getInflightAndCompleteMetadataIndexes`?
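       For illustration, a minimal sketch of such a helper (the method name and the use of `getStringOrDefault` are assumptions, not the final API):
   ```
    // Hypothetical combined accessor in HoodieTableConfig: merges the
    // inflight and completed index lists into one set of partition paths.
    public Set<String> getInflightAndCompleteMetadataIndexes() {
      return Stream.of(getStringOrDefault(TABLE_METADATA_INDEX_COMPLETED, "") + ","
              + getStringOrDefault(TABLE_METADATA_INDEX_INFLIGHT, ""))
          .flatMap(s -> Stream.of(s.split(",")))
          .map(String::trim)
          .filter(s -> !s.isEmpty())
          .collect(Collectors.toSet());
    }
   ```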

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -175,6 +182,25 @@
           .sinceVersion("0.11.0")
           .withDocumentation("Parallelism to use, when generating column stats index.");
 
+  public static final ConfigProperty<String> COLUMN_STATS_INDEX_FOR_COLUMNS = ConfigProperty
+      .key(METADATA_PREFIX + ".index.column.stats.for.columns")
+      .defaultValue("")
+      .sinceVersion("0.11.0")
+      .withDocumentation("Comma-separated list of columns for which column stats index will be built.");
+
+  public static final ConfigProperty<String> BLOOM_FILTER_INDEX_FOR_COLUMNS = ConfigProperty
+      .key(METADATA_PREFIX + ".index.bloom.filter.for.columns")
+      .defaultValue("")
+      .sinceVersion("0.11.0")
+      .withDocumentation("Comma-separated list of columns for which bloom filter index will be built.");

Review comment:
       Can we enhance the docs to say what the default behavior is?
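       e.g. something along these lines (the default-behavior wording is a placeholder to be filled in):
   ```
    .withDocumentation("Comma-separated list of columns for which bloom filter index will be built. "
        + "If not set, <describe the default here, e.g. only the record key column is indexed>.")
   ```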

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -645,12 +669,36 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropIndex(List<MetadataPartitionType> indexesToDrop) throws IOException {
+    Set<String> completedIndexes = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    Set<String> inflightIndexes = Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    for (MetadataPartitionType partitionType : indexesToDrop) {
+      String partitionPath = partitionType.getPartitionPath();
+      if (inflightIndexes.contains(partitionPath)) {
+        LOG.error("Metadata indexing in progress: " + partitionPath);
+        return;
+      }
+      LOG.warn("Deleting Metadata Table partitions: " + partitionPath);
+      dataMetaClient.getFs().delete(new Path(metadataWriteConfig.getBasePath(), partitionPath), true);
+      completedIndexes.remove(partitionPath);
+    }
+    // update table config
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), String.join(",", completedIndexes));

Review comment:
       Also, what happens in case the process crashes between L684 and L688?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
##########
@@ -112,6 +114,25 @@ public HoodieDefaultTimeline getWriteTimeline() {
     return new HoodieDefaultTimeline(instants.stream().filter(s -> validActions.contains(s.getAction())), details);
   }
 
+  @Override
+  public HoodieDefaultTimeline getContiguousCompletedWriteTimeline() {

Review comment:
       Can we do something like this? (`getTimestamp()` returns a String, so the local should be a String, not a `HoodieInstant`:)
   ```
    String earliestInflight = getWriteTimeline().filterInflightsAndRequested().firstInstant().get().getTimestamp();
    return getWriteTimeline().filterCompletedInstants().filter(instant -> HoodieTimeline.compareTimestamps(instant.getTimestamp(), LESSER_THAN, earliestInflight));
   ```
   

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
##########
@@ -915,6 +917,39 @@ public boolean scheduleCompactionAtInstant(String instantTime, Option<Map<String
     return scheduleTableService(instantTime, extraMetadata, TableServiceType.COMPACT).isPresent();
   }
 
+  public Option<String> scheduleIndexing(List<MetadataPartitionType> partitionTypes) {

Review comment:
       Java docs, please, for all public APIs.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
##########
@@ -915,6 +917,39 @@ public boolean scheduleCompactionAtInstant(String instantTime, Option<Map<String
     return scheduleTableService(instantTime, extraMetadata, TableServiceType.COMPACT).isPresent();
   }
 
+  public Option<String> scheduleIndexing(List<MetadataPartitionType> partitionTypes) {
+    String instantTime = HoodieActiveTimeline.createNewInstantTime();
+    return scheduleIndexingAtInstant(partitionTypes, instantTime) ? Option.of(instantTime) : Option.empty();
+  }
+
+  private boolean scheduleIndexingAtInstant(List<MetadataPartitionType> partitionTypes, String instantTime) throws HoodieIOException {
+    Option<HoodieIndexPlan> indexPlan = createTable(config, hadoopConf, config.isMetadataTableEnabled())

Review comment:
       What happens if someone tries to trigger indexing twice? I would expect the second trigger to fail, conveying that indexing is already in progress.
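       For illustration, a guard along these lines in the schedule path could reject the second trigger (a sketch only; `filterPendingIndexTimeline` is an assumed helper, not an existing API):
   ```
    // Hypothetical guard: abort scheduling if an indexing instant is already
    // pending on the timeline instead of silently creating a second plan.
    if (!table.getActiveTimeline().filterPendingIndexTimeline().empty()) {
      throw new HoodieIndexException("Indexing is already in progress for this table");
    }
   ```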

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -645,12 +669,36 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropIndex(List<MetadataPartitionType> indexesToDrop) throws IOException {
+    Set<String> completedIndexes = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    Set<String> inflightIndexes = Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    for (MetadataPartitionType partitionType : indexesToDrop) {
+      String partitionPath = partitionType.getPartitionPath();
+      if (inflightIndexes.contains(partitionPath)) {
+        LOG.error("Metadata indexing in progress: " + partitionPath);

Review comment:
       Can't we drop an index while it's being built? What if the user wants to abort the index building?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java
##########
@@ -120,7 +120,7 @@ public HoodieBloomIndex(HoodieWriteConfig config, BaseHoodieBloomIndexHelper blo
     // Step 2: Load all involved files as <Partition, filename> pairs
     List<Pair<String, BloomIndexFileInfo>> fileInfoList;
     if (config.getBloomIndexPruneByRanges()) {
-      fileInfoList = (config.getMetadataConfig().isColumnStatsIndexEnabled()
+      fileInfoList = (config.isMetadataColumnStatsIndexEnabled()

Review comment:
       Is it possible to enable just the bloom index partition and disable col stats? Should we check for both partitions here?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -645,12 +669,36 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropIndex(List<MetadataPartitionType> indexesToDrop) throws IOException {
+    Set<String> completedIndexes = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    Set<String> inflightIndexes = Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    for (MetadataPartitionType partitionType : indexesToDrop) {
+      String partitionPath = partitionType.getPartitionPath();
+      if (inflightIndexes.contains(partitionPath)) {
+        LOG.error("Metadata indexing in progress: " + partitionPath);
+        return;
+      }
+      LOG.warn("Deleting Metadata Table partitions: " + partitionPath);
+      dataMetaClient.getFs().delete(new Path(metadataWriteConfig.getBasePath(), partitionPath), true);
+      completedIndexes.remove(partitionPath);
+    }
+    // update table config
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), String.join(",", completedIndexes));

Review comment:
       Should we not first update the table config and then delete the partitions? What happens if, at L686, another writer checks the table props and tries to update the index? It will fail, since we would have deleted the partition at L684.
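       i.e., roughly this ordering (a sketch using the same calls as the current code, just reordered):
   ```
    // 1. Drop the partition from the completed set and persist the table
    //    config first, so concurrent writers stop treating the index as usable.
    completedIndexes.remove(partitionPath);
    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), String.join(",", completedIndexes));
    HoodieTableConfig.update(dataMetaClient.getFs(), new Path(dataMetaClient.getMetaPath()), dataMetaClient.getTableConfig().getProps());
    // 2. Only then physically delete the metadata table partition.
    dataMetaClient.getFs().delete(new Path(metadataWriteConfig.getBasePath(), partitionPath), true);
   ```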

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -208,6 +208,18 @@
       .sinceVersion("0.11.0")
       .withDocumentation("Table checksum is used to guard against partial writes in HDFS. It is added as the last entry in hoodie.properties and then used to validate while reading table config.");
 
+  public static final ConfigProperty<String> TABLE_METADATA_INDEX_INFLIGHT = ConfigProperty
+      .key("hoodie.table.metadata.index.inflight")
+      .noDefaultValue()
+      .sinceVersion("0.11.0")
+      .withDocumentation("Comma-separated list of metadata partitions whose indexing is in progress.");
+
+  public static final ConfigProperty<String> TABLE_METADATA_INDEX_COMPLETED = ConfigProperty
+      .key("hoodie.table.metadata.index.completed")

Review comment:
       same here

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
##########
@@ -915,6 +917,39 @@ public boolean scheduleCompactionAtInstant(String instantTime, Option<Map<String
     return scheduleTableService(instantTime, extraMetadata, TableServiceType.COMPACT).isPresent();
   }
 
+  public Option<String> scheduleIndexing(List<MetadataPartitionType> partitionTypes) {
+    String instantTime = HoodieActiveTimeline.createNewInstantTime();
+    return scheduleIndexingAtInstant(partitionTypes, instantTime) ? Option.of(instantTime) : Option.empty();
+  }
+
+  private boolean scheduleIndexingAtInstant(List<MetadataPartitionType> partitionTypes, String instantTime) throws HoodieIOException {
+    Option<HoodieIndexPlan> indexPlan = createTable(config, hadoopConf, config.isMetadataTableEnabled())
+        .scheduleIndex(context, instantTime, partitionTypes);
+    return indexPlan.isPresent();
+  }
+
+  public Option<HoodieIndexCommitMetadata> index(String indexInstantTime) {
+    return createTable(config, hadoopConf, config.isMetadataTableEnabled()).index(context, indexInstantTime);
+  }
+
+  public void dropIndex(List<MetadataPartitionType> partitionTypes) {
+    HoodieTable table = createTable(config, hadoopConf);
+    String dropInstant = HoodieActiveTimeline.createNewInstantTime();
+    this.txnManager.beginTransaction();
+    try {
+      context.setJobStatus(this.getClass().getSimpleName(), "Dropping partitions from metadata table");
+      table.getMetadataWriter(dropInstant).ifPresent(w -> {
+        try {
+          ((HoodieTableMetadataWriter) w).dropIndex(partitionTypes);
+        } catch (IOException e) {
+          LOG.error("Failed to drop metadata index. ", e);

Review comment:
       Can we throw here?
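       e.g. (the exception type is a suggestion):
   ```
    } catch (IOException e) {
      throw new HoodieIndexException("Failed to drop metadata index", e);
    }
   ```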

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -663,20 +711,82 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    if (!dataWriteConfig.isMetadataTableEnabled()) {
+      return;
+    }
+    Set<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private Set<String> getMetadataPartitionsToUpdate() {
+    // fetch partitions to update from table config
+    Set<String> partitionsToUpdate = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    partitionsToUpdate.addAll(Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    if (!partitionsToUpdate.isEmpty()) {
+      return partitionsToUpdate;
     }
+    // fallback to update files partition only if table config returned no partitions
+    partitionsToUpdate.add(MetadataPartitionType.FILES.getPartitionPath());
+    return partitionsToUpdate;
+  }
+
+  @Override
+  public void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    if (indexPartitionInfos.isEmpty()) {
+      LOG.warn("No partition to index in the plan");
+      return;
+    }
+    String indexUptoInstantTime = indexPartitionInfos.get(0).getIndexUptoInstant();
+    indexPartitionInfos.forEach(indexPartitionInfo -> {
+      String relativePartitionPath = indexPartitionInfo.getMetadataPartitionPath();
+      LOG.info(String.format("Creating a new metadata index for partition '%s' under path %s upto instant %s",
+          relativePartitionPath, metadataWriteConfig.getBasePath(), indexUptoInstantTime));
+      try {
+        // filegroup should have already been initialized while scheduling index for this partition
+        if (!dataMetaClient.getFs().exists(new Path(metadataWriteConfig.getBasePath(), relativePartitionPath))) {
+          throw new HoodieIndexException(String.format("File group not initialized for metadata partition: %s, indexUptoInstant: %s. Looks like index scheduling failed!",
+              relativePartitionPath, indexUptoInstantTime));
+        }
+      } catch (IOException e) {
+        throw new HoodieIndexException(String.format("Unable to check whether file group is initialized for metadata partition: %s, indexUptoInstant: %s",
+            relativePartitionPath, indexUptoInstantTime));
+      }
+
+      // return early and populate enabledPartitionTypes correctly (check in initialCommit)
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(relativePartitionPath.toUpperCase(Locale.ROOT));
+      if (!enabledPartitionTypes.contains(partitionType)) {
+        throw new HoodieIndexException(String.format("Indexing for metadata partition: %s is not enabled", partitionType));
+      }
+    });
+    // before initial commit update table config
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), indexPartitionInfos.stream()
+        .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.joining(",")));
+    HoodieTableConfig.update(dataMetaClient.getFs(), new Path(dataMetaClient.getMetaPath()), dataMetaClient.getTableConfig().getProps());
+    // check here for enabled partition types whether filegroups initialized or not
+    initialCommit(indexUptoInstantTime);
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), "");

Review comment:
       Let's use a constant for the empty string.
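       e.g. (assuming `StringUtils` has, or gets, such a constant):
   ```
    // instead of the bare "" literal:
    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), StringUtils.EMPTY_STRING);
   ```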

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -663,20 +711,82 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    if (!dataWriteConfig.isMetadataTableEnabled()) {
+      return;
+    }
+    Set<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private Set<String> getMetadataPartitionsToUpdate() {
+    // fetch partitions to update from table config
+    Set<String> partitionsToUpdate = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    partitionsToUpdate.addAll(Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    if (!partitionsToUpdate.isEmpty()) {
+      return partitionsToUpdate;
     }
+    // fallback to update files partition only if table config returned no partitions
+    partitionsToUpdate.add(MetadataPartitionType.FILES.getPartitionPath());
+    return partitionsToUpdate;
+  }
+
+  @Override
+  public void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    if (indexPartitionInfos.isEmpty()) {
+      LOG.warn("No partition to index in the plan");
+      return;
+    }
+    String indexUptoInstantTime = indexPartitionInfos.get(0).getIndexUptoInstant();
+    indexPartitionInfos.forEach(indexPartitionInfo -> {
+      String relativePartitionPath = indexPartitionInfo.getMetadataPartitionPath();
+      LOG.info(String.format("Creating a new metadata index for partition '%s' under path %s upto instant %s",
+          relativePartitionPath, metadataWriteConfig.getBasePath(), indexUptoInstantTime));
+      try {
+        // filegroup should have already been initialized while scheduling index for this partition
+        if (!dataMetaClient.getFs().exists(new Path(metadataWriteConfig.getBasePath(), relativePartitionPath))) {
+          throw new HoodieIndexException(String.format("File group not initialized for metadata partition: %s, indexUptoInstant: %s. Looks like index scheduling failed!",
+              relativePartitionPath, indexUptoInstantTime));
+        }
+      } catch (IOException e) {
+        throw new HoodieIndexException(String.format("Unable to check whether file group is initialized for metadata partition: %s, indexUptoInstant: %s",
+            relativePartitionPath, indexUptoInstantTime), e);
+      }
+
+      // return early and populate enabledPartitionTypes correctly (check in initialCommit)
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(relativePartitionPath.toUpperCase(Locale.ROOT));
+      if (!enabledPartitionTypes.contains(partitionType)) {
+        throw new HoodieIndexException(String.format("Indexing for metadata partition: %s is not enabled", partitionType));
+      }
+    });
+    // before initial commit update table config
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), indexPartitionInfos.stream()

Review comment:
       Do you think we should append new entries to the already existing value in the table config here?
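
   Rough sketch of the append behavior I have in mind, reusing the getInflightMetadataIndexes() getter from this PR (java.util.TreeSet import assumed; TreeSet just keeps the serialized value deterministic):

       // merge new inflight partitions with whatever is already recorded, instead of overwriting wholesale
       Set<String> inflight = Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
           .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toCollection(TreeSet::new));
       indexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).forEach(inflight::add);
       dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), String.join(",", inflight));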

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {

Review comment:
       Do we need to add this check elsewhere for regular writers? I mean, what in case the user adds these configs just for the async indexer process, but misses adding them to the regular writers?
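
   Something like a shared guard that both the indexer executors and the regular write path validate, e.g. (validateConcurrencyMode is a hypothetical name; the body is lifted from this PR):

       static void validateConcurrencyMode(HoodieWriteConfig config) {
         // same precondition in one place, so indexer and writers cannot drift apart
         if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl()
             || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
           throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
               WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
         }
       }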

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
##########
@@ -112,6 +114,25 @@ public HoodieDefaultTimeline getWriteTimeline() {
     return new HoodieDefaultTimeline(instants.stream().filter(s -> validActions.contains(s.getAction())), details);
   }
 
+  @Override
+  public HoodieDefaultTimeline getContiguousCompletedWriteTimeline() {

Review comment:
       Do we have UTs for these?
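
   Rough sketch of the kind of UT I mean, assuming the usual HoodieInstant/HoodieDefaultTimeline constructors: an inflight instant in the middle should cut the contiguous completed timeline short.

       @Test
       public void testGetContiguousCompletedWriteTimeline() {
         HoodieDefaultTimeline timeline = new HoodieDefaultTimeline(Stream.of(
             new HoodieInstant(HoodieInstant.State.COMPLETED, HoodieTimeline.COMMIT_ACTION, "001"),
             new HoodieInstant(HoodieInstant.State.COMPLETED, HoodieTimeline.COMMIT_ACTION, "002"),
             new HoodieInstant(HoodieInstant.State.INFLIGHT, HoodieTimeline.COMMIT_ACTION, "003"),
             new HoodieInstant(HoodieInstant.State.COMPLETED, HoodieTimeline.COMMIT_ACTION, "004")), i -> Option.empty());
         // everything after the first hole (inflight 003) should be excluded
         assertEquals("002", timeline.getContiguousCompletedWriteTimeline().lastInstant().get().getTimestamp());
       }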

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -208,6 +208,18 @@
       .sinceVersion("0.11.0")
       .withDocumentation("Table checksum is used to guard against partial writes in HDFS. It is added as the last entry in hoodie.properties and then used to validate while reading table config.");
 
+  public static final ConfigProperty<String> TABLE_METADATA_INDEX_INFLIGHT = ConfigProperty
+      .key("hoodie.table.metadata.index.inflight")

Review comment:
       Should we make it plural?
   ...metadata.indexes.inflight 
   or 
   ... metadata.indices.inflight 
   

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -663,20 +711,82 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    if (!dataWriteConfig.isMetadataTableEnabled()) {
+      return;
+    }
+    Set<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private Set<String> getMetadataPartitionsToUpdate() {
+    // fetch partitions to update from table config
+    Set<String> partitionsToUpdate = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    partitionsToUpdate.addAll(Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    if (!partitionsToUpdate.isEmpty()) {
+      return partitionsToUpdate;
     }
+    // fallback to update files partition only if table config returned no partitions
+    partitionsToUpdate.add(MetadataPartitionType.FILES.getPartitionPath());
+    return partitionsToUpdate;
+  }
+
+  @Override
+  public void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    if (indexPartitionInfos.isEmpty()) {
+      LOG.warn("No partition to index in the plan");
+      return;
+    }
+    String indexUptoInstantTime = indexPartitionInfos.get(0).getIndexUptoInstant();
+    indexPartitionInfos.forEach(indexPartitionInfo -> {
+      String relativePartitionPath = indexPartitionInfo.getMetadataPartitionPath();
+      LOG.info(String.format("Creating a new metadata index for partition '%s' under path %s upto instant %s",
+          relativePartitionPath, metadataWriteConfig.getBasePath(), indexUptoInstantTime));
+      try {
+        // filegroup should have already been initialized while scheduling index for this partition
+        if (!dataMetaClient.getFs().exists(new Path(metadataWriteConfig.getBasePath(), relativePartitionPath))) {
+          throw new HoodieIndexException(String.format("File group not initialized for metadata partition: %s, indexUptoInstant: %s. Looks like index scheduling failed!",
+              relativePartitionPath, indexUptoInstantTime));
+        }
+      } catch (IOException e) {
+        throw new HoodieIndexException(String.format("Unable to check whether file group is initialized for metadata partition: %s, indexUptoInstant: %s",
+            relativePartitionPath, indexUptoInstantTime), e);
+      }
+
+      // return early and populate enabledPartitionTypes correctly (check in initialCommit)
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(relativePartitionPath.toUpperCase(Locale.ROOT));
+      if (!enabledPartitionTypes.contains(partitionType)) {
+        throw new HoodieIndexException(String.format("Indexing for metadata partition: %s is not enabled", partitionType));
+      }
+    });
+    // before initial commit update table config
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), indexPartitionInfos.stream()
+        .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.joining(",")));
+    HoodieTableConfig.update(dataMetaClient.getFs(), new Path(dataMetaClient.getMetaPath()), dataMetaClient.getTableConfig().getProps());
+    // check here for enabled partition types whether filegroups initialized or not
+    initialCommit(indexUptoInstantTime);
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), "");
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), indexPartitionInfos.stream()

Review comment:
       Not sure if this is the right place to update the table config. I was expecting we would update it towards the end, after doing the catch-up and ensuring all commits are caught up. Let's sync up f2f on this; maybe I am missing something.
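
   Roughly the ordering I was expecting (catchupWithDataTimeline and completedPartitionsCsv are hypothetical placeholders for the reconcile step in RunIndexActionExecutor and the joined partition paths):

       initialCommit(indexUptoInstantTime);             // 1. bootstrap the new partitions upto the planned instant
       catchupWithDataTimeline(indexUptoInstantTime);   // 2. hypothetical: fold in commits that landed while indexing
       // 3. only now flip inflight -> completed in the table config
       dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), "");
       dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), completedPartitionsCsv);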

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -175,6 +182,25 @@
           .sinceVersion("0.11.0")
           .withDocumentation("Parallelism to use, when generating column stats index.");
 
+  public static final ConfigProperty<String> COLUMN_STATS_INDEX_FOR_COLUMNS = ConfigProperty
+      .key(METADATA_PREFIX + ".index.column.stats.for.columns")
+      .defaultValue("")
+      .sinceVersion("0.11.0")
+      .withDocumentation("Comma-separated list of columns for which column stats index will be built.");

Review comment:
       Can we enhance the docs to say what the default behavior is?
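
   e.g. a doc-string sketch along these lines, with the actual default filled in (I'd rather not guess it here):

       .withDocumentation("Comma-separated list of columns for which column stats index will be built. "
           + "When left empty (the default), <describe which columns, if any, get indexed>.");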

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ * 1. Fetch last completed instant on data timeline.
+ * 2. Write the index plan to the <instant>.index.requested.
+ * 3. Initialize filegroups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!EnumSet.allOf(MetadataPartitionType.class).containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+    // make sure that it is idempotent, check with previously pending index operations.
+    Set<String> indexesInflightOrCompleted = Stream.of(table.getMetaClient().getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    indexesInflightOrCompleted.addAll(Stream.of(table.getMetaClient().getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    Set<String> requestedPartitions = partitionsToIndex.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.retainAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {
+      LOG.error("Following partitions already exist or inflight: " + requestedPartitions);
+      return Option.empty();
+    }
+    // get last completed instant
+    Option<HoodieInstant> indexUptoInstant = table.getActiveTimeline().getContiguousCompletedWriteTimeline().lastInstant();
+    if (indexUptoInstant.isPresent()) {
+      final HoodieInstant indexInstant = HoodieTimeline.getIndexRequestedInstant(instantTime);
+      // for each partitionToIndex add that time to the plan
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = partitionsToIndex.stream()
+          .map(p -> new HoodieIndexPartitionInfo(LATEST_INDEX_PLAN_VERSION, p.getPartitionPath(), indexUptoInstant.get().getTimestamp()))
+          .collect(Collectors.toList());
+      HoodieIndexPlan indexPlan = new HoodieIndexPlan(LATEST_INDEX_PLAN_VERSION, indexPartitionInfos);
+      try {
+        table.getActiveTimeline().saveToPendingIndexCommit(indexInstant, TimelineMetadataUtils.serializeIndexPlan(indexPlan));
+      } catch (IOException e) {
+        LOG.error("Error while saving index requested file", e);
+        throw new HoodieIOException(e.getMessage(), e);
+      }
+      table.getMetaClient().reloadActiveTimeline();
+
+      // start initializing filegroups
+      // 1. get metadata writer
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to initialize filegroups for indexing for instant: %s", instantTime)));
+      // 2. take a lock --> begin tx (data table)
+      try {
+        this.txnManager.beginTransaction(Option.of(indexInstant), Option.empty());
+        // 3. initialize filegroups as per plan for the enabled partition types
+        metadataWriter.scheduleIndex(table.getMetaClient(), partitionsToIndex, indexInstant.getTimestamp());
+      } catch (IOException e) {
+        LOG.error("Could not initialize file groups");

Review comment:
       Can we suffix the exception to the error log?
   LOG.error("Could not initialize file groups", e);

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ * 1. Fetch last completed instant on data timeline.
+ * 2. Write the index plan to the <instant>.index.requested.
+ * 3. Initialize filegroups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!EnumSet.allOf(MetadataPartitionType.class).containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+    // make sure that it is idempotent, check with previously pending index operations.
+    Set<String> indexesInflightOrCompleted = Stream.of(table.getMetaClient().getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    indexesInflightOrCompleted.addAll(Stream.of(table.getMetaClient().getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    Set<String> requestedPartitions = partitionsToIndex.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.retainAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {
+      LOG.error("Following partitions already exist or inflight: " + requestedPartitions);
+      return Option.empty();
+    }
+    // get last completed instant
+    Option<HoodieInstant> indexUptoInstant = table.getActiveTimeline().getContiguousCompletedWriteTimeline().lastInstant();
+    if (indexUptoInstant.isPresent()) {
+      final HoodieInstant indexInstant = HoodieTimeline.getIndexRequestedInstant(instantTime);
+      // for each partitionToIndex add that time to the plan
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = partitionsToIndex.stream()
+          .map(p -> new HoodieIndexPartitionInfo(LATEST_INDEX_PLAN_VERSION, p.getPartitionPath(), indexUptoInstant.get().getTimestamp()))
+          .collect(Collectors.toList());
+      HoodieIndexPlan indexPlan = new HoodieIndexPlan(LATEST_INDEX_PLAN_VERSION, indexPartitionInfos);
+      try {
+        table.getActiveTimeline().saveToPendingIndexCommit(indexInstant, TimelineMetadataUtils.serializeIndexPlan(indexPlan));
+      } catch (IOException e) {
+        LOG.error("Error while saving index requested file", e);
+        throw new HoodieIOException(e.getMessage(), e);
+      }
+      table.getMetaClient().reloadActiveTimeline();
+
+      // start initializing filegroups
+      // 1. get metadata writer
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to initialize filegroups for indexing for instant: %s", instantTime)));
+      // 2. take a lock --> begin tx (data table)
+      try {
+        this.txnManager.beginTransaction(Option.of(indexInstant), Option.empty());
+        // 3. initialize filegroups as per plan for the enabled partition types
+        metadataWriter.scheduleIndex(table.getMetaClient(), partitionsToIndex, indexInstant.getTimestamp());

Review comment:
       I thought we discussed that file groups should be initialized before adding the requested instant to the timeline, so that whenever any writer sees a new index being built, the file groups are already built out. But here I see we create the requested meta file in L 112 and then initialize file groups here. Can you help me understand, please?
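
   Sketch of the order I thought we agreed on, within the same transaction (names from this PR; the exact endTransaction signature may differ):

       try {
         this.txnManager.beginTransaction(Option.of(indexInstant), Option.empty());
         // initialize file groups first, so any writer that later sees the plan finds them ready
         metadataWriter.scheduleIndex(table.getMetaClient(), partitionsToIndex, indexInstant.getTimestamp());
         // only then publish the requested instant on the timeline
         table.getActiveTimeline().saveToPendingIndexCommit(indexInstant, TimelineMetadataUtils.serializeIndexPlan(indexPlan));
       } finally {
         this.txnManager.endTransaction();
       }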

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+
+      // get all instants since the plan completed (both from active timeline and archived timeline)
+      // assumption is that all metadata partitions had same instant upto which they were scheduled to be indexed
+      table.getMetaClient().reloadActiveTimeline();
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      List<HoodieInstant> instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(
+          new IndexingCheckTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient()));
+      try {
+        postRequestIndexingTaskFuture.get(config.getIndexingCheckTimeout(), TimeUnit.SECONDS);

Review comment:
       Let's name the getter with units:
   config.getIndexingCheckTimeoutSecs()

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleandExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  public int start(int retry) {
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        case DROP_INDEX: {
+          LOG.info("Running Mode: [" + DROP_INDEX + "];");
+          return dropIndex(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  @TestOnly
+  public Option<String> doSchedule() throws Exception {
+    return this.scheduleIndexing(jsc);
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      return doSchedule(client);
+    }
+  }
+
+  private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
+    List<String> partitionsToIndex = Arrays.asList(cfg.indexTypes.split(","));
+    List<MetadataPartitionType> partitionTypes = partitionsToIndex.stream()
+        .map(p -> MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)))
+        .collect(Collectors.toList());
+    if (cfg.indexInstantTime != null) {
+      client.scheduleClusteringAtInstant(cfg.indexInstantTime, Option.empty());
+      return Option.of(cfg.indexInstantTime);
+    }
+    Option<String> indexingInstant = client.scheduleIndexing(partitionTypes);
+    if (!indexingInstant.isPresent()) {
+      LOG.error("Scheduling of index action did not return any instant.");
+    }
+    return indexingInstant;
+  }
+
+  private int runIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      if (StringUtils.isNullOrEmpty(cfg.indexInstantTime)) {
+        // Instant time is not specified
+        // Find the earliest scheduled indexing instant for execution
+        Option<HoodieInstant> earliestPendingIndexInstant = metaClient.getActiveTimeline()

Review comment:
       Again, what in case there are two processes that scheduled index building for two different partitions? I feel runIndexing should take in the list of partitions to be built and fetch the earliest instant pertaining to those.
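
   Sketch of what I mean, with planCoversPartitions as a hypothetical helper that deserializes the index plan and checks its partition paths (assumes getInstants() returns a stream, as on current master):

       Set<String> requestedPartitions = Arrays.stream(cfg.indexTypes.split(","))
           .map(p -> MetadataPartitionType.valueOf(p.trim().toUpperCase(Locale.ROOT)).getPartitionPath())
           .collect(Collectors.toSet());
       Option<HoodieInstant> earliestPendingIndexInstant = Option.fromJavaOptional(
           metaClient.getActiveTimeline().filterPendingIndexTimeline().getInstants()
               .filter(i -> planCoversPartitions(i, requestedPartitions))  // hypothetical helper
               .findFirst());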

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -621,8 +635,14 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
 
     LOG.info(String.format("Creating %d file groups for partition %s with base fileId %s at instant time %s",
         fileGroupCount, metadataPartition.getPartitionPath(), metadataPartition.getFileIdPrefix(), instantTime));
+    HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+    List<FileSlice> fileSlices = HoodieTableMetadataUtil.getPartitionLatestFileSlices(metadataMetaClient, Option.ofNullable(fsView), metadataPartition.getPartitionPath());
     for (int i = 0; i < fileGroupCount; ++i) {
       final String fileGroupFileId = String.format("%s%04d", metadataPartition.getFileIdPrefix(), i);
+      // if a writer or async indexer had already initialized the filegroup then continue
+      if (!fileSlices.isEmpty() && fileSlices.stream().anyMatch(fileSlice -> fileGroupFileId.equals(fileSlice.getFileGroupId().getFileId()))) {

Review comment:
       With the latest code, I guess file groups are initialized by the index schedule. So, by the time regular writers see a new index being built, the file groups should have been instantiated. Given this, can you help me understand how this scenario could be possible, i.e. someone already initialized a file group, but this process again calls initializeFileGroups? Ideally initializeFileGroups should be called just once per MDT partition, right? Or am I missing something?

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleandExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  public int start(int retry) {
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        case DROP_INDEX: {
+          LOG.info("Running Mode: [" + DROP_INDEX + "];");
+          return dropIndex(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  @TestOnly
+  public Option<String> doSchedule() throws Exception {
+    return this.scheduleIndexing(jsc);
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      return doSchedule(client);
+    }
+  }
+
+  private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
+    List<String> partitionsToIndex = Arrays.asList(cfg.indexTypes.split(","));
+    List<MetadataPartitionType> partitionTypes = partitionsToIndex.stream()
+        .map(p -> MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)))
+        .collect(Collectors.toList());
+    if (cfg.indexInstantTime != null) {

Review comment:
       Consider using !StringUtils.isNullOrEmpty(cfg.indexInstantTime) here instead of the plain null check.
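
       For example, a minimal sketch of the suggested guard (only the condition
       changes; cfg.indexInstantTime is the field checked in the diff above):

           // instead of: if (cfg.indexInstantTime != null) { ... }
           if (!StringUtils.isNullOrEmpty(cfg.indexInstantTime)) {
             // proceed with the user-supplied instant
           }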

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+
+      // get all instants since the plan completed (both from active timeline and archived timeline)
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      table.getMetaClient().reloadActiveTimeline();
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      List<HoodieInstant> instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(
+          new IndexingCheckTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient()));
+      try {
+        postRequestIndexingTaskFuture.get(config.getIndexingCheckTimeout(), TimeUnit.SECONDS);
+      } catch (TimeoutException | InterruptedException | ExecutionException e) {
+        postRequestIndexingTaskFuture.cancel(true);
+      } finally {
+        executorService.shutdownNow();
+      }
+      // save index commit metadata and return
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      try {
+        txnManager.beginTransaction();
+        table.getActiveTimeline().saveAsComplete(
+            new HoodieInstant(true, INDEX_ACTION, indexInstant.getTimestamp()),
+            TimelineMetadataUtils.serializeIndexCommitMetadata(indexCommitMetadata));
+      } finally {
+        txnManager.endTransaction();
+      }
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private static List<HoodieInstant> getRemainingArchivedAndActiveInstantsSince(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> remainingInstantsToIndex = metaClient.getArchivedTimeline()
+        .getWriteTimeline()
+        .findInstantsAfter(instant)
+        .getInstants().collect(Collectors.toList());
+    remainingInstantsToIndex.addAll(metaClient.getActiveTimeline().getWriteTimeline().findInstantsAfter(instant).getInstants().collect(Collectors.toList()));
+    return remainingInstantsToIndex;
+  }
+
+  private static List<HoodieInstant> getCompletedArchivedAndActiveInstantsAfter(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> completedInstants = metaClient.getArchivedTimeline()
+        .filterCompletedInstants()
+        .findInstantsAfter(instant)
+        .getInstants().collect(Collectors.toList());
+    completedInstants.addAll(metaClient.getActiveTimeline().filterCompletedInstants().findInstantsAfter(instant).getInstants().collect(Collectors.toList()));
+    return completedInstants;
+  }
+
+  /**
+   * Indexing check runs for instants that completed after the base instant (in the index plan).
+   * It will check if these later instants have logged updates to metadata table or not.
+   * If not, then it will do the update. If a later instant is inflight, it will wait until it is completed or the task times out.
+   */
+  class IndexingCheckTask implements Runnable {
+
+    private final HoodieTableMetadataWriter metadataWriter;
+    private final List<HoodieInstant> instantsToIndex;
+    private final Set<String> metadataCompletedInstants;
+    private final HoodieTableMetaClient metaClient;
+
+    IndexingCheckTask(HoodieTableMetadataWriter metadataWriter,
+                      List<HoodieInstant> instantsToIndex,
+                      Set<String> metadataCompletedInstants,
+                      HoodieTableMetaClient metaClient) {
+      this.metadataWriter = metadataWriter;
+      this.instantsToIndex = instantsToIndex;
+      this.metadataCompletedInstants = metadataCompletedInstants;
+      this.metaClient = metaClient;
+    }
+
+    @Override
+    public void run() {
+      while (!Thread.interrupted()) {
+        for (HoodieInstant instant : instantsToIndex) {
+          // metadata index already updated for this instant
+          if (metadataCompletedInstants.contains(instant.getTimestamp())) {
+            currentIndexedInstant = instant.getTimestamp();
+            continue;
+          }
+          while (!instant.isCompleted()) {
+            // reload the timeline and fetch the instant details again, waiting until the instant completes or the task times out

Review comment:
       May I know where we are waiting?
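
       A minimal sketch of making the wait explicit inside the catchup loop
       (the 5-second poll interval is an assumption, not something the PR defines):

           while (!instant.isCompleted()) {
             try {
               TimeUnit.SECONDS.sleep(5); // assumed poll interval between timeline reloads
             } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
               return;
             }
             String instantTime = instant.getTimestamp();
             Option<HoodieInstant> currentInstant = metaClient.reloadActiveTimeline()
                 .filterCompletedInstants().filter(i -> i.getTimestamp().equals(instantTime)).firstInstant();
             instant = currentInstant.orElse(instant);
           }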

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataWriter.java
##########
@@ -19,17 +19,28 @@
 package org.apache.hudi.metadata;
 
 import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
 import org.apache.hudi.avro.model.HoodieRestoreMetadata;
 import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
 import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
 
+import java.io.IOException;
 import java.io.Serializable;
+import java.util.List;
 
 /**
  * Interface that supports updating metadata for a given table, as actions complete.
  */
 public interface HoodieTableMetadataWriter extends Serializable, AutoCloseable {
 
+  void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos);

Review comment:
       how about "buildIndex" instead of "index"

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,263 @@
+      // index all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(
+          new IndexingCheckTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient()));
+      try {
+        postRequestIndexingTaskFuture.get(config.getIndexingCheckTimeout(), TimeUnit.SECONDS);
+      } catch (TimeoutException | InterruptedException | ExecutionException e) {
+        postRequestIndexingTaskFuture.cancel(true);

Review comment:
       I was expecting us to throw/fail here. May I know why we are proceeding further?
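
       A sketch of failing fast instead (the exception message is illustrative;
       HoodieIndexException is already imported in this file):

           } catch (TimeoutException | InterruptedException | ExecutionException e) {
             postRequestIndexingTaskFuture.cancel(true);
             throw new HoodieIndexException(
                 String.format("Indexing catchup failed for instant: %s", instantTime), e);
           } finally {
             executorService.shutdownNow();
           }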

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,263 @@
+  /**
+   * Indexing check runs for instants that completed after the base instant (in the index plan).
+   * It will check if these later instants have logged updates to metadata table or not.
+   * If not, then it will do the update. If a later instant is inflight, it will wait until it is completed or the task times out.
+   */
+  class IndexingCheckTask implements Runnable {

Review comment:
       Maybe we can name this "IndexingCatchupTask".

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -663,20 +711,82 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    if (!dataWriteConfig.isMetadataTableEnabled()) {
+      return;
+    }
+    Set<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private Set<String> getMetadataPartitionsToUpdate() {
+    // fetch partitions to update from table config
+    Set<String> partitionsToUpdate = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    partitionsToUpdate.addAll(Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    if (!partitionsToUpdate.isEmpty()) {
+      return partitionsToUpdate;
     }
+    // fallback to update files partition only if table config returned no partitions
+    partitionsToUpdate.add(MetadataPartitionType.FILES.getPartitionPath());
+    return partitionsToUpdate;
+  }
+
+  @Override
+  public void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    if (indexPartitionInfos.isEmpty()) {
+      LOG.warn("No partition to index in the plan");
+      return;
+    }
+    String indexUptoInstantTime = indexPartitionInfos.get(0).getIndexUptoInstant();
+    indexPartitionInfos.forEach(indexPartitionInfo -> {
+      String relativePartitionPath = indexPartitionInfo.getMetadataPartitionPath();
+      LOG.info(String.format("Creating a new metadata index for partition '%s' under path %s upto instant %s",
+          relativePartitionPath, metadataWriteConfig.getBasePath(), indexUptoInstantTime));
+      try {
+        // filegroup should have already been initialized while scheduling index for this partition
+        if (!dataMetaClient.getFs().exists(new Path(metadataWriteConfig.getBasePath(), relativePartitionPath))) {
+          throw new HoodieIndexException(String.format("File group not initialized for metadata partition: %s, indexUptoInstant: %s. Looks like index scheduling failed!",
+              relativePartitionPath, indexUptoInstantTime));
+        }
+      } catch (IOException e) {
+        throw new HoodieIndexException(String.format("Unable to check whether file group is initialized for metadata partition: %s, indexUptoInstant: %s",
+            relativePartitionPath, indexUptoInstantTime));
+      }
+
+      // return early and populate enabledPartitionTypes correctly (check in initialCommit)
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(relativePartitionPath.toUpperCase(Locale.ROOT));
+      if (!enabledPartitionTypes.contains(partitionType)) {
+        throw new HoodieIndexException(String.format("Indexing for metadata partition: %s is not enabled", partitionType));
+      }
+    });
+    // before initial commit update table config
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), indexPartitionInfos.stream()

Review comment:
       What happens in case different processes are started to build indexes for different partitions?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -645,12 +669,36 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropIndex(List<MetadataPartitionType> indexesToDrop) throws IOException {
+    Set<String> completedIndexes = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    Set<String> inflightIndexes = Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    for (MetadataPartitionType partitionType : indexesToDrop) {
+      String partitionPath = partitionType.getPartitionPath();
+      if (inflightIndexes.contains(partitionPath)) {
+        LOG.error("Metadata indexing in progress: " + partitionPath);
+        return;
+      }
+      LOG.warn("Deleting Metadata Table partitions: " + partitionPath);
+      dataMetaClient.getFs().delete(new Path(metadataWriteConfig.getBasePath(), partitionPath), true);
+      completedIndexes.remove(partitionPath);
+    }
+    // update table config
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), String.join(",", completedIndexes));

Review comment:
       Something to think about: other writers that are holding on to an in-memory table property will not see the updated value if we change it here. So won't they end up failing? I.e., let's say that when they read the table properties the index was available, but by the time they are about to write to the metadata partitions it has been deleted.

       What we need is some kind of notification trigger, but these are completely different processes altogether, so I don't think there is an easy way. One simple (ugly) option: after updating the table config, wait for 1 minute and then delete the MDT partitions here, giving other writers 1 minute to pick up the updated table config.
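
       A rough sketch of that grace-period idea (the 1-minute wait is only the
       value suggested above, not a tested default):

           // 1. persist the updated table config first, so new readers see the index as dropped
           dataMetaClient.getTableConfig().setValue(
               HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), String.join(",", completedIndexes));
           HoodieTableConfig.update(dataMetaClient.getFs(), new Path(dataMetaClient.getMetaPath()),
               dataMetaClient.getTableConfig().getProps());
           // 2. give in-flight writers a chance to reload the config (assumption: 1 minute suffices)
           try {
             TimeUnit.MINUTES.sleep(1);
           } catch (InterruptedException e) {
             Thread.currentThread().interrupt();
           }
           // 3. only then physically delete the metadata table partitions
           for (MetadataPartitionType partitionType : indexesToDrop) {
             dataMetaClient.getFs().delete(
                 new Path(metadataWriteConfig.getBasePath(), partitionType.getPartitionPath()), true);
           }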
   
   

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -663,20 +711,82 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
+  @Override
+  public void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    if (indexPartitionInfos.isEmpty()) {
+      LOG.warn("No partition to index in the plan");
+      return;
+    }
+    String indexUptoInstantTime = indexPartitionInfos.get(0).getIndexUptoInstant();
+    indexPartitionInfos.forEach(indexPartitionInfo -> {
+      String relativePartitionPath = indexPartitionInfo.getMetadataPartitionPath();
+      LOG.info(String.format("Creating a new metadata index for partition '%s' under path %s upto instant %s",
+          relativePartitionPath, metadataWriteConfig.getBasePath(), indexUptoInstantTime));
+      try {
+        // filegroup should have already been initialized while scheduling index for this partition
+        if (!dataMetaClient.getFs().exists(new Path(metadataWriteConfig.getBasePath(), relativePartitionPath))) {
+          throw new HoodieIndexException(String.format("File group not initialized for metadata partition: %s, indexUptoInstant: %s. Looks like index scheduling failed!",
+              relativePartitionPath, indexUptoInstantTime));
+        }
+      } catch (IOException e) {
+        throw new HoodieIndexException(String.format("Unable to check whether file group is initialized for metadata partition: %s, indexUptoInstant: %s",
+            relativePartitionPath, indexUptoInstantTime));
+      }
+
+      // return early and populate enabledPartitionTypes correctly (check in initialCommit)
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(relativePartitionPath.toUpperCase(Locale.ROOT));
+      if (!enabledPartitionTypes.contains(partitionType)) {
+        throw new HoodieIndexException(String.format("Indexing for metadata partition: %s is not enabled", partitionType));
+      }
+    });
+    // before initial commit update table config
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), indexPartitionInfos.stream()
+        .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.joining(",")));
+    HoodieTableConfig.update(dataMetaClient.getFs(), new Path(dataMetaClient.getMetaPath()), dataMetaClient.getTableConfig().getProps());
+    // check whether file groups have been initialized for all enabled partition types
+    initialCommit(indexUptoInstantTime);
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), "");
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), indexPartitionInfos.stream()

Review comment:
       Also, generally I think there could be different processes building different indexes. So when updating the table config we should append only the current index partitions, and when clearing any configs we should again clean up only the partitions pertaining to the current process.
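
       A minimal sketch of the append-only update, mirroring the parsing idiom
       already used in this file (ideally done under the table lock so two
       indexers cannot clobber each other):

           // read-modify-write: merge this process's partitions into the existing value instead of overwriting it
           Set<String> inflightIndexes = Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
               .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
           indexPartitionInfos.forEach(info -> inflightIndexes.add(info.getMetadataPartitionPath()));
           dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(),
               String.join(",", inflightIndexes));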

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,263 @@
+  class IndexingCheckTask implements Runnable {
+
+    private final HoodieTableMetadataWriter metadataWriter;
+    private final List<HoodieInstant> instantsToIndex;
+    private final Set<String> metadataCompletedInstants;
+    private final HoodieTableMetaClient metaClient;
+
+    IndexingCheckTask(HoodieTableMetadataWriter metadataWriter,
+                      List<HoodieInstant> instantsToIndex,
+                      Set<String> metadataCompletedInstants,
+                      HoodieTableMetaClient metaClient) {
+      this.metadataWriter = metadataWriter;
+      this.instantsToIndex = instantsToIndex;
+      this.metadataCompletedInstants = metadataCompletedInstants;
+      this.metaClient = metaClient;
+    }
+
+    @Override
+    public void run() {
+      while (!Thread.interrupted()) {
+        for (HoodieInstant instant : instantsToIndex) {
+          // metadata index already updated for this instant
+          if (metadataCompletedInstants.contains(instant.getTimestamp())) {
+            currentIndexedInstant = instant.getTimestamp();
+            continue;
+          }
+          while (!instant.isCompleted()) {
+            // reload the timeline and fetch the instant details again, waiting until the instant completes or the task times out
+            String instantTime = instant.getTimestamp();
+            Option<HoodieInstant> currentInstant = metaClient.reloadActiveTimeline()
+                .filterCompletedInstants().filter(i -> i.getTimestamp().equals(instantTime)).firstInstant();
+            instant = currentInstant.orElse(instant);
+          }
+          // update metadata for this completed instant
+          if (COMPLETED.equals(instant.getState())) {
+            try {
+              // we need to take a lock here as an inflight writer could also try to update the timeline
+              txnManager.beginTransaction(Option.of(instant), Option.empty());
+              switch (instant.getAction()) {
+                case HoodieTimeline.COMMIT_ACTION:
+                case HoodieTimeline.DELTA_COMMIT_ACTION:
+                case HoodieTimeline.REPLACE_COMMIT_ACTION:
+                  HoodieCommitMetadata commitMetadata = HoodieCommitMetadata.fromBytes(
+                      table.getActiveTimeline().getInstantDetails(instant).get(), HoodieCommitMetadata.class);
+                  metadataWriter.update(commitMetadata, instant.getTimestamp(), false);
+                  break;
+                case HoodieTimeline.CLEAN_ACTION:
+                  HoodieCleanMetadata cleanMetadata = CleanerUtils.getCleanerMetadata(table.getMetaClient(), instant);
+                  metadataWriter.update(cleanMetadata, instant.getTimestamp());
+                  break;
+                case HoodieTimeline.RESTORE_ACTION:
+                  HoodieRestoreMetadata restoreMetadata = TimelineMetadataUtils.deserializeHoodieRestoreMetadata(
+                      table.getActiveTimeline().getInstantDetails(instant).get());
+                  metadataWriter.update(restoreMetadata, instant.getTimestamp());
+                  break;
+                case HoodieTimeline.ROLLBACK_ACTION:
+                  HoodieRollbackMetadata rollbackMetadata = TimelineMetadataUtils.deserializeHoodieRollbackMetadata(
+                      table.getActiveTimeline().getInstantDetails(instant).get());
+                  metadataWriter.update(rollbackMetadata, instant.getTimestamp());
+                  break;
+                default:
+                  throw new IllegalStateException("Unexpected value: " + instant.getAction());
+              }
+            } catch (IOException e) {
+              LOG.error("Could not update metadata partition for instant: " + instant);

Review comment:
       Maybe we can throw here instead of just logging, so that the failure actually surfaces.
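       A minimal sketch of what that could look like (assuming we want the index action to fail fast, and that `HoodieIndexException` has the usual `(message, cause)` constructor):

       ```java
       } catch (IOException e) {
         LOG.error("Could not update metadata partition for instant: " + instant, e);
         // rethrow instead of swallowing, so the index action is not marked
         // complete while the metadata table is only partially caught up
         throw new HoodieIndexException(
             "Failed to update metadata partition for instant: " + instant, e);
       }
       ```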




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835773987



##########
File path: hudi-common/src/main/avro/HoodieIndexPartitionInfo.avsc
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+  "namespace": "org.apache.hudi.avro.model",
+  "type": "record",
+  "name": "HoodieIndexPartitionInfo",
+  "fields": [
+    {
+      "name": "version",
+      "type": [
+        "int",
+        "null"
+      ],
+      "default": 1
+    },
+    {
+      "name": "metadataPartitionPath",
+      "type": [
+        "null",
+        "string"
+      ],
+      "default": null
+    },
+    {
+      "name": "indexUptoInstant",

Review comment:
       Just kept it here since it is at a more granular level; that keeps things flexible if, in the future, we want to support indexing up to a certain instant for specific partitions.
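       For illustration, a hypothetical plan where two partitions are indexed up to different instants (the builder calls assume the standard Avro-generated API, and the instant timestamps are made up):

       ```java
       import java.util.Arrays;

       import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
       import org.apache.hudi.avro.model.HoodieIndexPlan;

       HoodieIndexPartitionInfo columnStats = HoodieIndexPartitionInfo.newBuilder()
           .setVersion(1)
           .setMetadataPartitionPath("column_stats")
           .setIndexUptoInstant("20220326150000")   // indexed up to an earlier instant
           .build();
       HoodieIndexPartitionInfo bloomFilters = HoodieIndexPartitionInfo.newBuilder()
           .setVersion(1)
           .setMetadataPartitionPath("bloom_filters")
           .setIndexUptoInstant("20220326180000")   // indexed up to a later instant
           .build();
       HoodieIndexPlan plan = HoodieIndexPlan.newBuilder()
           .setVersion(1)
           .setIndexPartitionInfos(Arrays.asList(columnStats, bloomFilters))
           .build();
       ```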




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835780780



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates that landed on the data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to track the latest instant on the data timeline that has been indexed in the metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+
+      // get all instants since the plan completed (both from active timeline and archived timeline)
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      table.getMetaClient().reloadActiveTimeline();
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      List<HoodieInstant> instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(
+          new IndexingCheckTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient()));
+      try {
+        postRequestIndexingTaskFuture.get(config.getIndexingCheckTimeout(), TimeUnit.SECONDS);
+      } catch (TimeoutException | InterruptedException | ExecutionException e) {
+        postRequestIndexingTaskFuture.cancel(true);
+      } finally {
+        executorService.shutdownNow();
+      }
+      // save index commit metadata and return
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()

Review comment:
       We should be updating the completed list of partitions in tableConfig at the very end. That's what guards readers into using only the fully built-out partitions. In the current path we update it before doing the catch-up; let's move it to the end, once we are fully sure that a given partition is fully built and ready for readers to start consuming.
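       Roughly the ordering being suggested, as a sketch (the `txnManager` calls and the `markMetadataIndexesCompleted` helper are illustrative, not the actual API):

       ```java
       try {
         // 1. finish catching up the instants that landed while indexing ran
         postRequestIndexingTaskFuture.get(config.getIndexingCheckTimeout(), TimeUnit.SECONDS);
         // 2. only now flip the partitions to "completed", under a lock, so
         //    readers never pick up a partially built index partition
         txnManager.beginTransaction(Option.of(indexInstant), Option.empty());
         try {
           table.getMetaClient().getTableConfig()
               .markMetadataIndexesCompleted(indexPartitionInfos); // hypothetical helper
         } finally {
           txnManager.endTransaction();
         }
       } catch (TimeoutException | InterruptedException | ExecutionException e) {
         postRequestIndexingTaskFuture.cancel(true);
         throw new HoodieIndexException("Post-indexing catch-up failed", e);
       } finally {
         executorService.shutdownNow();
       }
       ```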




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r838170642



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ * 1. Fetch last completed instant on data timeline.
+ * 2. Write the index plan to the <instant>.index.requested.
+ * 3. Initialize file groups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!EnumSet.allOf(MetadataPartitionType.class).containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+    // make sure that it is idempotent, check with previously pending index operations.
+    Set<String> indexesInflightOrCompleted = Stream.of(table.getMetaClient().getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    indexesInflightOrCompleted.addAll(Stream.of(table.getMetaClient().getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    Set<String> requestedPartitions = partitionsToIndex.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.removeAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {

Review comment:
       Meta point was: why do we need to track the inflight partitions in the table metadata at all?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025540372


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ca12a7818b2a799fb57ee04376dfcb14d628cdb2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618) 
   * c5c563ffa6625d610c9c6bd252457129ce5ccddc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025426843


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 238b128260cab3ad11c8e00bd20871b45e112c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533) 
   * ca12a7818b2a799fb57ee04376dfcb14d628cdb2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025464066


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ca12a7818b2a799fb57ee04376dfcb14d628cdb2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1022325469


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 238b128260cab3ad11c8e00bd20871b45e112c83 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r838168027



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.common.util.StringUtils.isNullOrEmpty;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleandExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  public int start(int retry) {
+    // indexing should be done only if metadata is enabled
+    if (!props.getBoolean(HoodieMetadataConfig.ENABLE.key())) {
+      LOG.error(String.format("Metadata is not enabled. Please set %s to true.", HoodieMetadataConfig.ENABLE.key()));
+      return -1;
+    }
+
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        case DROP_INDEX: {
+          LOG.info("Running Mode: [" + DROP_INDEX + "];");
+          return dropIndex(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  @TestOnly
+  public Option<String> doSchedule() throws Exception {
+    return this.scheduleIndexing(jsc);
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      return doSchedule(client);
+    }
+  }
+
+  private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
+    List<MetadataPartitionType> partitionTypes = getRequestedPartitionTypes(cfg.indexTypes);
+    Option<String> indexingInstant = client.scheduleIndexing(partitionTypes);
+    if (!indexingInstant.isPresent()) {
+      LOG.error("Scheduling of index action did not return any instant.");
+    }
+    return indexingInstant;
+  }
+
+  private int runIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      if (isNullOrEmpty(cfg.indexInstantTime)) {
+        // Instant time is not specified
+        // Find the earliest scheduled indexing instant for execution
+        Option<HoodieInstant> earliestPendingIndexInstant = metaClient.getActiveTimeline()
+            .filterPendingIndexTimeline()
+            .filter(i -> !(i.isCompleted() || INFLIGHT.equals(i.getState())))
+            .firstInstant();
+        if (earliestPendingIndexInstant.isPresent()) {
+          cfg.indexInstantTime = earliestPendingIndexInstant.get().getTimestamp();
+          LOG.info("Found the earliest scheduled indexing instant which will be executed: "
+              + cfg.indexInstantTime);
+        } else {
+          throw new HoodieIndexException("There is no scheduled indexing in the table.");
+        }
+      }
+      return handleResponse(client.index(cfg.indexInstantTime)) ? 0 : 1;
+    }
+  }
+
+  private int scheduleAndRunIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      Option<String> indexingInstantTime = doSchedule(client);
+      if (indexingInstantTime.isPresent()) {
+        return handleResponse(client.index(indexingInstantTime.get())) ? 0 : 1;
+      } else {
+        return -1;
+      }
+    }
+  }
+
+  private int dropIndex(JavaSparkContext jsc) throws Exception {
+    List<MetadataPartitionType> partitionTypes = getRequestedPartitionTypes(cfg.indexTypes);

Review comment:
       I think it's better to make this more streamlined to begin with. Redoing/reworking it later adds a lot of overhead to understanding the side effects.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083354679


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "a3ee4cd75320e578235cea4490ed7470bb721ea5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 010de76ddd6c0201db746a13a5b04fc5e94125d4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520) 
   * a3ee4cd75320e578235cea4490ed7470bb721ea5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r839136167



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/ThreeToFourUpgradeHandler.java
##########
@@ -35,7 +40,12 @@
   @Override
   public Map<ConfigProperty, String> upgrade(HoodieWriteConfig config, HoodieEngineContext context, String instantTime, SupportsUpgradeDowngrade upgradeDowngradeHelper) {
     Map<ConfigProperty, String> tablePropsToAdd = new Hashtable<>();
-    tablePropsToAdd.put(HoodieTableConfig.TABLE_CHECKSUM, String.valueOf(HoodieTableConfig.generateChecksum(config.getProps())));
+    tablePropsToAdd.put(TABLE_CHECKSUM, String.valueOf(HoodieTableConfig.generateChecksum(config.getProps())));
+    // if metadata is enabled and the files partition exists, then update TABLE_METADATA_PARTITIONS
+    // schema for the files partition is the same between the two versions
+    if (config.isMetadataTableEnabled() && metadataPartitionExists(config.getBasePath(), context, MetadataPartitionType.FILES)) {
+      tablePropsToAdd.put(TABLE_METADATA_PARTITIONS, MetadataPartitionType.FILES.getPartitionPath());
+    }

Review comment:
       Hi @codope, just thinking: when users start out on the current table version 4, there is no upgrade/downgrade to run, so this handler is never invoked. How would we update the `TABLE_METADATA_PARTITIONS` property in that case?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1081069550


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 522a18caff448bcc9b127372d4526ee8f168f085 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452) 
   * ee361b1bf6b9b68e11f84f2af76625b847669ed2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r837790128



##########
File path: hudi-common/src/main/avro/HoodieIndexCommitMetadata.avsc
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+  "namespace": "org.apache.hudi.avro.model",
+  "type": "record",
+  "name": "HoodieIndexCommitMetadata",

Review comment:
       Better to call everything "Indexing" rather than "index".

##########
File path: hudi-common/src/main/avro/HoodieIndexCommitMetadata.avsc
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+  "namespace": "org.apache.hudi.avro.model",
+  "type": "record",
+  "name": "HoodieIndexCommitMetadata",
+  "fields": [
+    {
+      "name": "version",
+      "doc": "This field replaces the field filesToBeDeletedPerPartition",
+      "type": [
+        "int",
+        "null"
+      ],
+      "default": 1
+    },
+    {

Review comment:
       do we need this?

##########
File path: hudi-common/src/main/avro/HoodieIndexPartitionInfo.avsc
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+  "namespace": "org.apache.hudi.avro.model",
+  "type": "record",
+  "name": "HoodieIndexPartitionInfo",
+  "fields": [
+    {
+      "name": "version",
+      "type": [
+        "int",
+        "null"
+      ],
+      "default": 1
+    },
+    {
+      "name": "metadataPartitionPath",
+      "type": [
+        "null",
+        "string"
+      ],
+      "default": null
+    },
+    {
+      "name": "indexUptoInstant",
+      "type": [
+        "null",
+        "string"
+      ],
+      "default": null
+    }

Review comment:
       Should we also add a Map<String, String> to hold any index regeneration params?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
##########
@@ -925,6 +928,53 @@ public boolean scheduleCompactionAtInstant(String instantTime, Option<Map<String
     return scheduleTableService(instantTime, extraMetadata, TableServiceType.COMPACT).isPresent();
   }
 
+
+  /**
+   * Schedules INDEX action.
+   *
+   * @param partitionTypes - list of {@link MetadataPartitionType} which needs to be indexed
+   * @return instant time for the requested INDEX action
+   */
+  public Option<String> scheduleIndexing(List<MetadataPartitionType> partitionTypes) {

Review comment:
       Should this API also take additional args for what kind of indexes to build?
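
       For illustration only, such an extended API could look like the hypothetical signature below; the `indexBuildParams` map and the interface name are assumptions, not code from the PR:

       ```java
       import java.util.List;
       import java.util.Map;

       import org.apache.hudi.common.util.Option;
       import org.apache.hudi.metadata.MetadataPartitionType;

       // Hypothetical extension of the scheduling API: callers pass per-index
       // build options (e.g. the column list for COLUMN_STATS) alongside the
       // partition types to index.
       public interface IndexScheduling {
         Option<String> scheduleIndexing(List<MetadataPartitionType> partitionTypes,
                                         Map<String, String> indexBuildParams);
       }
       ```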

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -208,6 +208,18 @@
       .sinceVersion("0.11.0")
       .withDocumentation("Table checksum is used to guard against partial writes in HDFS. It is added as the last entry in hoodie.properties and then used to validate while reading table config.");
 
+  public static final ConfigProperty<String> TABLE_METADATA_INDEX_INFLIGHT = ConfigProperty
+      .key("hoodie.table.metadata.indexes.inflight")
+      .noDefaultValue()
+      .sinceVersion("0.11.0")
+      .withDocumentation("Comma-separated list of metadata partitions whose indexing is in progress.");
+
+  public static final ConfigProperty<String> TABLE_METADATA_INDEX_COMPLETED = ConfigProperty
+      .key("hoodie.table.metadata.indexes.completed")
+      .noDefaultValue()
+      .sinceVersion("0.11.0")
+      .withDocumentation("Comma-separated list of metadata partitions whose indexing is complete.");

Review comment:
       here, we can just stick to MT terminology, without using "indexing"?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -208,6 +208,18 @@
       .sinceVersion("0.11.0")
       .withDocumentation("Table checksum is used to guard against partial writes in HDFS. It is added as the last entry in hoodie.properties and then used to validate while reading table config.");
 
+  public static final ConfigProperty<String> TABLE_METADATA_INDEX_INFLIGHT = ConfigProperty
+      .key("hoodie.table.metadata.indexes.inflight")
+      .noDefaultValue()
+      .sinceVersion("0.11.0")
+      .withDocumentation("Comma-separated list of metadata partitions whose indexing is in progress.");
+
+  public static final ConfigProperty<String> TABLE_METADATA_INDEX_COMPLETED = ConfigProperty

Review comment:
       rename: hoodie.table.metadata.partitions

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -585,6 +597,16 @@ private Long getTableChecksum() {
     return getLong(TABLE_CHECKSUM);
   }
 
+  public String getInflightMetadataIndexes() {
+    return getStringOrDefault(TABLE_METADATA_INDEX_INFLIGHT, "");
+  }
+
+  // TODO getInflightAndCompletedMetadataIndexes
+
+  public String getCompletedMetadataIndexes() {

Review comment:
       same here

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##########
@@ -961,6 +981,53 @@ public void testCleanerDeleteReplacedDataWithArchive(Boolean asyncClean) throws
     return config;
   }
 
+  private HoodieIndexer.Config buildIndexerConfig(String basePath,
+                                                  String tableName,
+                                                  String indexInstantTime,
+                                                  String runningMode,
+                                                  String indexTypes) {
+    HoodieIndexer.Config config = new HoodieIndexer.Config();
+    config.basePath = basePath;
+    config.tableName = tableName;
+    config.indexInstantTime = indexInstantTime;
+    config.propsFilePath = dfsBasePath + "/indexer.properties";
+    config.runningMode = runningMode;
+    config.indexTypes = indexTypes;
+    return config;
+  }
+
+  @Test
+  public void testHoodieIndexer() throws Exception {
+    String tableBasePath = dfsBasePath + "/asyncindexer";
+    HoodieDeltaStreamer ds = initialHoodieDeltaStreamer(tableBasePath, 1000, "false");
+
+    deltaStreamerTestRunner(ds, (r) -> {
+      TestHelpers.assertAtLeastNCommits(2, tableBasePath, dfs);
+
+      Option<String> scheduleIndexInstantTime = Option.empty();
+      try {
+        HoodieIndexer scheduleIndexingJob = new HoodieIndexer(jsc,
+            buildIndexerConfig(tableBasePath, ds.getConfig().targetTableName, null, SCHEDULE, "COLUMN_STATS"));
+        scheduleIndexInstantTime = scheduleIndexingJob.doSchedule();
+      } catch (Exception e) {
+        LOG.info("Schedule indexing failed", e);
+        return false;
+      }
+      if (scheduleIndexInstantTime.isPresent()) {
+        TestHelpers.assertPendingIndexCommit(tableBasePath, dfs);
+        LOG.info("Schedule indexing success, now build index with instant time " + scheduleIndexInstantTime.get());
+        HoodieIndexer runIndexingJob = new HoodieIndexer(jsc,

Review comment:
       Should we pull this into its own test class `TestHoodieIndexer`?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -343,16 +347,27 @@ private void processAppendResult(AppendResult result, List<IndexedRecord> record
       updateWriteStatus(stat, result);
     }
 
-    if (config.isMetadataIndexColumnStatsForAllColumnsEnabled()) {
+    if (config.isMetadataColumnStatsIndexEnabled()) {

Review comment:
       note to self: follow up on all this code; it needs to be more modular.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/WriteOperationType.java
##########
@@ -48,6 +48,8 @@
   INSERT_OVERWRITE_TABLE("insert_overwrite_table"),
   // compact
   COMPACT("compact"),
+
+  INDEX("index"),

Review comment:
       INDEXING?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -645,12 +680,38 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropIndex(List<MetadataPartitionType> indexesToDrop) throws IOException {
+    Set<String> completedIndexes = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))

Review comment:
       we can avoid the code duplication here by pulling the stream transformation into a shared lambda or helper.
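
       For reference, a minimal sketch of what such a shared helper could look like (the class and method names here are hypothetical, not part of the PR):

       ```java
       import java.util.Arrays;
       import java.util.Set;
       import java.util.stream.Collectors;

       public final class ConfigListUtils {
         private ConfigListUtils() {
         }

         // Hypothetical helper: splits a comma-separated config value into a
         // trimmed set of non-empty entries, replacing the duplicated
         // Stream.of(...).map(String::trim).filter(...) chains in the diff.
         public static Set<String> toTrimmedSet(String commaSeparated) {
           return Arrays.stream(commaSeparated.split(","))
               .map(String::trim)
               .filter(s -> !s.isEmpty())
               .collect(Collectors.toSet());
         }
       }
       ```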

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -595,13 +625,19 @@ private HoodieTableMetaClient initializeMetaClient(boolean populatMetaFields) th
    * @param createInstantTime - Metadata table create instant time
    * @throws IOException
    */
-  private void initializeEnabledFileGroups(HoodieTableMetaClient dataMetaClient, String createInstantTime) throws IOException {
-    for (MetadataPartitionType enabledPartitionType : this.enabledPartitionTypes) {
+  private void initializeEnabledFileGroups(HoodieTableMetaClient dataMetaClient, String createInstantTime, List<MetadataPartitionType> partitionTypes) throws IOException {
+    for (MetadataPartitionType enabledPartitionType : partitionTypes) {
       initializeFileGroups(dataMetaClient, enabledPartitionType, createInstantTime,
           enabledPartitionType.getFileGroupCount());
     }
   }
 
+  public void scheduleIndex(HoodieTableMetaClient dataMetaClient, List<MetadataPartitionType> metadataPartitions, String instantTime) throws IOException {

Review comment:
       this does not really do any scheduling of actions. rename: `initializeMetadataPartitions`

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -645,12 +680,38 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropIndex(List<MetadataPartitionType> indexesToDrop) throws IOException {
+    Set<String> completedIndexes = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    Set<String> inflightIndexes = Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+
+    for (MetadataPartitionType partitionType : indexesToDrop) {
+      String partitionPath = partitionType.getPartitionPath();
+      // first update table config
+      if (inflightIndexes.contains(partitionPath)) {
+        inflightIndexes.remove(partitionPath);
+        dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), String.join(",", inflightIndexes));
+      } else if (completedIndexes.contains(partitionPath)) {
+        completedIndexes.remove(partitionPath);
+        dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), String.join(",", completedIndexes));
+      }
+      HoodieTableConfig.update(dataMetaClient.getFs(), new Path(dataMetaClient.getMetaPath()), dataMetaClient.getTableConfig().getProps());
+      LOG.warn("Deleting Metadata Table partitions: " + partitionPath);
+      dataMetaClient.getFs().delete(new Path(metadataWriteConfig.getBasePath(), partitionPath), true);
+    }
+  }
+
   private MetadataRecordsGenerationParams getRecordsGenerationParams() {
     return new MetadataRecordsGenerationParams(
         dataMetaClient, enabledPartitionTypes, dataWriteConfig.getBloomFilterType(),
         dataWriteConfig.getBloomIndexParallelism(),
-        dataWriteConfig.isMetadataIndexColumnStatsForAllColumnsEnabled(),
-        dataWriteConfig.getColumnStatsIndexParallelism());
+        dataWriteConfig.isMetadataColumnStatsIndexEnabled(),
+        dataWriteConfig.getColumnStatsIndexParallelism(),
+        Stream.of(dataWriteConfig.getColumnsEnabledForColumnStatsIndex().split(","))

Review comment:
       all this split code can then be shared in one place

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.common.util.StringUtils.isNullOrEmpty;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleandExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);

Review comment:
       for any new tool, it's good to have it work off `HoodieEngineContext` rather than the SparkContext directly
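
       As a rough illustration of that suggestion (a sketch assuming the existing `HoodieSparkEngineContext` wrapper; the class and method below are not code from the PR):

       ```java
       import org.apache.hudi.client.common.HoodieSparkEngineContext;
       import org.apache.hudi.common.engine.HoodieEngineContext;
       import org.apache.spark.api.java.JavaSparkContext;

       public class EngineContextSketch {
         // Sketch: wrap the JavaSparkContext in an engine context once, so the
         // rest of the tool is written against the engine-agnostic abstraction.
         static HoodieEngineContext toEngineContext(JavaSparkContext jsc) {
           return new HoodieSparkEngineContext(jsc);
         }
       }
       ```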

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -175,6 +182,25 @@
           .sinceVersion("0.11.0")
           .withDocumentation("Parallelism to use, when generating column stats index.");
 
+  public static final ConfigProperty<String> COLUMN_STATS_INDEX_FOR_COLUMNS = ConfigProperty
+      .key(METADATA_PREFIX + ".index.column.stats.for.columns")
+      .defaultValue("")
+      .sinceVersion("0.11.0")
+      .withDocumentation("Comma-separated list of columns for which column stats index will be built. If not set, all columns will be indexed");
+
+  public static final ConfigProperty<String> BLOOM_FILTER_INDEX_FOR_COLUMNS = ConfigProperty
+      .key(METADATA_PREFIX + ".index.bloom.filter.for.columns")
+      .defaultValue("")
+      .sinceVersion("0.11.0")
+      .withDocumentation("Comma-separated list of columns for which bloom filter index will be built. If not set, only record key will be indexed.");
+
+  public static final ConfigProperty<Integer> METADATA_INDEX_CHECK_TIMEOUT_SECONDS = ConfigProperty
+      .key(METADATA_PREFIX + ".index.check.timeout.seconds")
+      .defaultValue(300)

Review comment:
       maybe a higher value like `900`?

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.common.util.StringUtils.isNullOrEmpty;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleandExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  public int start(int retry) {
+    // indexing should be done only if metadata is enabled
+    if (!props.getBoolean(HoodieMetadataConfig.ENABLE.key())) {
+      LOG.error(String.format("Metadata is not enabled. Please set %s to true.", HoodieMetadataConfig.ENABLE.key()));
+      return -1;
+    }
+
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        case DROP_INDEX: {
+          LOG.info("Running Mode: [" + DROP_INDEX + "];");
+          return dropIndex(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  @TestOnly
+  public Option<String> doSchedule() throws Exception {
+    return this.scheduleIndexing(jsc);
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      return doSchedule(client);
+    }
+  }
+
+  private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
+    List<MetadataPartitionType> partitionTypes = getRequestedPartitionTypes(cfg.indexTypes);
+    Option<String> indexingInstant = client.scheduleIndexing(partitionTypes);
+    if (!indexingInstant.isPresent()) {
+      LOG.error("Scheduling of index action did not return any instant.");
+    }
+    return indexingInstant;
+  }
+
+  private int runIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      if (isNullOrEmpty(cfg.indexInstantTime)) {
+        // Instant time is not specified
+        // Find the earliest scheduled indexing instant for execution
+        Option<HoodieInstant> earliestPendingIndexInstant = metaClient.getActiveTimeline()
+            .filterPendingIndexTimeline()
+            .filter(i -> !(i.isCompleted() || INFLIGHT.equals(i.getState())))
+            .firstInstant();
+        if (earliestPendingIndexInstant.isPresent()) {
+          cfg.indexInstantTime = earliestPendingIndexInstant.get().getTimestamp();
+          LOG.info("Found the earliest scheduled indexing instant which will be executed: "
+              + cfg.indexInstantTime);
+        } else {
+          throw new HoodieIndexException("There is no scheduled indexing in the table.");
+        }
+      }
+      return handleResponse(client.index(cfg.indexInstantTime)) ? 0 : 1;
+    }
+  }
+
+  private int scheduleAndRunIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      Option<String> indexingInstantTime = doSchedule(client);
+      if (indexingInstantTime.isPresent()) {
+        return handleResponse(client.index(indexingInstantTime.get())) ? 0 : 1;
+      } else {
+        return -1;
+      }
+    }
+  }
+
+  private int dropIndex(JavaSparkContext jsc) throws Exception {
+    List<MetadataPartitionType> partitionTypes = getRequestedPartitionTypes(cfg.indexTypes);
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      client.dropIndex(partitionTypes);
+      return 0;
+    } catch (Exception e) {
+      LOG.error("Failed to drop index. ", e);
+      return -1;
+    }
+  }
+
+  private boolean handleResponse(Option<HoodieIndexCommitMetadata> commitMetadata) {
+    if (!commitMetadata.isPresent()) {
+      LOG.error("Indexing failed as no commit metadata present.");
+      return false;
+    }
+    List<HoodieIndexPartitionInfo> indexPartitionInfos = commitMetadata.get().getIndexPartitionInfos();
+    LOG.info(String.format("Indexing complete for partitions: %s",
+        indexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toList())));
+    return isIndexBuiltForAllRequestedTypes(indexPartitionInfos);
+  }
+
+  boolean isIndexBuiltForAllRequestedTypes(List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    Set<String> indexedPartitions = indexPartitionInfos.stream()
+        .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet());
+    Set<String> requestedPartitions = getRequestedPartitionTypes(cfg.indexTypes).stream()
+        .map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.removeAll(indexedPartitions);
+    return requestedPartitions.isEmpty();
+  }
+
+  List<MetadataPartitionType> getRequestedPartitionTypes(String indexTypes) {
+    List<String> requestedIndexTypes = Arrays.asList(indexTypes.split(","));
+    return requestedIndexTypes.stream()
+        .map(p -> MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)))
+        // FILES partition is initialized synchronously while getting metadata writer

Review comment:
       but what if we need to rebuild `FILES` due to errors?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ * 1. Fetch last completed instant on data timeline.
+ * 2. Write the index plan to the <instant>.index.requested.
+ * 3. Initialize file groups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());

Review comment:
       does the normal writer go through this path as well? Wondering if we can do this inside the table service scheduling lock that already exists.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -175,6 +182,25 @@
           .sinceVersion("0.11.0")
           .withDocumentation("Parallelism to use, when generating column stats index.");
 
+  public static final ConfigProperty<String> COLUMN_STATS_INDEX_FOR_COLUMNS = ConfigProperty
+      .key(METADATA_PREFIX + ".index.column.stats.for.columns")
+      .defaultValue("")
+      .sinceVersion("0.11.0")
+      .withDocumentation("Comma-separated list of columns for which column stats index will be built. If not set, all columns will be indexed");
+
+  public static final ConfigProperty<String> BLOOM_FILTER_INDEX_FOR_COLUMNS = ConfigProperty
+      .key(METADATA_PREFIX + ".index.bloom.filter.for.columns")
+      .defaultValue("")
+      .sinceVersion("0.11.0")
+      .withDocumentation("Comma-separated list of columns for which bloom filter index will be built. If not set, only record key will be indexed.");
+
+  public static final ConfigProperty<Integer> METADATA_INDEX_CHECK_TIMEOUT_SECONDS = ConfigProperty
+      .key(METADATA_PREFIX + ".index.check.timeout.seconds")
+      .defaultValue(300)
+      .sinceVersion("0.11.0")
+      .withDocumentation("After the async indexer has finished indexing upto the base instant, it will reconcile with commits that happened after the base instant. "
+          + "This check could take finite amount of time depending on number of commits, so it needs to be bounded by a timeout which can configured with this key.");

Review comment:
       we need a better explanation for the user: "... base instant, to ensure that all inflight writers reliably write index updates as well. If this timeout expires, then the indexer will abort itself safely."

##########
File path: hudi-common/src/main/avro/HoodieIndexCommitMetadata.avsc
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+  "namespace": "org.apache.hudi.avro.model",
+  "type": "record",
+  "name": "HoodieIndexCommitMetadata",
+  "fields": [
+    {
+      "name": "version",
+      "doc": "This field replaces the field filesToBeDeletedPerPartition",

Review comment:
       fix docs

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/HoodieTableFileSystemView.java
##########
@@ -358,6 +360,19 @@ protected void removeReplacedFileIdsAtInstants(Set<String> instants) {
     return Option.ofNullable(fgIdToReplaceInstants.get(fileGroupId));
   }
 
+  /**
+   * Get the latest file slices for a given partition including the inflight ones.
+   *
+   * @param partitionPath
+   * @return Stream of latest {@link FileSlice} in the partition path.
+   */
+  public Stream<FileSlice> fetchLatestFileSlicesIncludingInflight(String partitionPath) {

Review comment:
       test for this?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -175,6 +182,25 @@
           .sinceVersion("0.11.0")
           .withDocumentation("Parallelism to use, when generating column stats index.");
 
+  public static final ConfigProperty<String> COLUMN_STATS_INDEX_FOR_COLUMNS = ConfigProperty
+      .key(METADATA_PREFIX + ".index.column.stats.for.columns")
+      .defaultValue("")

Review comment:
       noDefault?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -208,6 +208,18 @@
       .sinceVersion("0.11.0")
       .withDocumentation("Table checksum is used to guard against partial writes in HDFS. It is added as the last entry in hoodie.properties and then used to validate while reading table config.");
 
+  public static final ConfigProperty<String> TABLE_METADATA_INDEX_INFLIGHT = ConfigProperty
+      .key("hoodie.table.metadata.indexes.inflight")

Review comment:
       rename: hoodie.table.metadata.partitions.inflight

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -175,6 +182,25 @@
           .sinceVersion("0.11.0")
           .withDocumentation("Parallelism to use, when generating column stats index.");
 
+  public static final ConfigProperty<String> COLUMN_STATS_INDEX_FOR_COLUMNS = ConfigProperty
+      .key(METADATA_PREFIX + ".index.column.stats.for.columns")
+      .defaultValue("")
+      .sinceVersion("0.11.0")
+      .withDocumentation("Comma-separated list of columns for which column stats index will be built. If not set, all columns will be indexed");
+
+  public static final ConfigProperty<String> BLOOM_FILTER_INDEX_FOR_COLUMNS = ConfigProperty
+      .key(METADATA_PREFIX + ".index.bloom.filter.for.columns")

Review comment:
       rename: `.index.bloom.filter.column.list` 

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java
##########
@@ -647,6 +651,65 @@ public void saveToRestoreRequested(HoodieInstant instant, Option<byte[]> content
     createFileInMetaPath(instant.getFileName(), content, false);
   }
 
+  /**
+   * Transition index instant state from requested to inflight.
+   *
+   * @param requestedInstant Inflight Instant
+   * @return inflight instant
+   */
+  public HoodieInstant transitionIndexRequestedToInflight(HoodieInstant requestedInstant, Option<byte[]> data) {
+    ValidationUtils.checkArgument(requestedInstant.getAction().equals(HoodieTimeline.INDEX_ACTION),
+        String.format("%s is not equal to %s action", requestedInstant.getAction(), INDEX_ACTION));
+    ValidationUtils.checkArgument(requestedInstant.isRequested(),
+        String.format("Instant %s not in requested state", requestedInstant.getTimestamp()));
+    HoodieInstant inflightInstant = new HoodieInstant(State.INFLIGHT, INDEX_ACTION, requestedInstant.getTimestamp());
+    transitionState(requestedInstant, inflightInstant, data);
+    return inflightInstant;
+  }
+
+  /**
+   * Transition index instant state from inflight to completed.
+   * @param inflightInstant Inflight Instant
+   * @return completed instant
+   */
+  public HoodieInstant transitionIndexInflightToComplete(HoodieInstant inflightInstant, Option<byte[]> data) {
+    ValidationUtils.checkArgument(inflightInstant.getAction().equals(HoodieTimeline.INDEX_ACTION),
+        String.format("%s is not equal to %s action", inflightInstant.getAction(), INDEX_ACTION));
+    ValidationUtils.checkArgument(inflightInstant.isInflight(),
+        String.format("Instant %s not inflight", inflightInstant.getTimestamp()));
+    HoodieInstant commitInstant = new HoodieInstant(State.COMPLETED, INDEX_ACTION, inflightInstant.getTimestamp());
+    transitionState(inflightInstant, commitInstant, data);
+    return commitInstant;
+  }
+
+  /**
+   * Revert index instant state from inflight to requested.
+   * @param inflightInstant Inflight Instant
+   * @return requested instant
+   */
+  public HoodieInstant revertIndexInflightToRequested(HoodieInstant inflightInstant) {
+    ValidationUtils.checkArgument(inflightInstant.getAction().equals(HoodieTimeline.INDEX_ACTION),
+        String.format("%s is not equal to %s action", inflightInstant.getAction(), INDEX_ACTION));
+    ValidationUtils.checkArgument(inflightInstant.isInflight(),
+        String.format("Instant %s not inflight", inflightInstant.getTimestamp()));
+    HoodieInstant requestedInstant = new HoodieInstant(State.REQUESTED, INDEX_ACTION, inflightInstant.getTimestamp());
+    if (metaClient.getTimelineLayoutVersion().isNullVersion()) {
+      transitionState(inflightInstant, requestedInstant, Option.empty());
+    } else {
+      deleteInflight(inflightInstant);
+    }
+    return requestedInstant;
+  }
+
+  /**
+   * Save content for inflight/requested index instant.
+   */
+  public void saveToPendingIndexCommit(HoodieInstant instant, Option<byte[]> content) {

Review comment:
       let's not overload "Commit". Rename: `saveToPendingIndexInstant`?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -175,6 +182,25 @@
           .sinceVersion("0.11.0")
           .withDocumentation("Parallelism to use, when generating column stats index.");
 
+  public static final ConfigProperty<String> COLUMN_STATS_INDEX_FOR_COLUMNS = ConfigProperty
+      .key(METADATA_PREFIX + ".index.column.stats.for.columns")

Review comment:
       rename: `.stats.column.list`?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
##########
@@ -112,6 +113,16 @@ public HoodieDefaultTimeline getWriteTimeline() {
     return new HoodieDefaultTimeline(instants.stream().filter(s -> validActions.contains(s.getAction())), details);
   }
 
+  @Override
+  public HoodieTimeline getContiguousCompletedWriteTimeline() {
+    Option<HoodieInstant> earliestPending = getWriteTimeline().filterInflightsAndRequested().firstInstant();

Review comment:
       do we special-case only `getWriteTimeline()`, or include all actions?

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -123,6 +123,22 @@ public static void deleteMetadataTable(String basePath, HoodieEngineContext cont
     }
   }
 
+  /**
+   * Check if the given metadata partition exists.
+   *
+   * @param basePath base path of the dataset
+   * @param context  instance of {@link HoodieEngineContext}.
+   */
+  public static boolean metadataPartitionExists(String basePath, HoodieEngineContext context, MetadataPartitionType partitionType) {

Review comment:
       how often is this called? We need to ensure this is not called from any executors, which could send a lot of RPCs to storage.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -585,6 +597,16 @@ private Long getTableChecksum() {
     return getLong(TABLE_CHECKSUM);
   }
 
+  public String getInflightMetadataIndexes() {
+    return getStringOrDefault(TABLE_METADATA_INDEX_INFLIGHT, "");

Review comment:
       should there be a special value that denotes nothing inflight? If so, add a constant, e.g. `StringUtils.EMPTY_STRING`.
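
       For illustration, the getter from the diff above could then read as follows (a sketch, assuming Hudi's `StringUtils.EMPTY_STRING` constant):

       ```java
       import org.apache.hudi.common.util.StringUtils;

       public String getInflightMetadataIndexes() {
         // EMPTY_STRING is the sentinel meaning "no metadata index build inflight"
         return getStringOrDefault(TABLE_METADATA_INDEX_INFLIGHT, StringUtils.EMPTY_STRING);
       }
       ```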

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.common.util.StringUtils.isNullOrEmpty;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleandExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  public int start(int retry) {
+    // indexing should be done only if metadata is enabled
+    if (!props.getBoolean(HoodieMetadataConfig.ENABLE.key())) {
+      LOG.error(String.format("Metadata is not enabled. Please set %s to true.", HoodieMetadataConfig.ENABLE.key()));
+      return -1;
+    }
+
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        case DROP_INDEX: {
+          LOG.info("Running Mode: [" + DROP_INDEX + "];");
+          return dropIndex(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  @TestOnly
+  public Option<String> doSchedule() throws Exception {
+    return this.scheduleIndexing(jsc);
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      return doSchedule(client);
+    }
+  }
+
+  private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
+    List<MetadataPartitionType> partitionTypes = getRequestedPartitionTypes(cfg.indexTypes);
+    Option<String> indexingInstant = client.scheduleIndexing(partitionTypes);
+    if (!indexingInstant.isPresent()) {
+      LOG.error("Scheduling of index action did not return any instant.");
+    }
+    return indexingInstant;
+  }
+
+  private int runIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      if (isNullOrEmpty(cfg.indexInstantTime)) {
+        // Instant time is not specified
+        // Find the earliest scheduled indexing instant for execution
+        Option<HoodieInstant> earliestPendingIndexInstant = metaClient.getActiveTimeline()
+            .filterPendingIndexTimeline()
+            .filter(i -> !(i.isCompleted() || INFLIGHT.equals(i.getState())))
+            .firstInstant();
+        if (earliestPendingIndexInstant.isPresent()) {
+          cfg.indexInstantTime = earliestPendingIndexInstant.get().getTimestamp();
+          LOG.info("Found the earliest scheduled indexing instant which will be executed: "
+              + cfg.indexInstantTime);
+        } else {
+          throw new HoodieIndexException("There is no scheduled indexing in the table.");
+        }
+      }
+      return handleResponse(client.index(cfg.indexInstantTime)) ? 0 : 1;
+    }
+  }
+
+  private int scheduleAndRunIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      Option<String> indexingInstantTime = doSchedule(client);
+      if (indexingInstantTime.isPresent()) {
+        return handleResponse(client.index(indexingInstantTime.get())) ? 0 : 1;
+      } else {
+        return -1;
+      }
+    }
+  }
+
+  private int dropIndex(JavaSparkContext jsc) throws Exception {
+    List<MetadataPartitionType> partitionTypes = getRequestedPartitionTypes(cfg.indexTypes);

Review comment:
       shouldn't we be dropping by the actual partition path and not the type?
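
   For illustration, a rough sketch of resolving types to paths first (a path-based drop API on the client is an assumption here, not the current signature):

```java
import org.apache.hudi.metadata.MetadataPartitionType;

import java.util.List;
import java.util.stream.Collectors;

// Sketch: resolve the requested index types to their metadata partition paths
// up front, so the drop operates on concrete paths. getPartitionPath() exists
// on MetadataPartitionType in this PR; a dropIndexByPath(List<String>) client
// API is hypothetical, named only for illustration.
class DropIndexByPathSketch {
  static List<String> partitionPathsToDrop(List<MetadataPartitionType> requestedTypes) {
    return requestedTypes.stream()
        .map(MetadataPartitionType::getPartitionPath)
        .collect(Collectors.toList());
  }
}
```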

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
##########
@@ -466,8 +471,7 @@ public static SchemaProviderWithPostProcessor wrapSchemaProviderWithPostProcesso
         Option.ofNullable(createSchemaPostProcessor(schemaPostProcessorClass, cfg, jssc)));
   }
 
-  public static SchemaProvider createRowBasedSchemaProvider(StructType structType,
-                                                            TypedProperties cfg, JavaSparkContext jssc) {
+  public static SchemaProvider createRowBasedSchemaProvider(StructType structType, TypedProperties cfg, JavaSparkContext jssc) {

Review comment:
       avoid these whitespace changes?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -511,24 +523,42 @@ private boolean initializeFromFilesystem(HoodieTableMetaClient dataMetaClient,
 
     initializeMetaClient(dataWriteConfig.getMetadataConfig().populateMetaFields());
     initTableMetadata();
-    initializeEnabledFileGroups(dataMetaClient, createInstantTime);
+    // if async metadata indexing is enabled,
+    // then only initialize files partition as other partitions will be built using HoodieIndexer
+    List<MetadataPartitionType> enabledPartitionTypes =  new ArrayList<>();
+    if (dataWriteConfig.isMetadataAsyncIndex()) {
+      enabledPartitionTypes.add(MetadataPartitionType.FILES);
+    } else {
+      // all enabled ones should be initialized
+      enabledPartitionTypes = this.enabledPartitionTypes;
+    }
+    initializeEnabledFileGroups(dataMetaClient, createInstantTime, enabledPartitionTypes);
 
     // During cold startup, the list of files to be committed can be huge. So creating a HoodieCommitMetadata out
     // of these large number of files and calling the existing update(HoodieCommitMetadata) function does not scale
     // well. Hence, we have a special commit just for the initialization scenario.
-    initialCommit(createInstantTime);
+    initialCommit(createInstantTime, enabledPartitionTypes);
+    updateCompletedIndexesInTableConfig(enabledPartitionTypes);

Review comment:
       rename: `updatedInitializedIndexesInTableConfig` ?

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/MetadataRecordsGenerationParams.java
##########
@@ -35,15 +35,19 @@
   private final int bloomIndexParallelism;
   private final boolean isAllColumnStatsIndexEnabled;
   private final int columnStatsIndexParallelism;
+  private final List<String> columnsToIndex;

Review comment:
       rename: statsIndexColList

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
##########
@@ -925,6 +928,53 @@ public boolean scheduleCompactionAtInstant(String instantTime, Option<Map<String
     return scheduleTableService(instantTime, extraMetadata, TableServiceType.COMPACT).isPresent();
   }
 
+
+  /**
+   * Schedules INDEX action.
+   *
+   * @param partitionTypes - list of {@link MetadataPartitionType} which needs to be indexed
+   * @return instant time for the requested INDEX action
+   */
+  public Option<String> scheduleIndexing(List<MetadataPartitionType> partitionTypes) {
+    String instantTime = HoodieActiveTimeline.createNewInstantTime();
+    Option<HoodieIndexPlan> indexPlan = createTable(config, hadoopConf, config.isMetadataTableEnabled())
+        .scheduleIndex(context, instantTime, partitionTypes);
+    return indexPlan.isPresent() ? Option.of(instantTime) : Option.empty();
+  }
+
+  /**
+   * Runs INDEX action to build out the metadata partitions as planned for the given instant time.
+   *
+   * @param indexInstantTime - instant time for the requested INDEX action
+   * @return {@link Option<HoodieIndexCommitMetadata>} after successful indexing.
+   */
+  public Option<HoodieIndexCommitMetadata> index(String indexInstantTime) {
+    return createTable(config, hadoopConf, config.isMetadataTableEnabled()).index(context, indexInstantTime);
+  }
+
+  /**
+   * Drops the index and removes the metadata partitions.
+   *
+   * @param partitionTypes - list of {@link MetadataPartitionType} which need to be dropped
+   */
+  public void dropIndex(List<MetadataPartitionType> partitionTypes) {

Review comment:
       are there tests for these APIs?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -585,6 +597,16 @@ private Long getTableChecksum() {
     return getLong(TABLE_CHECKSUM);
   }
 
+  public String getInflightMetadataIndexes() {
+    return getStringOrDefault(TABLE_METADATA_INDEX_INFLIGHT, "");
+  }
+
+  // TODO getInflightAndCompletedMetadataIndexes

Review comment:
       remove

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -511,24 +523,42 @@ private boolean initializeFromFilesystem(HoodieTableMetaClient dataMetaClient,
 
     initializeMetaClient(dataWriteConfig.getMetadataConfig().populateMetaFields());
     initTableMetadata();
-    initializeEnabledFileGroups(dataMetaClient, createInstantTime);
+    // if async metadata indexing is enabled,
+    // then only initialize files partition as other partitions will be built using HoodieIndexer
+    List<MetadataPartitionType> enabledPartitionTypes =  new ArrayList<>();
+    if (dataWriteConfig.isMetadataAsyncIndex()) {

Review comment:
       I am not sure if this is the direction we want to go. Whether the index is built async or inline should be immaterial, right? We should be able to schedule and run indexing from either the indexer or the actual writer, with all the necessary locking for safety.
   
   

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##########
@@ -961,6 +981,53 @@ public void testCleanerDeleteReplacedDataWithArchive(Boolean asyncClean) throws
     return config;
   }
 
+  private HoodieIndexer.Config buildIndexerConfig(String basePath,
+                                                  String tableName,
+                                                  String indexInstantTime,
+                                                  String runningMode,
+                                                  String indexTypes) {
+    HoodieIndexer.Config config = new HoodieIndexer.Config();
+    config.basePath = basePath;
+    config.tableName = tableName;
+    config.indexInstantTime = indexInstantTime;
+    config.propsFilePath = dfsBasePath + "/indexer.properties";
+    config.runningMode = runningMode;
+    config.indexTypes = indexTypes;
+    return config;
+  }
+
+  @Test
+  public void testHoodieIndexer() throws Exception {
+    String tableBasePath = dfsBasePath + "/asyncindexer";
+    HoodieDeltaStreamer ds = initialHoodieDeltaStreamer(tableBasePath, 1000, "false");

Review comment:
       rename: `initializeDeltaStreamer`?

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/MetadataRecordsGenerationParams.java
##########
@@ -35,15 +35,19 @@
   private final int bloomIndexParallelism;
   private final boolean isAllColumnStatsIndexEnabled;
   private final int columnStatsIndexParallelism;
+  private final List<String> columnsToIndex;
+  private final List<String> bloomSecondaryKeys;

Review comment:
       rename: bloomIndexColList

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieIndexer.java
##########
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.HoodieReadClient;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.client.common.HoodieSparkEngineContext;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.testutils.HoodieCommonTestHarness;
+import org.apache.hudi.common.testutils.HoodieTestUtils;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.testutils.providers.SparkProvider;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.SparkSession;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.List;
+
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+public class TestHoodieIndexer extends HoodieCommonTestHarness implements SparkProvider {

Review comment:
       there are a lot more failure scenarios to test?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -511,24 +523,42 @@ private boolean initializeFromFilesystem(HoodieTableMetaClient dataMetaClient,
 
     initializeMetaClient(dataWriteConfig.getMetadataConfig().populateMetaFields());
     initTableMetadata();
-    initializeEnabledFileGroups(dataMetaClient, createInstantTime);
+    // if async metadata indexing is enabled,
+    // then only initialize files partition as other partitions will be built using HoodieIndexer
+    List<MetadataPartitionType> enabledPartitionTypes =  new ArrayList<>();
+    if (dataWriteConfig.isMetadataAsyncIndex()) {
+      enabledPartitionTypes.add(MetadataPartitionType.FILES);
+    } else {
+      // all enabled ones should be initialized
+      enabledPartitionTypes = this.enabledPartitionTypes;
+    }
+    initializeEnabledFileGroups(dataMetaClient, createInstantTime, enabledPartitionTypes);
 
     // During cold startup, the list of files to be committed can be huge. So creating a HoodieCommitMetadata out
     // of these large number of files and calling the existing update(HoodieCommitMetadata) function does not scale
     // well. Hence, we have a special commit just for the initialization scenario.
-    initialCommit(createInstantTime);
+    initialCommit(createInstantTime, enabledPartitionTypes);
+    updateCompletedIndexesInTableConfig(enabledPartitionTypes);
     return true;
   }
 
-  private HoodieTableMetaClient initializeMetaClient(boolean populatMetaFields) throws IOException {
+  private void updateCompletedIndexesInTableConfig(List<MetadataPartitionType> partitionTypes) {
+    Set<String> completedIndexes = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))

Review comment:
       can we push this parsing into `getCompletedMetadataIndexes()` to return a List/Set?
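
   For example, a minimal sketch of centralizing the parsing (the raw-string lookup is stubbed here; in HoodieTableConfig it would wrap getStringOrDefault on the corresponding key):

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: one shared parser so getCompletedMetadataIndexes() and
// getInflightMetadataIndexes() can both return a Set<String> directly,
// instead of every call site splitting the comma-separated value itself.
class MetadataIndexesParsingSketch {
  static Set<String> parseMetadataIndexes(String rawCsv) {
    return Arrays.stream(rawCsv.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toSet());
  }
}
```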

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -663,20 +724,87 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
+    if (!dataWriteConfig.isMetadataTableEnabled()) {
+      return;
+    }
+    Set<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    Set<String> inflightIndexes = Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    // if indexing is inflight then do not trigger table service
+    boolean doNotTriggerTableService = partitionsToUpdate.stream().anyMatch(inflightIndexes::contains);
+
     if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      // convert metadata and filter only the entries whose partition path are in partitionsToUpdate
+      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata().entrySet().stream()
+          .filter(entry -> partitionsToUpdate.contains(entry.getKey().getPartitionPath())).collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+      commit(instantTime, partitionRecordsMap, !doNotTriggerTableService && canTriggerTableService);
+    }
+  }
+
+  private Set<String> getMetadataPartitionsToUpdate() {
+    // fetch partitions to update from table config
+    Set<String> partitionsToUpdate = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    // add inflight indexes as well because the file groups have already been initialized, so writers can log updates
+    partitionsToUpdate.addAll(Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    if (!partitionsToUpdate.isEmpty()) {
+      return partitionsToUpdate;
+    }
+    // fallback to all enabled partitions if table config returned no partitions
+    partitionsToUpdate.addAll(getEnabledPartitionTypes().stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toList()));

Review comment:
       just a single return statement, would be easier on the eyes?
   
   `return getEnabledPartitionTypes().stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toList())`

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -363,6 +368,18 @@ public void initTableMetadata() {
                                                                    Option<String> inflightInstantTimestamp) throws IOException {
     HoodieTimer timer = new HoodieTimer().startTimer();
 
+    boolean exists = metadataExists(dataMetaClient, actionMetadata);
+
+    if (!exists) {
+      // Initialize for the first time by listing partitions and files directly from the file system
+      if (initializeFromFilesystem(dataMetaClient, inflightInstantTimestamp)) {
+        metrics.ifPresent(m -> m.updateMetrics(HoodieMetadataMetrics.INITIALIZE_STR, timer.endTimer()));
+      }
+    }
+  }
+
+  private <T extends SpecificRecordBase> boolean metadataExists(HoodieTableMetaClient dataMetaClient,

Review comment:
       rename: metadataTableExists

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -877,32 +1011,34 @@ private void initialCommit(String createInstantTime) {
       return;
     }
 
-    HoodieData<HoodieRecord> filesPartitionRecords = engineContext.parallelize(Arrays.asList(allPartitionRecord), 1);
-    if (!partitionInfoList.isEmpty()) {
-      HoodieData<HoodieRecord> fileListRecords = engineContext.parallelize(partitionInfoList, partitionInfoList.size()).map(partitionInfo -> {
-        Map<String, Long> fileNameToSizeMap = partitionInfo.getFileNameToSizeMap();
-        // filter for files that are part of the completed commits
-        Map<String, Long> validFileNameToSizeMap = fileNameToSizeMap.entrySet().stream().filter(fileSizePair -> {
-          String commitTime = FSUtils.getCommitTime(fileSizePair.getKey());
-          return HoodieTimeline.compareTimestamps(commitTime, HoodieTimeline.LESSER_THAN_OR_EQUALS, createInstantTime);
-        }).collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
-
-        // Record which saves files within a partition
-        return HoodieMetadataPayload.createPartitionFilesRecord(
-            HoodieTableMetadataUtil.getPartition(partitionInfo.getRelativePath()), Option.of(validFileNameToSizeMap), Option.empty());
-      });
-      filesPartitionRecords = filesPartitionRecords.union(fileListRecords);
+    if (partitionTypes.contains(MetadataPartitionType.FILES)) {

Review comment:
       move to a separate method

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -645,12 +680,38 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropIndex(List<MetadataPartitionType> indexesToDrop) throws IOException {
+    Set<String> completedIndexes = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    Set<String> inflightIndexes = Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+
+    for (MetadataPartitionType partitionType : indexesToDrop) {

Review comment:
       nit: I think we can write a POJO to maintain the list of partitions (add, drop, etc.).
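
   Something along these lines, as a hedged sketch (class and method names are illustrative):

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Stream;

// Sketch of a small POJO that owns the set of metadata index partitions plus
// the add/drop/serialize logic, instead of ad-hoc string handling per caller.
class MetadataIndexPartitions {
  private final Set<String> partitionPaths = new LinkedHashSet<>();

  static MetadataIndexPartitions fromConfigValue(String csv) {
    MetadataIndexPartitions partitions = new MetadataIndexPartitions();
    Stream.of(csv.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .forEach(partitions.partitionPaths::add);
    return partitions;
  }

  void add(String partitionPath) {
    partitionPaths.add(partitionPath);
  }

  void drop(String partitionPath) {
    partitionPaths.remove(partitionPath);
  }

  // Serializes back to the comma-separated form stored in table config.
  String toConfigValue() {
    return String.join(",", partitionPaths);
  }

  Set<String> asSet() {
    return Collections.unmodifiableSet(partitionPaths);
  }
}
```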

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataWriter.java
##########
@@ -19,45 +19,79 @@
 package org.apache.hudi.metadata;
 
 import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
 import org.apache.hudi.avro.model.HoodieRestoreMetadata;
 import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
 import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
 
+import java.io.IOException;
 import java.io.Serializable;
+import java.util.List;
 
 /**
  * Interface that supports updating metadata for a given table, as actions complete.
  */
 public interface HoodieTableMetadataWriter extends Serializable, AutoCloseable {
 
+  /**
+   * Execute the index action for the given partitions.
+   *
+   * @param engineContext
+   * @param indexPartitionInfos - partitions to index
+   */
+  void buildIndex(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos);

Review comment:
       Food for thought: at this layer, should we talk about them as metadata partitions or indexes?

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##########
@@ -397,6 +401,22 @@ static void assertAtLeastNReplaceCommits(int minExpected, String tablePath, File
       assertTrue(minExpected <= numDeltaCommits, "Got=" + numDeltaCommits + ", exp >=" + minExpected);
     }
 
+    static void assertPendingIndexCommit(String tablePath, FileSystem fs) {
+      HoodieTableMetaClient meta = HoodieTableMetaClient.builder().setConf(fs.getConf()).setBasePath(tablePath).setLoadActiveTimelineOnLoad(true).build();
+      HoodieTimeline timeline = meta.getActiveTimeline().getAllCommitsTimeline().filterPendingIndexTimeline();
+      LOG.info("Timeline Instants=" + meta.getActiveTimeline().getInstants().collect(Collectors.toList()));
+      int numIndexCommits = (int) timeline.getInstants().count();
+      assertEquals(1, numIndexCommits, "Got=" + numIndexCommits + ", exp=1");
+    }
+
+    static void assertCompletedIndexCommit(String tablePath, FileSystem fs) {

Review comment:
       share code between the two?
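
   e.g. a sketch of one shared helper, with the caller passing the timeline filter in (this assumes filterPendingIndexTimeline(), and any completed-side analogue, is exposed on HoodieTimeline):

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.table.timeline.HoodieTimeline;

import java.util.function.Function;

import static org.junit.jupiter.api.Assertions.assertEquals;

class IndexCommitAssertionSketch {
  // Both assertPendingIndexCommit and assertCompletedIndexCommit collapse
  // into this, differing only in the filter and the expected count, e.g.
  // assertIndexCommits(tablePath, fs, 1, HoodieTimeline::filterPendingIndexTimeline);
  static void assertIndexCommits(String tablePath, FileSystem fs, int expected,
                                 Function<HoodieTimeline, HoodieTimeline> indexTimelineFilter) {
    HoodieTableMetaClient meta = HoodieTableMetaClient.builder()
        .setConf(fs.getConf()).setBasePath(tablePath).setLoadActiveTimelineOnLoad(true).build();
    HoodieTimeline timeline = indexTimelineFilter.apply(meta.getActiveTimeline().getAllCommitsTimeline());
    int numIndexCommits = (int) timeline.getInstants().count();
    assertEquals(expected, numIndexCommits, "Got=" + numIndexCommits + ", exp=" + expected);
  }
}
```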

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,292 @@
+  private int scheduleAndRunIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      Option<String> indexingInstantTime = doSchedule(client);
+      if (indexingInstantTime.isPresent()) {
+        return handleResponse(client.index(indexingInstantTime.get())) ? 0 : 1;
+      } else {
+        return -1;

Review comment:
       return something else? or is that the process exit code?

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,292 @@
+  private int runIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      if (isNullOrEmpty(cfg.indexInstantTime)) {
+        // Instant time is not specified
+        // Find the earliest scheduled indexing instant for execution
+        Option<HoodieInstant> earliestPendingIndexInstant = metaClient.getActiveTimeline()
+            .filterPendingIndexTimeline()
+            .filter(i -> !(i.isCompleted() || INFLIGHT.equals(i.getState())))

Review comment:
       why is this check needed? pending should already be fine, right?
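
   i.e., if picking up an already-inflight indexing instant is acceptable (say, to resume it), the lookup could shrink to something like this fragment from runIndexing:

```java
// Sketch: the pending-index timeline already excludes completed instants, so
// only the INFLIGHT exclusion in the current filter does any extra work.
Option<HoodieInstant> earliestPendingIndexInstant = metaClient.getActiveTimeline()
    .filterPendingIndexTimeline()
    .firstInstant();
```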

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <ol>
+ *   <li>Fetch the last completed instant on the data timeline.</li>
+ *   <li>Write the index plan to &lt;instant&gt;.index.requested.</li>
+ *   <li>Initialize file groups for the enabled partition types within a transaction.</li>
+ * </ol>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {

Review comment:
       I still find it a bit odd that we pass in `partitionTypes` as an API arg while taking all the different configs for them from writeConfig.
   
   Can't we deduce `partitionsToIndex` from `config`? e.g. if there are valid values for the list of columns for stats or bloom filter, doesn't that mean those types are enabled?
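
   e.g. a rough sketch of deriving it from config (the enablement getters on HoodieMetadataConfig are assumptions for illustration):

```java
import org.apache.hudi.common.config.HoodieMetadataConfig;
import org.apache.hudi.metadata.MetadataPartitionType;

import java.util.ArrayList;
import java.util.List;

class DeducePartitionsToIndexSketch {
  // Sketch: infer the partition types to index from what the metadata config
  // enables, instead of passing them through the API.
  static List<MetadataPartitionType> deduce(HoodieMetadataConfig metadataConfig) {
    List<MetadataPartitionType> partitionsToIndex = new ArrayList<>();
    if (metadataConfig.isBloomFilterIndexEnabled()) {      // assumed getter
      partitionsToIndex.add(MetadataPartitionType.BLOOM_FILTERS);
    }
    if (metadataConfig.isColumnStatsIndexEnabled()) {      // assumed getter
      partitionsToIndex.add(MetadataPartitionType.COLUMN_STATS);
    }
    return partitionsToIndex;
  }
}
```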

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.common.util.StringUtils.isNullOrEmpty;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleandExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  public int start(int retry) {
+    // indexing should be done only if metadata is enabled
+    if (!props.getBoolean(HoodieMetadataConfig.ENABLE.key())) {
+      LOG.error(String.format("Metadata is not enabled. Please set %s to true.", HoodieMetadataConfig.ENABLE.key()));
+      return -1;
+    }
+
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        case DROP_INDEX: {
+          LOG.info("Running Mode: [" + DROP_INDEX + "];");
+          return dropIndex(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  @TestOnly
+  public Option<String> doSchedule() throws Exception {
+    return this.scheduleIndexing(jsc);
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      return doSchedule(client);
+    }
+  }
+
+  private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
+    List<MetadataPartitionType> partitionTypes = getRequestedPartitionTypes(cfg.indexTypes);
+    Option<String> indexingInstant = client.scheduleIndexing(partitionTypes);
+    if (!indexingInstant.isPresent()) {
+      LOG.error("Scheduling of index action did not return any instant.");
+    }
+    return indexingInstant;
+  }
+
+  private int runIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      if (isNullOrEmpty(cfg.indexInstantTime)) {
+        // Instant time is not specified
+        // Find the earliest scheduled indexing instant for execution
+        Option<HoodieInstant> earliestPendingIndexInstant = metaClient.getActiveTimeline()
+            .filterPendingIndexTimeline()
+            .filter(i -> !(i.isCompleted() || INFLIGHT.equals(i.getState())))
+            .firstInstant();
+        if (earliestPendingIndexInstant.isPresent()) {
+          cfg.indexInstantTime = earliestPendingIndexInstant.get().getTimestamp();
+          LOG.info("Found the earliest scheduled indexing instant which will be executed: "
+              + cfg.indexInstantTime);
+        } else {
+          throw new HoodieIndexException("There is no scheduled indexing in the table.");
+        }
+      }
+      return handleResponse(client.index(cfg.indexInstantTime)) ? 0 : -1;
+    }
+  }
+
+  private int scheduleAndRunIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      Option<String> indexingInstantTime = doSchedule(client);
+      if (indexingInstantTime.isPresent()) {
+        return handleResponse(client.index(indexingInstantTime.get())) ? 0 : -1;
+      } else {
+        return -1;
+      }
+    }
+  }
+
+  private int dropIndex(JavaSparkContext jsc) throws Exception {
+    List<MetadataPartitionType> partitionTypes = getRequestedPartitionTypes(cfg.indexTypes);
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      client.dropIndex(partitionTypes);
+      return 0;
+    } catch (Exception e) {
+      LOG.error("Failed to drop index. ", e);
+      return -1;
+    }
+  }
+
+  private boolean handleResponse(Option<HoodieIndexCommitMetadata> commitMetadata) {
+    if (!commitMetadata.isPresent()) {
+      LOG.error("Indexing failed as no commit metadata present.");
+      return false;
+    }
+    List<HoodieIndexPartitionInfo> indexPartitionInfos = commitMetadata.get().getIndexPartitionInfos();
+    LOG.info(String.format("Indexing complete for partitions: %s",
+        indexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toList())));
+    return isIndexBuiltForAllRequestedTypes(indexPartitionInfos);
+  }
+
+  boolean isIndexBuiltForAllRequestedTypes(List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    Set<String> indexedPartitions = indexPartitionInfos.stream()
+        .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet());
+    Set<String> requestedPartitions = getRequestedPartitionTypes(cfg.indexTypes).stream()
+        .map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.removeAll(indexedPartitions);
+    return requestedPartitions.isEmpty();
+  }
+
+  List<MetadataPartitionType> getRequestedPartitionTypes(String indexTypes) {

Review comment:
       Should this be written as `MetadataPartitionType.fromListString(indexTypes)`, and maintained closer to the enum?
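
       A rough sketch of what that could look like, colocated with the enum (the method name and parsing rules here are a suggestion, not existing API):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical static factory on MetadataPartitionType, so the parsing
// logic lives next to the enum constants it has to match.
public static List<MetadataPartitionType> fromListString(String indexTypes) {
  if (indexTypes == null || indexTypes.trim().isEmpty()) {
    return Collections.emptyList();
  }
  return Arrays.stream(indexTypes.split(","))
      .map(String::trim)
      .filter(s -> !s.isEmpty())
      // assumes the CLI values match enum constant names, e.g. COLUMN_STATS
      .map(s -> MetadataPartitionType.valueOf(s.toUpperCase(Locale.ROOT)))
      .collect(Collectors.toList());
}
```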

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <ol>
+ * <li>Fetch last completed instant on data timeline.</li>
+ * <li>Write the index plan to the &lt;instant&gt;.index.requested.</li>
+ * <li>Initialize file groups for the enabled partition types within a transaction.</li>
+ * </ol>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!EnumSet.allOf(MetadataPartitionType.class).containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+    // make sure that it is idempotent, check with previously pending index operations.
+    Set<String> indexesInflightOrCompleted = Stream.of(table.getMetaClient().getTableConfig().getInflightMetadataIndexes().split(","))

Review comment:
       One more place where the trimming/parsing code could be reused.
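
       For instance, a small shared util (name and placement are placeholders) that every call site could use:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical shared helper for comma-separated config values; trims
// entries and drops empties, mirroring the inline stream above.
// Returns a mutable set so callers can addAll() into it.
public static Set<String> parseCommaSeparated(String csv) {
  if (csv == null || csv.trim().isEmpty()) {
    return new HashSet<>();
  }
  return Arrays.stream(csv.split(","))
      .map(String::trim)
      .filter(s -> !s.isEmpty())
      .collect(Collectors.toCollection(HashSet::new));
}
```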

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <ol>
+ * <li>Fetch last completed instant on data timeline.</li>
+ * <li>Write the index plan to the &lt;instant&gt;.index.requested.</li>
+ * <li>Initialize file groups for the enabled partition types within a transaction.</li>
+ * </ol>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!EnumSet.allOf(MetadataPartitionType.class).containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+    // make sure that it is idempotent, check with previously pending index operations.
+    Set<String> indexesInflightOrCompleted = Stream.of(table.getMetaClient().getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    indexesInflightOrCompleted.addAll(Stream.of(table.getMetaClient().getTableConfig().getCompletedMetadataIndexes().split(","))

Review comment:
       We need a helper that returns both the inflight and completed indexes as a single set.
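
       Sketch of such a helper on HoodieTableConfig (the method name is a suggestion), reusing a parse util like the one sketched in the earlier comment:

```java
// Hypothetical accessor on HoodieTableConfig: one call returning every
// metadata index that is either inflight or already completed.
public Set<String> getInflightAndCompletedMetadataIndexes() {
  Set<String> indexes = parseCommaSeparated(getInflightMetadataIndexes());
  indexes.addAll(parseCommaSeparated(getCompletedMetadataIndexes()));
  return indexes;
}
```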

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <ol>
+ * <li>Fetch last completed instant on data timeline.</li>
+ * <li>Write the index plan to the &lt;instant&gt;.index.requested.</li>
+ * <li>Initialize file groups for the enabled partition types within a transaction.</li>
+ * </ol>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!EnumSet.allOf(MetadataPartitionType.class).containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);

Review comment:
       Partition types or index types? Let's settle on consistent terminology.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <ol>
+ * <li>Fetch last completed instant on data timeline.</li>
+ * <li>Write the index plan to the &lt;instant&gt;.index.requested.</li>
+ * <li>Initialize file groups for the enabled partition types within a transaction.</li>
+ * </ol>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!EnumSet.allOf(MetadataPartitionType.class).containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+    // make sure that it is idempotent, check with previously pending index operations.
+    Set<String> indexesInflightOrCompleted = Stream.of(table.getMetaClient().getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    indexesInflightOrCompleted.addAll(Stream.of(table.getMetaClient().getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    Set<String> requestedPartitions = partitionsToIndex.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.removeAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {
+      LOG.warn(String.format("Following partitions already exist or inflight: %s. Going to index only these partitions: %s",
+          indexesInflightOrCompleted, requestedPartitions));
+    }
+    List<MetadataPartitionType> finalPartitionsToIndex = partitionsToIndex.stream()
+        .filter(p -> requestedPartitions.contains(p.getPartitionPath())).collect(Collectors.toList());
+    // get last completed instant
+    Option<HoodieInstant> indexUptoInstant = table.getActiveTimeline().getContiguousCompletedWriteTimeline().lastInstant();
+    if (indexUptoInstant.isPresent()) {
+      final HoodieInstant indexInstant = HoodieTimeline.getIndexRequestedInstant(instantTime);
+      // start initializing file groups
+      // in case FILES partition itself was not initialized before (i.e. metadata was never enabled), this will initialize synchronously
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to initialize filegroups for indexing for instant: %s", instantTime)));
+      try {
+        this.txnManager.beginTransaction(Option.of(indexInstant), Option.empty());
+        metadataWriter.scheduleIndex(table.getMetaClient(), finalPartitionsToIndex, indexInstant.getTimestamp());
+      } catch (IOException e) {
+        LOG.error("Could not initialize file groups", e);
+        throw new HoodieIOException(e.getMessage(), e);
+      } finally {
+        this.txnManager.endTransaction(Option.of(indexInstant));
+      }
+      // for each partitionToIndex add that time to the plan

Review comment:
       Suggest we do the whole thing within the lock, to keep the reasoning simpler. WDYT?
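
       Roughly like this — initialize the file groups and persist the plan inside the same transaction. (The plan construction and the timeline write below are my assumptions based on the surrounding code; `saveToPendingIndexAction` is a guessed name, not necessarily the actual API.)

```java
try {
  txnManager.beginTransaction(Option.of(indexInstant), Option.empty());
  // initialize file groups and write the requested plan under one lock,
  // so a concurrent scheduler cannot interleave between the two steps
  metadataWriter.scheduleIndex(table.getMetaClient(), finalPartitionsToIndex, indexInstant.getTimestamp());
  List<HoodieIndexPartitionInfo> indexPartitionInfos = finalPartitionsToIndex.stream()
      .map(p -> new HoodieIndexPartitionInfo(LATEST_INDEX_PLAN_VERSION, p.getPartitionPath(), indexUptoInstant.get().getTimestamp()))
      .collect(Collectors.toList());
  HoodieIndexPlan indexPlan = HoodieIndexPlan.newBuilder()
      .setVersion(LATEST_INDEX_PLAN_VERSION)
      .setIndexPartitionInfos(indexPartitionInfos)
      .build();
  // hypothetical timeline call to persist <instant>.index.requested
  table.getActiveTimeline().saveToPendingIndexAction(indexInstant, TimelineMetadataUtils.serializeIndexPlan(indexPlan));
} catch (IOException e) {
  LOG.error("Could not initialize file groups or write index plan", e);
  throw new HoodieIOException(e.getMessage(), e);
} finally {
  txnManager.endTransaction(Option.of(indexInstant));
}
```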

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()

Review comment:
       What happens if we fail after transitioning the instant to INFLIGHT?
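
       As written, a re-run would find no REQUESTED instant and abort with "No requested index instant found". One way to make this re-entrant (a sketch, assuming INFLIGHT is statically imported from HoodieInstant.State):

```java
// accept both REQUESTED and INFLIGHT so a crashed run can be re-attempted
HoodieInstant indexInstant = table.getActiveTimeline()
    .filterPendingIndexTimeline()
    .filter(instant -> instant.getTimestamp().equals(instantTime)
        && (REQUESTED.equals(instant.getState()) || INFLIGHT.equals(instant.getState())))
    .lastInstant()
    .orElseThrow(() -> new HoodieIndexException(
        String.format("No pending index instant found: %s", instantTime)));
// ... and later, transition only on the first attempt
if (REQUESTED.equals(indexInstant.getState())) {
  table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
}
```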

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {

Review comment:
       This method is too long and monolithic; consider breaking it into smaller helpers.
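
       For readability, something along these lines — the helper names are made up, each body being a straight extraction of the corresponding block below:

```java
@Override
public Option<HoodieIndexCommitMetadata> execute() {
  validateLockProviderConfigured();                        // OCC + lock provider checks
  HoodieInstant indexInstant = getRequestedIndexInstant(instantTime);
  List<HoodieIndexPartitionInfo> partitionInfos = readIndexPlan(indexInstant);
  buildBaseIndex(indexInstant, partitionInfos);            // transition to inflight + metadataWriter.buildIndex
  List<HoodieIndexPartitionInfo> finalInfos = catchupWithDataTimeline(partitionInfos);
  return Option.of(commitIndexAndUpdateTableConfig(indexInstant, finalInfos));
}
```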

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build the index up to the base instant from the plan; catchup for later instants is done below
+      LOG.info("Starting Index Building");
+      metadataWriter.buildIndex(context, indexPartitionInfos);
+
+      // get all instants since the plan completed (both from active timeline and archived timeline)
+      // assumption is that all metadata partitions had same instant upto which they were scheduled to be indexed
+      table.getMetaClient().reloadActiveTimeline();
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();

Review comment:
       Should the uptoInstant live in a higher-level struct then, to avoid the `get(0)`?
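
       Sketch of the difference (the plan-level getter would be a new, hypothetical Avro field):

```java
// current: every partition info carries the same value, hence the get(0)
String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();

// proposed: hoist the field onto the plan itself
String indexUptoInstantFromPlan = indexPlan.getIndexUptoInstant();
```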

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build the index up to the base instant from the plan; catchup for later instants is done below
+      LOG.info("Starting Index Building");
+      metadataWriter.buildIndex(context, indexPartitionInfos);
+
+      // get all instants since the plan completed (both from active timeline and archived timeline)
+      // assumption is that all metadata partitions had same instant upto which they were scheduled to be indexed
+      table.getMetaClient().reloadActiveTimeline();
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      List<HoodieInstant> instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+      LOG.info("Total remaining instants to index: " + instantsToIndex.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> indexingCatchupTaskFuture = executorService.submit(
+          new IndexingCatchupTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient(), metadataMetaClient));
+      try {
+        LOG.info("Starting index catchup task");
+        indexingCatchupTaskFuture.get(config.getIndexingCheckTimeoutSeconds(), TimeUnit.SECONDS);
+      } catch (Exception e) {
+        indexingCatchupTaskFuture.cancel(true);
+        throw new HoodieIndexException(String.format("Index catchup failed. Current indexed instant = %s. Aborting!", currentIndexedInstant), e);
+      } finally {
+        executorService.shutdownNow();
+      }
+      // save index commit metadata and update table config
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      try {
+        // update the table config and timeline in a lock as there could be another indexer running
+        txnManager.beginTransaction();
+        updateTableConfig(table.getMetaClient(), finalIndexPartitionInfos);
+        table.getActiveTimeline().saveAsComplete(
+            new HoodieInstant(true, INDEX_ACTION, indexInstant.getTimestamp()),
+            TimelineMetadataUtils.serializeIndexCommitMetadata(indexCommitMetadata));
+      } finally {
+        txnManager.endTransaction();
+      }
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private static List<HoodieInstant> getRemainingArchivedAndActiveInstantsSince(String instant, HoodieTableMetaClient metaClient) {

Review comment:
       We need tests for all these methods.
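
       For example, a JUnit 5 outline along these lines (test names are placeholders, timeline setup via a harness is elided):

```java
import org.junit.jupiter.api.Test;

public class TestRunIndexActionExecutor {

  @Test
  public void remainingInstantsSinceReturnsActiveAndArchivedInstants() {
    // given: t1 archived, t2/t3 on the active timeline, indexed upto t1
    // expect: getRemainingArchivedAndActiveInstantsSince("t1", metaClient) -> [t2, t3]
  }

  @Test
  public void completedInstantsAfterFiltersOutPendingOnes() {
    // given: metadata timeline with completed t2 and inflight t3 after t1
    // expect: getCompletedArchivedAndActiveInstantsAfter("t1", metadataMetaClient) -> [t2]
  }
}
```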

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ * 1. Fetch last completed instant on data timeline.
+ * 2. Write the index plan to the <instant>.index.requested.
+ * 3. Initialize file groups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!EnumSet.allOf(MetadataPartitionType.class).containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+    // make sure that it is idempotent, check with previously pending index operations.
+    Set<String> indexesInflightOrCompleted = Stream.of(table.getMetaClient().getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    indexesInflightOrCompleted.addAll(Stream.of(table.getMetaClient().getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    Set<String> requestedPartitions = partitionsToIndex.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.removeAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {

Review comment:
       So this assumes we drop the existing partitions if we need to recreate them.
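
   To make the idempotency semantics concrete (an illustration of the set arithmetic above, not new behavior):

   ```java
   // inflightOrCompleted (from table config) = {"files", "bloom_filters"}
   // partitionsToIndex (user request)        = [FILES, COLUMN_STATS]
   // requestedPartitions after removeAll     = {"column_stats"}
   // => only "column_stats" gets planned; re-creating "files" would first
   //    require dropping it from the completed set, hence the point above.
   ```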

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ * 1. Fetch last completed instant on data timeline.
+ * 2. Write the index plan to the <instant>.index.requested.
+ * 3. Initialize file groups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!EnumSet.allOf(MetadataPartitionType.class).containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+    // make sure that it is idempotent, check with previously pending index operations.
+    Set<String> indexesInflightOrCompleted = Stream.of(table.getMetaClient().getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    indexesInflightOrCompleted.addAll(Stream.of(table.getMetaClient().getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    Set<String> requestedPartitions = partitionsToIndex.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.removeAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {
+      LOG.warn(String.format("Following partitions already exist or inflight: %s. Going to index only these partitions: %s",
+          indexesInflightOrCompleted, requestedPartitions));
+    }
+    List<MetadataPartitionType> finalPartitionsToIndex = partitionsToIndex.stream()
+        .filter(p -> requestedPartitions.contains(p.getPartitionPath())).collect(Collectors.toList());
+    // get last completed instant
+    Option<HoodieInstant> indexUptoInstant = table.getActiveTimeline().getContiguousCompletedWriteTimeline().lastInstant();

Review comment:
       Should this be guarded inside the lock as well?
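
   One way to do that, mirroring the transaction pattern already used when completing the index action (a sketch; whether the plan serialization also belongs in the same critical section is the open question):

   ```java
   Option<HoodieInstant> indexUptoInstant;
   try {
     txnManager.beginTransaction();
     // fetch the last contiguous completed instant under the lock, so a
     // concurrent writer cannot commit between planning and plan serialization
     indexUptoInstant = table.getActiveTimeline()
         .getContiguousCompletedWriteTimeline().lastInstant();
   } finally {
     txnManager.endTransaction();
   }
   ```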

##########
File path: hudi-common/src/main/avro/HoodieIndexPartitionInfo.avsc
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+  "namespace": "org.apache.hudi.avro.model",
+  "type": "record",
+  "name": "HoodieIndexPartitionInfo",
+  "fields": [
+    {
+      "name": "version",

Review comment:
       Yes, every datum has a version in case we need to write some schema-evolution code.
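
   The version field lets readers branch while deserializing older records, e.g. (illustrative sketch; `upgradeToLatest` is a hypothetical hook):

   ```java
   static HoodieIndexPartitionInfo maybeUpgrade(HoodieIndexPartitionInfo info) {
     if (info.getVersion() != null && info.getVersion() == 1) {
       return info; // current layout, nothing to evolve
     }
     return upgradeToLatest(info); // hypothetical evolution hook for future versions
   }
   ```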

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -663,20 +711,82 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    if (!dataWriteConfig.isMetadataTableEnabled()) {
+      return;
+    }
+    Set<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private Set<String> getMetadataPartitionsToUpdate() {
+    // fetch partitions to update from table config
+    Set<String> partitionsToUpdate = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))

Review comment:
       That leaks indexing business logic into TableConfig. TableConfig APIs should just do lightweight splitting/parsing/getting/setting and defaults.
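
   Concretely, the splitting could live behind a `HoodieTableConfig` accessor so callers never touch the raw comma-separated string (a sketch; the method name is a suggestion, not existing API):

   ```java
   // suggested helper inside HoodieTableConfig
   public Set<String> getCompletedMetadataIndexesAsSet() {
     return Stream.of(getCompletedMetadataIndexes().split(","))
         .map(String::trim)
         .filter(s -> !s.isEmpty())
         .collect(Collectors.toSet());
   }
   ```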

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build index upto base instant as generated by the plan, we will be doing catchup later
+      LOG.info("Starting Index Building");

Review comment:
       Add more context to the log message?
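
   For example, using fields already available at this point (sketch):

   ```java
   LOG.info("Starting index build for metadata partitions "
       + indexPartitionInfos.stream()
           .map(HoodieIndexPartitionInfo::getMetadataPartitionPath)
           .collect(Collectors.toList())
       + " upto instant " + indexPartitionInfos.get(0).getIndexUptoInstant());
   ```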

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build index upto base instant as generated by the plan, we will be doing catchup later
+      LOG.info("Starting Index Building");
+      metadataWriter.buildIndex(context, indexPartitionInfos);
+
+      // get all instants since the plan completed (both from active timeline and archived timeline)
+      // assumption is that all metadata partitions had same instant upto which they were scheduled to be indexed
+      table.getMetaClient().reloadActiveTimeline();
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      List<HoodieInstant> instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+      LOG.info("Total remaining instants to index: " + instantsToIndex.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> indexingCatchupTaskFuture = executorService.submit(
+          new IndexingCatchupTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient(), metadataMetaClient));
+      try {
+        LOG.info("Starting index catchup task");
+        indexingCatchupTaskFuture.get(config.getIndexingCheckTimeoutSeconds(), TimeUnit.SECONDS);
+      } catch (Exception e) {
+        indexingCatchupTaskFuture.cancel(true);
+        throw new HoodieIndexException(String.format("Index catchup failed. Current indexed instant = %s. Aborting!", currentIndexedInstant), e);
+      } finally {
+        executorService.shutdownNow();
+      }
+      // save index commit metadata and update table config
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      try {
+        // update the table config and timeline in a lock as there could be another indexer running
+        txnManager.beginTransaction();
+        updateTableConfig(table.getMetaClient(), finalIndexPartitionInfos);
+        table.getActiveTimeline().saveAsComplete(
+            new HoodieInstant(true, INDEX_ACTION, indexInstant.getTimestamp()),
+            TimelineMetadataUtils.serializeIndexCommitMetadata(indexCommitMetadata));

Review comment:
       What happens if we fail before reaching L165, i.e. the timeline saving? How do we recover/reconcile?
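
   One possible recovery path (a sketch of the idea, not a claim about current behavior): on a re-run, detect the previously inflight index action and either resume the catchup or revert the partially built partitions:

   ```java
   Option<HoodieInstant> inflightIndex = table.getActiveTimeline()
       .filterPendingIndexTimeline()
       .filter(i -> i.getTimestamp().equals(instantTime) && INFLIGHT.equals(i.getState()))
       .firstInstant();
   if (inflightIndex.isPresent()) {
     // table config still lists these partitions as inflight, so a retry can
     // either rebuild them or resume catchup from the last indexed instant
   }
   ```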

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
##########
@@ -112,6 +113,16 @@ public HoodieDefaultTimeline getWriteTimeline() {
     return new HoodieDefaultTimeline(instants.stream().filter(s -> validActions.contains(s.getAction())), details);
   }
 
+  @Override
+  public HoodieTimeline getContiguousCompletedWriteTimeline() {
+    Option<HoodieInstant> earliestPending = getWriteTimeline().filterInflightsAndRequested().firstInstant();

Review comment:
       We should ensure that it's okay to have duplicate entries in the base and log files. E.g. at t=9, when we list the table to find all files to be added to the `FILES` MT partition, we will also get f8, written as part of DC/8.
   This same file will also be written to the log later on during the catch-up phase.
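
   To make the scenario concrete (illustrative timeline, not actual output):

   ```
   data timeline : DC1 DC2 DC3 CLEAN4 DC5 DC6 COMPACT7 DC8, index scheduled at t=9
   initial build : lists the table at t=9, so files written by DC8 land in the base file
   catchup phase : replays DC8 from the timeline, logging the same files again
   => the FILES partition payload merge must tolerate such duplicate entries
   ```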

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build index upto base instant as generated by the plan, we will be doing catchup later
+      LOG.info("Starting Index Building");
+      metadataWriter.buildIndex(context, indexPartitionInfos);
+
+      // get all instants since the plan completed (both from active timeline and archived timeline)
+      // assumption is that all metadata partitions had same instant upto which they were scheduled to be indexed
+      table.getMetaClient().reloadActiveTimeline();
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      List<HoodieInstant> instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+      LOG.info("Total remaining instants to index: " + instantsToIndex.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);

Review comment:
       Double-check that this gets shut down in all failure scenarios.
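
   The existing try/catch/finally covers the timeout and exception paths; a defensive restatement of the pattern (sketch; `catchupTask` stands in for the `IndexingCatchupTask` instance):

   ```java
   ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
   try {
     executorService.submit(catchupTask).get(config.getIndexingCheckTimeoutSeconds(), TimeUnit.SECONDS);
   } catch (Exception e) {
     throw new HoodieIndexException("Index catchup failed", e);
   } finally {
     // runs on success, timeout, interrupt, and every other exception path
     executorService.shutdownNow();
   }
   ```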

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
##########
@@ -112,6 +113,16 @@ public HoodieDefaultTimeline getWriteTimeline() {
     return new HoodieDefaultTimeline(instants.stream().filter(s -> validActions.contains(s.getAction())), details);
   }
 
+  @Override
+  public HoodieTimeline getContiguousCompletedWriteTimeline() {
+    Option<HoodieInstant> earliestPending = getWriteTimeline().filterInflightsAndRequested().firstInstant();

Review comment:
       The catch-up phase will get DC/8, in a lock. The ideal outcome for us is:
   
   - `base file`: DC 1-3, 5-6, 8
   - `logs`: CLEAN 4, COMPACT 7, DC 8 (after waiting for 4 and 7 to complete), and any instant > 9
   
   

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)

Review comment:
       @prashantwason is asking why we can't just pass this using the `--executor-memory` flag of spark-submit.
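
   i.e. something like the following (hedged example; the bundle jar name, paths, and memory value are illustrative):

   ```
   spark-submit \
     --class org.apache.hudi.utilities.HoodieIndexer \
     --executor-memory 4g \
     hudi-utilities-bundle.jar \
     --base-path /path/to/table \
     --table-name my_table \
     --parallelism 1
   ```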

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build index upto base instant as generated by the plan, we will be doing catchup later
+      LOG.info("Starting Index Building");
+      metadataWriter.buildIndex(context, indexPartitionInfos);
+
+      // get all instants since the plan completed (both from active timeline and archived timeline)
+      // assumption is that all metadata partitions had same instant upto which they were scheduled to be indexed
+      table.getMetaClient().reloadActiveTimeline();
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      List<HoodieInstant> instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+      LOG.info("Total remaining instants to index: " + instantsToIndex.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> indexingCatchupTaskFuture = executorService.submit(
+          new IndexingCatchupTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient(), metadataMetaClient));
+      try {
+        LOG.info("Starting index catchup task");
+        indexingCatchupTaskFuture.get(config.getIndexingCheckTimeoutSeconds(), TimeUnit.SECONDS);
+      } catch (Exception e) {
+        indexingCatchupTaskFuture.cancel(true);
+        throw new HoodieIndexException(String.format("Index catchup failed. Current indexed instant = %s. Aborting!", currentIndexedInstant), e);
+      } finally {
+        executorService.shutdownNow();
+      }
+      // save index commit metadata and update table config
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      try {
+        // update the table config and timeline in a lock as there could be another indexer running
+        txnManager.beginTransaction();
+        updateTableConfig(table.getMetaClient(), finalIndexPartitionInfos);
+        table.getActiveTimeline().saveAsComplete(
+            new HoodieInstant(true, INDEX_ACTION, indexInstant.getTimestamp()),
+            TimelineMetadataUtils.serializeIndexCommitMetadata(indexCommitMetadata));
+      } finally {
+        txnManager.endTransaction();
+      }
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private static List<HoodieInstant> getRemainingArchivedAndActiveInstantsSince(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> remainingInstantsToIndex = metaClient.getArchivedTimeline()
+        .getWriteTimeline()
+        .findInstantsAfter(instant)
+        .getInstants().collect(Collectors.toList());
+    remainingInstantsToIndex.addAll(metaClient.getActiveTimeline().getWriteTimeline().findInstantsAfter(instant).getInstants().collect(Collectors.toList()));
+    return remainingInstantsToIndex;
+  }
+
+  private static List<HoodieInstant> getCompletedArchivedAndActiveInstantsAfter(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> completedInstants = metaClient.getArchivedTimeline()
+        .filterCompletedInstants()
+        .findInstantsAfter(instant)
+        .getInstants().collect(Collectors.toList());
+    completedInstants.addAll(metaClient.reloadActiveTimeline().filterCompletedInstants().findInstantsAfter(instant).getInstants().collect(Collectors.toList()));
+    return completedInstants;
+  }
+
+  private void updateTableConfig(HoodieTableMetaClient metaClient, List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    // remove from inflight and update completed indexes
+    Set<String> inflightIndexes = Stream.of(metaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    Set<String> completedIndexes = Stream.of(metaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    Set<String> indexesRequested = indexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet());
+    inflightIndexes.removeAll(indexesRequested);
+    completedIndexes.addAll(indexesRequested);
+    // update table config
+    metaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), String.join(",", inflightIndexes));
+    metaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), String.join(",", completedIndexes));
+    HoodieTableConfig.update(metaClient.getFs(), new Path(metaClient.getMetaPath()), metaClient.getTableConfig().getProps());
+  }
+
+  /**
+   * Indexing check runs for instants that completed after the base instant (in the index plan).
+   * It will check if these later instants have logged updates to metadata table or not.
+   * If not, then it will do the update. If a later instant is inflight, it will wait until it is completed or the task times out.
+   */
+  class IndexingCatchupTask implements Runnable {
+
+    private final HoodieTableMetadataWriter metadataWriter;
+    private final List<HoodieInstant> instantsToIndex;
+    private final Set<String> metadataCompletedInstants;
+    private final HoodieTableMetaClient metaClient;
+    private final HoodieTableMetaClient metadataMetaClient;
+
+    IndexingCatchupTask(HoodieTableMetadataWriter metadataWriter,
+                        List<HoodieInstant> instantsToIndex,
+                        Set<String> metadataCompletedInstants,
+                        HoodieTableMetaClient metaClient,
+                        HoodieTableMetaClient metadataMetaClient) {
+      this.metadataWriter = metadataWriter;
+      this.instantsToIndex = instantsToIndex;
+      this.metadataCompletedInstants = metadataCompletedInstants;
+      this.metaClient = metaClient;
+      this.metadataMetaClient = metadataMetaClient;
+    }
+
+    @Override
+    public void run() {
+      for (HoodieInstant instant : instantsToIndex) {
+        // metadata index already updated for this instant
+        if (!metadataCompletedInstants.isEmpty() && metadataCompletedInstants.contains(instant.getTimestamp())) {
+          currentIndexedInstant = instant.getTimestamp();
+          continue;
+        }
+        while (!instant.isCompleted()) {
+          try {
+            LOG.warn("instant not completed, reloading timeline " + instant);
+            // reload timeline and fetch instant details again wait until timeout
+            String instantTime = instant.getTimestamp();
+            Option<HoodieInstant> currentInstant = metaClient.reloadActiveTimeline()
+                .filterCompletedInstants().filter(i -> i.getTimestamp().equals(instantTime)).firstInstant();
+            instant = currentInstant.orElse(instant);
+            // so that timeline is not reloaded very frequently
+            Thread.sleep(5000);

Review comment:
       Extract this hardcoded sleep interval into a named constant.
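
   A minimal sketch of the suggested fix (the constant name is illustrative, not from this PR):

   ```java
   // Illustrative: name the backoff instead of hardcoding 5000 in the wait loop.
   private static final long TIMELINE_RELOAD_INTERVAL_MS = 5000L;

   // ... inside IndexingCatchupTask#run, while waiting for an inflight instant:
   Thread.sleep(TIMELINE_RELOAD_INTERVAL_MS);
   ```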

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--schedule", "-sc"}, description = "Schedule indexing")
+    public Boolean runSchedule = false;
+    @Parameter(names = {"--strategy", "-st"}, description = "Comma-separated index types to be built, e.g. BLOOM,FILES,COLSTATS")
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleAndExecute\" to generate an indexing plan first and execute that plan immediately")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for "
+        + "hoodie client for compacting")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  private int start(int retry) {
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);

Review comment:
       +1. We also need checks to prevent the same metadata partition from being re-scheduled while it is already inflight or completed.
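
   Something along these lines could make scheduling idempotent (wiring is a sketch; the table-config getters are the ones this PR already adds):

   ```java
   // Sketch: skip metadata partitions that are already inflight or completed.
   Set<String> indexedOrInflight = new HashSet<>();
   indexedOrInflight.addAll(Arrays.asList(metaClient.getTableConfig().getInflightMetadataIndexes().split(",")));
   indexedOrInflight.addAll(Arrays.asList(metaClient.getTableConfig().getCompletedMetadataIndexes().split(",")));
   List<MetadataPartitionType> toSchedule = requestedPartitionTypes.stream()  // requestedPartitionTypes: hypothetical input
       .filter(p -> !indexedOrInflight.contains(p.getPartitionPath()))
       .collect(Collectors.toList());
   if (toSchedule.isEmpty()) {
     LOG.warn("All requested metadata partitions are already scheduled or built; nothing to schedule.");
     return Option.empty();
   }
   ```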

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ * 1. Fetch last completed instant on data timeline.
+ * 2. Write the index plan to the <instant>.index.requested.
+ * 3. Initialize file groups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {

Review comment:
       Also consider renaming this to `partitionIndexTypes`?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
##########
@@ -925,6 +928,53 @@ public boolean scheduleCompactionAtInstant(String instantTime, Option<Map<String
     return scheduleTableService(instantTime, extraMetadata, TableServiceType.COMPACT).isPresent();
   }
 
+
+  /**
+   * Schedules INDEX action.
+   *
+   * @param partitionTypes - list of {@link MetadataPartitionType} which needs to be indexed
+   * @return instant time for the requested INDEX action
+   */
+  public Option<String> scheduleIndexing(List<MetadataPartitionType> partitionTypes) {

Review comment:
       Let's be consistent about using `indexing` vs. `index` across these method names.

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleandExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);

Review comment:
       >But first bring up HoodieIndexer and wait for everything to be built out.
   
   But we should be able to support fully async mode of operation right

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
##########
@@ -112,6 +113,16 @@ public HoodieDefaultTimeline getWriteTimeline() {
     return new HoodieDefaultTimeline(instants.stream().filter(s -> validActions.contains(s.getAction())), details);
   }
 
+  @Override
+  public HoodieTimeline getContiguousCompletedWriteTimeline() {
+    Option<HoodieInstant> earliestPending = getWriteTimeline().filterInflightsAndRequested().firstInstant();

Review comment:
       | DC | DC | DC | CLEAN | DC | DC | COMPACT | DC | INDEX | DC |
   | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
   | C | C | C | I | C | C | R | C | R | I |
   
   `indexUpto` = 6; the indexer looks for inflights after 6, but then how can it account for the inflight CLEAN at 4, which may still be writing to log files before it completes?
   
   If we were to consider cleaning as well, then `indexUpto` becomes 3. We probably need one more variable here, e.g. `catchUpStartInstant=3`, while keeping `indexUpto=6`?
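
   A rough sketch of that two-variable idea (variable and derivation names are illustrative; `filterInflightsAndRequested` is the method used in this PR):

   ```java
   // indexUpto: derived from the earliest pending instant on the *write* timeline only.
   Option<HoodieInstant> earliestPendingWrite =
       timeline.getWriteTimeline().filterInflightsAndRequested().firstInstant();
   // catchUpStartInstant: derived from the earliest pending instant on the *full*
   // timeline, which also sees the inflight CLEAN at 4, so catch-up replays from 3
   // while the plan still records indexUpto = 6.
   Option<HoodieInstant> earliestPendingAny =
       timeline.filterInflightsAndRequested().firstInstant();
   ```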

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ * 1. Fetch last completed instant on data timeline.
+ * 2. Write the index plan to the <instant>.index.requested.
+ * 3. Initialize file groups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!EnumSet.allOf(MetadataPartitionType.class).containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+    // make sure that it is idempotent, check with previously pending index operations.
+    Set<String> indexesInflightOrCompleted = Stream.of(table.getMetaClient().getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    indexesInflightOrCompleted.addAll(Stream.of(table.getMetaClient().getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    Set<String> requestedPartitions = partitionsToIndex.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.removeAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {

Review comment:
        Originally I had in mind only one table config that tracks "completed" partitions/indexes (in this PR's terminology), with the timeline as the source of truth for what's inflight. How do we handle the scenario where we fail after updating the table config (hoodie.properties) but before writing the requested indexing instant to the timeline?
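
   One way to keep the timeline authoritative would be to persist the requested instant before touching hoodie.properties (ordering sketch; the timeline call is illustrative, the table-config keys are the ones from this PR):

   ```java
   // 1. First write <instant>.index.requested to the timeline. A crash here leaves
   //    hoodie.properties untouched, and the plan can simply be re-scheduled.
   table.getActiveTimeline().saveToPendingIndexAction(requestedInstant, serializedPlan); // illustrative call
   // 2. Only then mark the partitions inflight in hoodie.properties. A crash between
   //    the two steps still leaves the requested instant on the timeline as the
   //    source of truth for what is pending.
   metaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(),
       String.join(",", partitionPaths));
   HoodieTableConfig.update(metaClient.getFs(), new Path(metaClient.getMetaPath()),
       metaClient.getTableConfig().getProps());
   ```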

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.common.util.StringUtils.isNullOrEmpty;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleandExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  public int start(int retry) {
+    // indexing should be done only if metadata is enabled
+    if (!props.getBoolean(HoodieMetadataConfig.ENABLE.key())) {
+      LOG.error(String.format("Metadata is not enabled. Please set %s to true.", HoodieMetadataConfig.ENABLE.key()));
+      return -1;
+    }
+
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        case DROP_INDEX: {
+          LOG.info("Running Mode: [" + DROP_INDEX + "];");
+          return dropIndex(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  @TestOnly
+  public Option<String> doSchedule() throws Exception {
+    return this.scheduleIndexing(jsc);
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      return doSchedule(client);
+    }
+  }
+
+  private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
+    List<MetadataPartitionType> partitionTypes = getRequestedPartitionTypes(cfg.indexTypes);
+    Option<String> indexingInstant = client.scheduleIndexing(partitionTypes);
+    if (!indexingInstant.isPresent()) {
+      LOG.error("Scheduling of index action did not return any instant.");
+    }
+    return indexingInstant;
+  }
+
+  private int runIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      if (isNullOrEmpty(cfg.indexInstantTime)) {
+        // Instant time is not specified
+        // Find the earliest scheduled indexing instant for execution
+        Option<HoodieInstant> earliestPendingIndexInstant = metaClient.getActiveTimeline()
+            .filterPendingIndexTimeline()
+            .filter(i -> !(i.isCompleted() || INFLIGHT.equals(i.getState())))
+            .firstInstant();
+        if (earliestPendingIndexInstant.isPresent()) {
+          cfg.indexInstantTime = earliestPendingIndexInstant.get().getTimestamp();
+          LOG.info("Found the earliest scheduled indexing instant which will be executed: "
+              + cfg.indexInstantTime);
+        } else {
+          throw new HoodieIndexException("There is no scheduled indexing in the table.");
+        }
+      }
+      return handleResponse(client.index(cfg.indexInstantTime)) ? 0 : 1;
+    }
+  }
+
+  private int scheduleAndRunIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      Option<String> indexingInstantTime = doSchedule(client);
+      if (indexingInstantTime.isPresent()) {
+        return handleResponse(client.index(indexingInstantTime.get())) ? 0 : 1;
+      } else {
+        return -1;
+      }
+    }
+  }
+
+  private int dropIndex(JavaSparkContext jsc) throws Exception {
+    List<MetadataPartitionType> partitionTypes = getRequestedPartitionTypes(cfg.indexTypes);

Review comment:
       What if we have two bloom filter partitions/indexes on two different columns, A and B?
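
   If per-column bloom filter partitions are possible, dropping by index type alone seems ambiguous; a per-column partition path would disambiguate (purely illustrative naming, not from this PR):

   ```java
   // Hypothetical: qualify the metadata partition path with the indexed column, so
   // dropping the bloom filter on A does not also drop the one on B.
   static String partitionPathFor(MetadataPartitionType type, String column) {
     return type.getPartitionPath() + "_" + column; // e.g. bloom_filters_A, bloom_filters_B
   }
   ```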




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1082166163


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * be08ba499bb88d8a00f20695b360336853be708e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511) 
   * 010de76ddd6c0201db746a13a5b04fc5e94125d4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1082033950


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * be08ba499bb88d8a00f20695b360336853be708e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r838170357



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ * 1. Fetch last completed instant on data timeline.
+ * 2. Write the index plan to the <instant>.index.requested.
+ * 3. Initialize file groups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!EnumSet.allOf(MetadataPartitionType.class).containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+    // make sure that it is idempotent, check with previously pending index operations.
+    Set<String> indexesInflightOrCompleted = Stream.of(table.getMetaClient().getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    indexesInflightOrCompleted.addAll(Stream.of(table.getMetaClient().getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    Set<String> requestedPartitions = partitionsToIndex.stream().map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    requestedPartitions.removeAll(indexesInflightOrCompleted);
+    if (!requestedPartitions.isEmpty()) {

Review comment:
       Don't we need to handle case 2? 
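       For readers following the thread: the idempotency check in the diff above boils down to a set difference between the requested partitions and the union of the inflight and completed partitions recorded in the table config. A minimal standalone sketch of that logic (the class name, helpers, and partition strings below are illustrative stand-ins, not Hudi's actual API):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch only: models the requested-minus-(inflight + completed)
// filtering shown in the diff; none of these names are the real Hudi API.
public class IndexPlanFilterSketch {

  // Split a comma-separated table-config value into a trimmed, non-empty set,
  // mirroring how the diff parses the inflight/completed properties.
  static Set<String> csvToSet(String csv) {
    return Arrays.stream(csv.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toSet());
  }

  // Return only the partitions that still need an index plan: those that are
  // neither inflight nor already completed.
  static Set<String> partitionsStillToIndex(List<String> requested,
                                            String inflightCsv,
                                            String completedCsv) {
    Set<String> excluded = csvToSet(inflightCsv);
    excluded.addAll(csvToSet(completedCsv));
    Set<String> pending = new HashSet<>(requested);
    pending.removeAll(excluded);
    return pending;
  }

  public static void main(String[] args) {
    // "files" is completed and "column_stats" is inflight, so only
    // "bloom_filters" survives into a new plan.
    System.out.println(partitionsStillToIndex(
        Arrays.asList("files", "column_stats", "bloom_filters"),
        "column_stats", "files")); // prints [bloom_filters]
  }
}
```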







[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083533723


   ## CI report:
   
   * a3ee4cd75320e578235cea4490ed7470bb721ea5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7570) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1080546882


   ## CI report:
   
   * 69071c6306ce336076aa6daa4337276990572ee4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368) 
   * 522a18caff448bcc9b127372d4526ee8f168f085 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1083690942


   ## CI report:
   
   * 18b9acd3320e68ee6688ea4eec693676350a9e15 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7575) 
   * fc9ac46f36a4df8d9d590845b9848d48af1f7cae Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7582) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1082169589


   ## CI report:
   
   * be08ba499bb88d8a00f20695b360336853be708e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511) 
   * 010de76ddd6c0201db746a13a5b04fc5e94125d4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836581277



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -645,12 +669,36 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropIndex(List<MetadataPartitionType> indexesToDrop) throws IOException {
+    Set<String> completedIndexes = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    Set<String> inflightIndexes = Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    for (MetadataPartitionType partitionType : indexesToDrop) {
+      String partitionPath = partitionType.getPartitionPath();
+      if (inflightIndexes.contains(partitionPath)) {
+        LOG.error("Metadata indexing in progress: " + partitionPath);

Review comment:
       Updated to support dropping an index even while indexing is still inflight.







[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1084877385


   ## CI report:
   
   * fc9ac46f36a4df8d9d590845b9848d48af1f7cae Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7582) 
   * 01120c1b4a0dacaec5f3b968ac421f5faa0bc1b9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7652) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r839747121



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -645,12 +681,34 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropMetadataPartitions(List<MetadataPartitionType> metadataPartitions) throws IOException {
+    Set<String> completedIndexes = getCompletedMetadataPartitions(dataMetaClient.getTableConfig());
+    Set<String> inflightIndexes = getInflightMetadataPartitions(dataMetaClient.getTableConfig());
+
+    for (MetadataPartitionType partitionType : metadataPartitions) {
+      String partitionPath = partitionType.getPartitionPath();
+      // first update table config
+      if (inflightIndexes.contains(partitionPath)) {
+        inflightIndexes.remove(partitionPath);
+        dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS_INFLIGHT.key(), String.join(",", inflightIndexes));
+      } else if (completedIndexes.contains(partitionPath)) {
+        completedIndexes.remove(partitionPath);
+        dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS.key(), String.join(",", completedIndexes));
+      }
+      HoodieTableConfig.update(dataMetaClient.getFs(), new Path(dataMetaClient.getMetaPath()), dataMetaClient.getTableConfig().getProps());
+      LOG.warn("Deleting Metadata Table partitions: " + partitionPath);
+      dataMetaClient.getFs().delete(new Path(metadataWriteConfig.getBasePath(), partitionPath), true);

Review comment:
       Yes, this will be replaced by the DELETE_PARTITION path; we just got [lazy deletion of partitions](#4489) landed.
   Indeed there are multiple points of failure, but unlike schedule/run index, the delete path is a bit safer with respect to partial failures. We would be in trouble if the partition got deleted but the table config was not updated, so we update the table config first. If the table config is updated but the partition is not fully deleted, users can simply re-trigger the drop.
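
       That write ordering is what makes a partially failed drop re-triggerable, so it is worth spelling out. A compact sketch of the argument, with hypothetical TableConfigStore/Storage interfaces standing in for the real table-config and filesystem calls:

```java
// Illustrative only: the interfaces below are hypothetical stand-ins for
// Hudi's table-config persistence and filesystem APIs.
public class DropPartitionOrderingSketch {

  interface TableConfigStore {
    // Durably removes the partition from the set readers consult.
    void removePartition(String partitionPath);
  }

  interface Storage {
    // Recursive delete; may crash partway through.
    void deleteRecursively(String partitionPath);
  }

  // Config first, files second:
  // - crash after step 1: readers already ignore the partition, and
  //   re-running the drop is harmless;
  // - crash during step 2: the partition is invisible to readers, so the
  //   user can simply re-trigger the drop to finish deleting the files.
  // The reverse order could leave readers trusting a partition whose files
  // are already gone.
  static void dropPartition(TableConfigStore config, Storage storage, String partitionPath) {
    config.removePartition(partitionPath);    // step 1: update table config
    storage.deleteRecursively(partitionPath); // step 2: then delete the data
  }
}
```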







[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r839147377



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/ThreeToFourUpgradeHandler.java
##########
@@ -35,7 +40,12 @@
   @Override
   public Map<ConfigProperty, String> upgrade(HoodieWriteConfig config, HoodieEngineContext context, String instantTime, SupportsUpgradeDowngrade upgradeDowngradeHelper) {
     Map<ConfigProperty, String> tablePropsToAdd = new Hashtable<>();
-    tablePropsToAdd.put(HoodieTableConfig.TABLE_CHECKSUM, String.valueOf(HoodieTableConfig.generateChecksum(config.getProps())));
+    tablePropsToAdd.put(TABLE_CHECKSUM, String.valueOf(HoodieTableConfig.generateChecksum(config.getProps())));
+    // if metadata is enabled and files partition exist then update TABLE_METADATA_INDEX_COMPLETED
+    // schema for the files partition is same between the two versions
+    if (config.isMetadataTableEnabled() && metadataPartitionExists(config.getBasePath(), context, MetadataPartitionType.FILES)) {
+      tablePropsToAdd.put(TABLE_METADATA_PARTITIONS, MetadataPartitionType.FILES.getPartitionPath());
+    }

Review comment:
       @zhangyue19921010 Good question! If no upgrade is required, or say you upgraded to the current version with metadata disabled and only enabled metadata a few commits later, then this table config gets updated on the metadata initialization path, i.e. where `HoodieBackedTableMetadataWriter#updateInitializedPartitionsInTableConfig` is called.
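
       In other words, the property can be stamped on either of two paths depending on when metadata is enabled. A condensed, illustrative sketch of those two paths (the key string, method names, and the flat `Map` are stand-ins for `HoodieTableConfig` and the real upgrade/initialization code):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: condenses the two code paths described above into one
// toy model; the key string and method names are not the exact Hudi API.
public class MetadataPartitionsPropertySketch {

  static final String TABLE_METADATA_PARTITIONS = "hoodie.table.metadata.partitions";

  // Path 1: the three-to-four upgrade stamps the property, but only when the
  // metadata table is enabled and its FILES partition already exists.
  static void onUpgrade(Map<String, String> tableProps,
                        boolean metadataEnabled,
                        boolean filesPartitionExists) {
    if (metadataEnabled && filesPartitionExists) {
      tableProps.put(TABLE_METADATA_PARTITIONS, "files");
    }
  }

  // Path 2: first-time metadata initialization stamps it later instead
  // (the updateInitializedPartitionsInTableConfig path in the discussion).
  static void onMetadataInit(Map<String, String> tableProps, String partition) {
    tableProps.merge(TABLE_METADATA_PARTITIONS, partition,
        (existing, added) -> existing + "," + added);
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    onUpgrade(props, false, false); // metadata disabled at upgrade time: no-op
    onMetadataInit(props, "files"); // enabled a few commits later
    System.out.println(props);      // {hoodie.table.metadata.partitions=files}
  }
}
```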







[GitHub] [hudi] prashantwason commented on a change in pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
prashantwason commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r824490905



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
##########
@@ -855,6 +856,21 @@ public boolean scheduleCompactionAtInstant(String instantTime, Option<Map<String
     return scheduleTableService(instantTime, extraMetadata, TableServiceType.COMPACT).isPresent();
   }
 
+  public Option<String> scheduleIndexing(List<String> partitions) {
+    String instantTime = HoodieActiveTimeline.createNewInstantTime();
+    return scheduleIndexingAtInstant(partitions, instantTime) ? Option.of(instantTime) : Option.empty();
+  }
+
+  private boolean scheduleIndexingAtInstant(List<String> partitionsToIndex, String instantTime) throws HoodieIOException {

Review comment:
       Since this is a private function called only from the function above, why not merge it into `scheduleIndexing`?
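
       For illustration, the merged version might look like the sketch below. The helper's body is not shown in this excerpt, so it is modelled by an injected predicate; `Optional` and the random instant string are stand-ins for Hudi's `Option` and `HoodieActiveTimeline.createNewInstantTime()`:

```java
import java.util.List;
import java.util.Optional;
import java.util.UUID;
import java.util.function.BiPredicate;

// Illustrative refactor sketch only: a single-use private helper folded into
// its public caller. Nothing here is the real Hudi API.
public class ScheduleIndexingSketch {

  // Stands in for the former scheduleIndexingAtInstant(partitions, instantTime)
  // body: write the index plan for this instant and report success.
  private final BiPredicate<List<String>, String> indexPlanWriter;

  ScheduleIndexingSketch(BiPredicate<List<String>, String> indexPlanWriter) {
    this.indexPlanWriter = indexPlanWriter;
  }

  public Optional<String> scheduleIndexing(List<String> partitions) {
    String instantTime = UUID.randomUUID().toString(); // new instant time
    // Former private-helper body, now inlined into its only caller:
    return indexPlanWriter.test(partitions, instantTime)
        ? Optional.of(instantTime)
        : Optional.empty();
  }
}
```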

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.INFLIGHT.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had same instant upto which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<HoodieInstant> metadataCompletedTimeline = metadataMetaClient.getActiveTimeline()
+          .getCommitsTimeline().filterCompletedInstants().getInstants().collect(Collectors.toSet());
+      List<HoodieInstant> finalRemainingInstantsToIndex = remainingInstantsToIndex.map(
+          instant -> new HoodieInstant(HoodieInstant.State.COMPLETED, HoodieTimeline.DELTA_COMMIT_ACTION, instant.getTimestamp())
+      ).filter(instant -> !metadataCompletedTimeline.contains(instant)).collect(Collectors.toList());
+
+      // index all remaining instants with a timeout
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(new PostRequestIndexingTask(metadataWriter, finalRemainingInstantsToIndex));
+      try {
+        // TODO: configure timeout
+        postRequestIndexingTaskFuture.get(60, TimeUnit.SECONDS);
+      } catch (TimeoutException | InterruptedException | ExecutionException e) {
+        postRequestIndexingTaskFuture.cancel(true);
+      } finally {
+        executorService.shutdownNow();
+      }
+      Option<HoodieInstant> lastMetadataInstant = metadataMetaClient.reloadActiveTimeline().getCommitsTimeline().filterCompletedInstants().lastInstant();

Review comment:
       Can you add some comments describing the logic here? 
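
    One possible answer to this ask; the comments below only restate the catch-up logic already visible in the diff:

```java
// 1. remainingInstantsToIndex: write-timeline instants that completed after the planned
//    indexUptoInstant, i.e. commits the index plan did not cover and which must be caught up.
// 2. Reconcile with the metadata table timeline: map each to the corresponding metadata
//    delta commit and drop those the metadata table has already applied.
// 3. Hand the remainder to PostRequestIndexingTask (bounded by a timeout) so the metadata
//    partitions are brought up to date before the index commit is completed.
// 4. lastMetadataInstant (after reloading the timeline) tells us how far the metadata table
//    actually got; only if it reaches indexUptoInstant can the indexing action be finished.
```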

##########
File path: hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkCopyOnWriteTable.java
##########
@@ -337,6 +339,16 @@ public HoodieRollbackMetadata rollback(HoodieEngineContext context, String rollb
     return new CopyOnWriteRollbackActionExecutor(context, config, this, rollbackInstantTime, commitInstant, deleteInstants, skipLocking).execute();
   }
 
+  @Override
+  public Option<HoodieIndexPlan> scheduleIndex(HoodieEngineContext context, String indexInstantTime, List<String> partitionsToIndex) {
+    throw new HoodieNotSupportedException("Indexing is not supported for a Flink table yet.");

Review comment:
       Metadata indexing

##########
File path: hudi-common/src/main/avro/HoodieIndexPartitionInfo.avsc
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+  "namespace": "org.apache.hudi.avro.model",
+  "type": "record",
+  "name": "HoodieIndexPartitionInfo",
+  "fields": [

Review comment:
       please add "doc" for each field

##########
File path: hudi-common/src/main/avro/HoodieIndexPartitionInfo.avsc
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+  "namespace": "org.apache.hudi.avro.model",
+  "type": "record",
+  "name": "HoodieIndexPartitionInfo",
+  "fields": [
+    {
+      "name": "version",

Review comment:
       what is the use of version here?

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--schedule", "-sc"}, description = "Schedule indexing")
+    public Boolean runSchedule = false;
+    @Parameter(names = {"--strategy", "-st"}, description = "Comma-separated index types to be built, e.g. BLOOM,FILES,COLSTATS")
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleAndExecute\" to generate an indexing plan first and execute that plan immediately")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for "
+        + "hoodie client for compacting")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  private int start(int retry) {
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");

Review comment:
       This could simply be scheduleIndexing(..) followed by runIndexing(..).
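
    i.e. something along these lines (a sketch with minimal error handling; the assumption that runIndexing picks the instant up from cfg is mine):

```java
case SCHEDULE_AND_EXECUTE: {
  LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
  Option<String> instantTime = scheduleIndexing(jsc);
  if (!instantTime.isPresent()) {
    return -1;
  }
  cfg.indexInstantTime = instantTime.get(); // assumption: runIndexing reads the instant from cfg
  return runIndexing(jsc);
}
```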
      

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
##########
@@ -855,6 +855,17 @@ public boolean scheduleCompactionAtInstant(String instantTime, Option<Map<String
     return scheduleTableService(instantTime, extraMetadata, TableServiceType.COMPACT).isPresent();
   }
 
+  public Option<String> scheduleIndexing(List<String> partitions) {

Review comment:
       +1
   
   

##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metadata/SparkHoodieBackedTableMetadataWriter.java
##########
@@ -121,6 +121,15 @@ protected void initRegistry() {
     }
   }
 
+  @Override
+  protected void scheduleIndex(List<String> partitions) {
+    ValidationUtils.checkState(metadataMetaClient != null, "Metadata table is not fully initialized yet.");

Review comment:
       This means that the "files" partition cannot be indexed through async indexing? (RFC says so).
   
   I feel the files partition is integral to every other partition and is always present if the MT is enabled. Also, the time to create the files partition is very low. So I'd prefer to keep it simple and always index the files partition inline.
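
    In code, that preference might look something like this (a sketch reusing metadataPartitionExists from the upgrade diff earlier in this thread; initializeFilesPartition() is a hypothetical helper):

```java
// Always bootstrap the FILES partition inline when the metadata table is enabled;
// only secondary partitions (bloom filters, column stats, ...) would go through
// the async indexer.
if (config.isMetadataTableEnabled()
    && !metadataPartitionExists(config.getBasePath(), context, MetadataPartitionType.FILES)) {
  initializeFilesPartition(); // hypothetical synchronous bootstrap
}
```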
   
   

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.INFLIGHT.equals(indexInstant.getState()),

Review comment:
       A better check may be to ensure the instant state is REQUESTED.
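
    i.e. something like the following (the error message is also clearer when stated positively):

```java
ValidationUtils.checkArgument(HoodieInstant.State.REQUESTED.equals(indexInstant.getState()),
    String.format("Index instant %s must be in REQUESTED state, found %s", instantTime, indexInstant.getState()));
```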

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.INFLIGHT.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.

Review comment:
       This is already being done on line 83 above.

##########
File path: hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkCopyOnWriteTable.java
##########
@@ -337,6 +339,16 @@ public HoodieRollbackMetadata rollback(HoodieEngineContext context, String rollb
     return new CopyOnWriteRollbackActionExecutor(context, config, this, rollbackInstantTime, commitInstant, deleteInstants, skipLocking).execute();
   }
 
+  @Override
+  public Option<HoodieIndexPlan> scheduleIndex(HoodieEngineContext context, String indexInstantTime, List<String> partitionsToIndex) {
+    throw new HoodieNotSupportedException("Indexing is not supported for a Flink table yet.");
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> index(HoodieEngineContext context, String indexInstantTime) {
+    throw new HoodieNotSupportedException("Indexing is not supported for a Flink table yet.");

Review comment:
       Metadata indexing

##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkCopyOnWriteTable.java
##########
@@ -343,6 +347,16 @@ public HoodieRollbackMetadata rollback(HoodieEngineContext context, String rollb
         deleteInstants, skipLocking).execute();
   }
 
+  @Override
+  public Option<HoodieIndexPlan> scheduleIndex(HoodieEngineContext context, String indexInstantTime, List<String> partitionsToIndex) {
+    return new ScheduleIndexActionExecutor<>(context, config, this, indexInstantTime, partitionsToIndex).execute();
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> index(HoodieEngineContext context, String indexInstantTime) {
+    return new RunIndexActionExecutor<>(context, config, this, indexInstantTime).execute();

Review comment:
       Duplicated code with the Java table. Is there a base class we can move this to?
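
    A sketch of one way to do that; the class name, placement, and constructor signature are assumptions:

```java
// Hypothetical shared base for the Spark and Java copy-on-write tables; both
// overrides are verbatim today, so they could live in one place.
public abstract class BaseIndexableHoodieTable<T extends HoodieRecordPayload, I, K, O> extends HoodieTable<T, I, K, O> {

  protected BaseIndexableHoodieTable(HoodieWriteConfig config, HoodieEngineContext context, HoodieTableMetaClient metaClient) {
    super(config, context, metaClient);
  }

  @Override
  public Option<HoodieIndexPlan> scheduleIndex(HoodieEngineContext context, String indexInstantTime, List<String> partitionsToIndex) {
    return new ScheduleIndexActionExecutor<>(context, config, this, indexInstantTime, partitionsToIndex).execute();
  }

  @Override
  public Option<HoodieIndexCommitMetadata> index(HoodieEngineContext context, String indexInstantTime) {
    return new RunIndexActionExecutor<>(context, config, this, indexInstantTime).execute();
  }
}
```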

##########
File path: hudi-common/src/main/avro/HoodieIndexCommitMetadata.avsc
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+  "namespace": "org.apache.hudi.avro.model",
+  "type": "record",
+  "name": "HoodieIndexCommitMetadata",
+  "fields": [

Review comment:
       please add "doc" for each field

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--schedule", "-sc"}, description = "Schedule indexing")
+    public Boolean runSchedule = false;
+    @Parameter(names = {"--strategy", "-st"}, description = "Comma-separated index types to be built, e.g. BLOOM,FILES,COLSTATS")
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleAndExecute\" to generate an indexing plan first and execute that plan immediately")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for "
+        + "hoodie client for compacting")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  private int start(int retry) {
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");

Review comment:
       If you move all this code into scheduleIndexing(), you can simplify SCHEDULE_AND_EXECUTE and remove some code duplication.
   
   

##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkCopyOnWriteTable.java
##########
@@ -343,6 +347,16 @@ public HoodieRollbackMetadata rollback(HoodieEngineContext context, String rollb
         deleteInstants, skipLocking).execute();
   }
 
+  @Override
+  public Option<HoodieIndexPlan> scheduleIndex(HoodieEngineContext context, String indexInstantTime, List<String> partitionsToIndex) {
+    return new ScheduleIndexActionExecutor<>(context, config, this, indexInstantTime, partitionsToIndex).execute();

Review comment:
       Duplicated code with the Java table. Is there a base class we can move this to?

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--schedule", "-sc"}, description = "Schedule indexing")
+    public Boolean runSchedule = false;
+    @Parameter(names = {"--strategy", "-st"}, description = "Comma-separated index types to be built, e.g. BLOOM,FILES,COLSTATS")

Review comment:
       Probably choose a more appropriate name, like "partitions" or "partitions-to-index".
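
    e.g. (the flag name and short form below are hypothetical; pick whichever reads best):

```java
@Parameter(names = {"--partitions-to-index", "-pti"}, description = "Comma-separated metadata partitions to index, e.g. BLOOM_FILTERS,COLUMN_STATS")
public String partitionsToIndex = null;
```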
   

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--schedule", "-sc"}, description = "Schedule indexing")
+    public Boolean runSchedule = false;
+    @Parameter(names = {"--strategy", "-st"}, description = "Comma-separated index types to be built, e.g. BLOOM,FILES,COLSTATS")
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleAndExecute\" to generate an indexing plan first and execute that plan immediately")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for "
+        + "hoodie client for compacting")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  private int start(int retry) {
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);

Review comment:
       Maybe allow only one indexing operation (or one indexing operation per partition) to be scheduled at a time.
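
    A sketch of such a guard at the top of scheduleIndexing; filterPendingIndexTimeline() is the same filter RunIndexActionExecutor uses above:

```java
// refuse to schedule if an indexing action is already pending on the data timeline
Option<HoodieInstant> pendingIndex = metaClient.getActiveTimeline().filterPendingIndexTimeline().lastInstant();
if (pendingIndex.isPresent()) {
  LOG.warn("An indexing action is already pending at instant " + pendingIndex.get().getTimestamp()
      + "; not scheduling another one");
  return Option.empty();
}
```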

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)

Review comment:
       Why is this required to be passed here? I assume spark-submit is a better place to specify Spark params.

##########
File path: hudi-common/src/main/avro/HoodieIndexPartitionInfo.avsc
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+  "namespace": "org.apache.hudi.avro.model",
+  "type": "record",
+  "name": "HoodieIndexPartitionInfo",
+  "fields": [
+    {
+      "name": "version",
+      "type": [
+        "int",
+        "null"
+      ],
+      "default": 1
+    },
+    {
+      "name": "metadataPartitionPath",
+      "type": [
+        "null",
+        "string"
+      ],
+      "default": null
+    },
+    {
+      "name": "indexUptoInstant",

Review comment:
       Since this is assumed to be common across all partitions being indexed, does it make sense to move it to HoodieIndexCommitMetadata?

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--schedule", "-sc"}, description = "Schedule indexing")
+    public Boolean runSchedule = false;
+    @Parameter(names = {"--strategy", "-st"}, description = "Comma-separated index types to be built, e.g. BLOOM,FILES,COLSTATS")
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleAndExecute\" to generate an indexing plan first and execute that plan immediately")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for "
+        + "hoodie client for compacting")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  private int start(int retry) {
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      return doSchedule(client);
+    }
+  }
+
+  private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
+    List<String> partitionsToIndex = Arrays.asList(cfg.indexTypes.split(","));
+    Option<String> indexingInstant = client.scheduleIndexing(partitionsToIndex);
+    if (!indexingInstant.isPresent()) {
+      LOG.error("Scheduling of index action did not return any instant.");
+    }
+    return indexingInstant;
+  }
+
+  private int runIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      if (StringUtils.isNullOrEmpty(cfg.indexInstantTime)) {
+        // Instant time is not specified
+        // Find the earliest scheduled indexing instant for execution
+        Option<HoodieInstant> earliestPendingIndexInstant = metaClient.getActiveTimeline()
+            .filterPendingIndexTimeline()
+            .filter(i -> !(i.isCompleted() || INFLIGHT.equals(i.getState())))
+            .firstInstant();
+        if (earliestPendingIndexInstant.isPresent()) {
+          cfg.indexInstantTime = earliestPendingIndexInstant.get().getTimestamp();
+          LOG.info("Found the earliest scheduled indexing instant which will be executed: "
+              + cfg.indexInstantTime);
+        } else {
+          throw new HoodieIndexException("There is no scheduled indexing in the table.");
+        }
+      }
+      return handleError(client.index(cfg.indexInstantTime));
+    }
+  }
+
+  private int scheduleAndRunIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);

Review comment:
       Code duplication here; I suggested above how to remove this function.

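   One way the duplication could be removed, sketched against the signatures visible in this diff; `withWriteClient` is a hypothetical helper name and would need a `java.util.function.Function` import.

```java
// Hypothetical helper: centralize schema lookup and client creation; each mode
// passes in what it wants to do with the client.
private int withWriteClient(JavaSparkContext jsc,
                            Function<SparkRDDWriteClient<HoodieRecordPayload>, Integer> action) throws Exception {
  String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
  try (SparkRDDWriteClient<HoodieRecordPayload> client =
           UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
    return action.apply(client);
  }
}

// scheduleAndRunIndexing would then reduce to:
// return withWriteClient(jsc, client -> {
//   Option<String> instant = doSchedule(client);
//   return instant.isPresent() ? handleError(client.index(instant.get())) : -1;
// });
```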
##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--schedule", "-sc"}, description = "Schedule indexing")
+    public Boolean runSchedule = false;
+    @Parameter(names = {"--strategy", "-st"}, description = "Comma-separated index types to be built, e.g. BLOOM,FILES,COLSTATS")
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleAndExecute\" to generate an indexing plan first and execute that plan immediately")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for "
+        + "hoodie client for compacting")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  private int start(int retry) {
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      return doSchedule(client);
+    }
+  }
+
+  private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
+    List<String> partitionsToIndex = Arrays.asList(cfg.indexTypes.split(","));
+    Option<String> indexingInstant = client.scheduleIndexing(partitionsToIndex);
+    if (!indexingInstant.isPresent()) {
+      LOG.error("Scheduling of index action did not return any instant.");
+    }
+    return indexingInstant;
+  }
+
+  private int runIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      if (StringUtils.isNullOrEmpty(cfg.indexInstantTime)) {
+        // Instant time is not specified
+        // Find the earliest scheduled indexing instant for execution
+        Option<HoodieInstant> earliestPendingIndexInstant = metaClient.getActiveTimeline()
+            .filterPendingIndexTimeline()
+            .filter(i -> !(i.isCompleted() || INFLIGHT.equals(i.getState())))
+            .firstInstant();
+        if (earliestPendingIndexInstant.isPresent()) {
+          cfg.indexInstantTime = earliestPendingIndexInstant.get().getTimestamp();
+          LOG.info("Found the earliest scheduled indexing instant which will be executed: "
+              + cfg.indexInstantTime);
+        } else {
+          throw new HoodieIndexException("There is no scheduled indexing in the table.");
+        }
+      }
+      return handleError(client.index(cfg.indexInstantTime));
+    }
+  }
+
+  private int scheduleAndRunIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      Option<String> indexingInstantTime = doSchedule(client);
+      if (indexingInstantTime.isPresent()) {
+        return handleError(client.index(indexingInstantTime.get()));
+      } else {
+        return -1;
+      }
+    }
+  }
+
+  private int handleError(Option<HoodieIndexCommitMetadata> commitMetadata) {

Review comment:
       A boolean seems a better return value for this function.

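   A sketch of that change, with a hypothetical method name; callers translate to an exit code at the boundary.

```java
// Hypothetical rename: report success/failure as a boolean instead of an exit code.
private boolean isIndexingSuccessful(Option<HoodieIndexCommitMetadata> commitMetadata) {
  if (!commitMetadata.isPresent()) {
    LOG.error("Indexing failed as no commit metadata present.");
    return false;
  }
  return true;
}

// e.g. at the call site:
// return isIndexingSuccessful(client.index(cfg.indexInstantTime)) ? 0 : -1;
```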
##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--schedule", "-sc"}, description = "Schedule indexing")
+    public Boolean runSchedule = false;
+    @Parameter(names = {"--strategy", "-st"}, description = "Comma-separated index types to be built, e.g. BLOOM,FILES,COLSTATS")
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleAndExecute\" to generate an indexing plan first and execute that plan immediately")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for "
+        + "hoodie client for compacting")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  private int start(int retry) {
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      return doSchedule(client);
+    }
+  }
+
+  private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
+    List<String> partitionsToIndex = Arrays.asList(cfg.indexTypes.split(","));
+    Option<String> indexingInstant = client.scheduleIndexing(partitionsToIndex);
+    if (!indexingInstant.isPresent()) {
+      LOG.error("Scheduling of index action did not return any instant.");
+    }
+    return indexingInstant;
+  }
+
+  private int runIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      if (StringUtils.isNullOrEmpty(cfg.indexInstantTime)) {
+        // Instant time is not specified
+        // Find the earliest scheduled indexing instant for execution
+        Option<HoodieInstant> earliestPendingIndexInstant = metaClient.getActiveTimeline()
+            .filterPendingIndexTimeline()
+            .filter(i -> !(i.isCompleted() || INFLIGHT.equals(i.getState())))
+            .firstInstant();
+        if (earliestPendingIndexInstant.isPresent()) {
+          cfg.indexInstantTime = earliestPendingIndexInstant.get().getTimestamp();
+          LOG.info("Found the earliest scheduled indexing instant which will be executed: "
+              + cfg.indexInstantTime);
+        } else {
+          throw new HoodieIndexException("There is no scheduled indexing in the table.");
+        }
+      }
+      return handleError(client.index(cfg.indexInstantTime));
+    }
+  }
+
+  private int scheduleAndRunIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      Option<String> indexingInstantTime = doSchedule(client);
+      if (indexingInstantTime.isPresent()) {
+        return handleError(client.index(indexingInstantTime.get()));
+      } else {
+        return -1;
+      }
+    }
+  }
+
+  private int handleError(Option<HoodieIndexCommitMetadata> commitMetadata) {
+    if (!commitMetadata.isPresent()) {
+      LOG.error("Indexing failed as no commit metadata present.");
+      return -1;
+    }

Review comment:
       Another error check should be that indexing completed for all required partitions.

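   A sketch of that extra check, assuming `HoodieIndexCommitMetadata` exposes `getIndexPartitionInfos()` the same way `HoodieIndexPlan` does in this diff, and that the caller passes in the partitions it requested (e.g. parsed from `--strategy`); needs a `java.util.Set` import.

```java
// Hypothetical check: every requested partition must appear in the commit metadata.
private boolean allRequestedPartitionsIndexed(Option<HoodieIndexCommitMetadata> commitMetadata,
                                              List<String> requestedPartitions) {
  if (!commitMetadata.isPresent()) {
    return false;
  }
  Set<String> indexedPartitions = commitMetadata.get().getIndexPartitionInfos().stream()
      .map(HoodieIndexPartitionInfo::getMetadataPartitionPath)
      .collect(Collectors.toSet());
  return indexedPartitions.containsAll(requestedPartitions);
}
```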



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1068719900


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a9f8c1316b55b72c57d18fbe8d0c8103948a30bc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930) 
   * 0d6ad6e1d8767d66b15b31bb06d1318fb08e582c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r827496666



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -659,20 +691,100 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private List<String> getMetadataPartitionsToUpdate() {
+    // find last (pending or) completed index instant and get partitions (to be) written
+    Option<HoodieInstant> lastIndexingInstant = dataMetaClient.getActiveTimeline()

Review comment:
       I guess we have to fix this to read from table properties?

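   A minimal sketch of that direction, assuming completed metadata partitions get recorded in table properties under a key like the one below (illustrative, not a confirmed config name in this PR); needs a `java.util.Arrays` import.

```java
// Hypothetical lookup: read the list of fully built metadata partitions from
// table properties instead of replaying the index plan from the timeline.
private List<String> getCompletedMetadataPartitions() {
  String completed = dataMetaClient.getTableConfig().getProps()
      .getProperty("hoodie.table.metadata.partitions", "");
  return Arrays.stream(completed.split(","))
      .filter(s -> !s.isEmpty())
      .collect(Collectors.toList());
}
```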
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -659,20 +691,100 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private List<String> getMetadataPartitionsToUpdate() {
+    // find last (pending or) completed index instant and get partitions (to be) written
+    Option<HoodieInstant> lastIndexingInstant = dataMetaClient.getActiveTimeline()
+        .getTimelineOfActions(CollectionUtils.createImmutableSet(HoodieTimeline.INDEX_ACTION)).lastInstant();
+    if (lastIndexingInstant.isPresent()) {
+      try {
+        // TODO: handle inflight instant, if it is inflight then read from requested file.
+        HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(
+            dataMetaClient.getActiveTimeline().readIndexPlanAsBytes(lastIndexingInstant.get()).get());
+        return indexPlan.getIndexPartitionInfos().stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toList());
+      } catch (IOException e) {
+        LOG.warn("Could not read index plan. Falling back to FileSystem.exists() check.");
+        return getExistingMetadataPartitions();
+      }
     }
+    // TODO: return only enabled partitions
+    return MetadataPartitionType.allPaths();

Review comment:
       Why do we return all partitions? What about the following cases:
   1. Someone migrated to 0.11 from 0.10, but the files partition was already present.
   2. (1) + added 1 new metadata partition that is inflight.
   3. (2) + 1 new partition is completed.
   
   Can you help me understand what this method would return in each of these cases?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.INFLIGHT.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had same instant upto which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()

Review comment:
       Can we move the catch-up indexing to a separate method?

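   A rough sketch of such an extraction (hypothetical method name), mirroring the reverse-ordered stream at this point in the diff.

```java
// Hypothetical extraction: isolate the catch-up scan that finds write instants
// completed after the plan's indexUptoInstant.
private List<HoodieInstant> findInstantsToCatchup(String indexUptoInstant) {
  return table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
      .filter(instant -> HoodieTimeline.compareTimestamps(
          instant.getTimestamp(), HoodieTimeline.GREATER_THAN, indexUptoInstant))
      .collect(Collectors.toList());
}
```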
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -588,10 +609,87 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
-      commit(engineContext.parallelize(records, 1), MetadataPartitionType.FILES.partitionPath(), instantTime, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
+        commit(engineContext.parallelize(records, 1), p, instantTime, canTriggerTableService);
+      }
+    });
+  }
+
+  private List<String> getMetadataPartitionsToUpdate() {
+    // find last (pending or) completed index instant and get partitions (to be) written
+    Option<HoodieInstant> lastIndexingInstant = dataMetaClient.getActiveTimeline()
+        .getTimelineOfActions(CollectionUtils.createImmutableSet(HoodieTimeline.INDEX_ACTION)).lastInstant();
+    if (lastIndexingInstant.isPresent()) {
+      try {
+        // TODO: handle inflight instant, if it is inflight then read from requested file.
+        HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(
+            dataMetaClient.getActiveTimeline().readIndexPlanAsBytes(lastIndexingInstant.get()).get());
+        return indexPlan.getIndexPartitionInfos().stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toList());
+      } catch (IOException e) {
+        LOG.warn("Could not read index plan. Falling back to FileSystem.exists() check.");
+        return getExistingMetadataPartitions();

Review comment:
       We can't fall back to fetching all partitions, right? Some could be inflight and not fully completed w.r.t. index building. Or am I missing something?

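   If the fallback stays, a safer variant might exclude anything still pending; a sketch with hypothetical inputs (the caller would collect the pending set from the pending index plans), needing a `java.util.Set` import.

```java
// Hypothetical guard: trust on-storage existence only for partitions that are
// not part of any pending index plan.
private List<String> filterOutPendingPartitions(List<String> existingPartitions,
                                                Set<String> partitionsPendingIndexing) {
  return existingPartitions.stream()
      .filter(p -> !partitionsPendingIndexing.contains(p))
      .collect(Collectors.toList());
}
```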
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -620,8 +636,14 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
 
     LOG.info(String.format("Creating %d file groups for partition %s with base fileId %s at instant time %s",
         fileGroupCount, metadataPartition.getPartitionPath(), metadataPartition.getFileIdPrefix(), instantTime));
+    HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+    List<FileSlice> fileSlices = HoodieTableMetadataUtil.getPartitionLatestFileSlices(metadataMetaClient, Option.ofNullable(fsView), metadataPartition.getPartitionPath());
     for (int i = 0; i < fileGroupCount; ++i) {
       final String fileGroupFileId = String.format("%s%04d", metadataPartition.getFileIdPrefix(), i);
+      // if a writer or async indexer had already initialized the filegroup then continue
+      if (!fileSlices.isEmpty() && fileSlices.stream().anyMatch(fileSlice -> fileGroupFileId.equals(fileSlice.getFileGroupId().getFileId()))) {
+        continue;

Review comment:
       Can you help me understand how a partially failed file group instantiation is handled? Do we clean up all file groups and start from scratch, or do we continue from where we left off when the indexer restarts next time around?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -659,20 +691,100 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);

Review comment:
       Why are we initializing file groups here? If I am not wrong, this is called in the synchronous code path where the data table is applying a commit to the MDT. With async metadata indexing, wouldn't scheduling take responsibility for initializing the file groups?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -659,20 +691,100 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();

Review comment:
       How does this work for a table that migrated from 0.10.0, for example? It may not have added the "files" partition to table properties, i.e., the list of fully completed metadata partitions.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/bloom/BloomFilter.java
##########
@@ -30,6 +34,13 @@
    */
   void add(String key);
 
+  /**
+   * Add secondary key to the {@link BloomFilter}.
+   *
+   * @param keys list of secondary keys to add to the {@link BloomFilter}
+   */
+  void add(@Nonnull List<String> keys);

Review comment:
       Can you help me understand the purpose of adding the secondary keys? I don't see a mightContain counterpart for secondary keys.
   Also, do you think we can name the method so it conveys that it adds secondary keys,
   maybe addSecondaryKeys()?

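   A sketch of the suggested rename (hypothetical signature); an explicitly named method also avoids overload ambiguity with add(String).

```java
/**
 * Adds the given secondary keys to the {@link BloomFilter}.
 *
 * @param keys list of secondary keys to add
 */
void addSecondaryKeys(@Nonnull List<String> keys);
```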
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -659,20 +691,100 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private List<String> getMetadataPartitionsToUpdate() {
+    // find last (pending or) completed index instant and get partitions (to be) written
+    Option<HoodieInstant> lastIndexingInstant = dataMetaClient.getActiveTimeline()
+        .getTimelineOfActions(CollectionUtils.createImmutableSet(HoodieTimeline.INDEX_ACTION)).lastInstant();
+    if (lastIndexingInstant.isPresent()) {
+      try {
+        // TODO: handle inflight instant, if it is inflight then read from requested file.
+        HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(
+            dataMetaClient.getActiveTimeline().readIndexPlanAsBytes(lastIndexingInstant.get()).get());
+        return indexPlan.getIndexPartitionInfos().stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toList());
+      } catch (IOException e) {
+        LOG.warn("Could not read index plan. Falling back to FileSystem.exists() check.");
+        return getExistingMetadataPartitions();
+      }
     }
+    // TODO: return only enabled partitions
+    return MetadataPartitionType.allPaths();
+  }
+
+  private List<String> getExistingMetadataPartitions() {
+    return MetadataPartitionType.allPaths().stream()
+        .filter(p -> {
+          try {
+            // TODO: avoid fs.exists() check
+            return metadataMetaClient.getFs().exists(FSUtils.getPartitionPath(metadataWriteConfig.getBasePath(), p));
+          } catch (IOException e) {
+            return false;
+          }
+        })
+        .collect(Collectors.toList());
+  }
+
+  @Override
+  public void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    indexPartitionInfos.forEach(indexPartitionInfo -> {
+      String indexUptoInstantTime = indexPartitionInfo.getIndexUptoInstant();
+      String relativePartitionPath = indexPartitionInfo.getMetadataPartitionPath();
+      LOG.info(String.format("Creating a new metadata index for partition '%s' under path %s upto instant %s",
+          relativePartitionPath, metadataWriteConfig.getBasePath(), indexUptoInstantTime));
+      try {
+        HoodieTableMetaClient.withPropertyBuilder()
+            .setTableType(HoodieTableType.MERGE_ON_READ)
+            .setTableName(tableName)
+            .setArchiveLogFolder(ARCHIVELOG_FOLDER.defaultValue())
+            .setPayloadClassName(HoodieMetadataPayload.class.getName())
+            .setBaseFileFormat(HoodieFileFormat.HFILE.toString())
+            .setRecordKeyFields(RECORD_KEY_FIELD_NAME)
+            .setPopulateMetaFields(dataWriteConfig.getMetadataConfig().populateMetaFields())
+            .setKeyGeneratorClassProp(HoodieTableMetadataKeyGenerator.class.getCanonicalName())
+            .initTable(hadoopConf.get(), metadataWriteConfig.getBasePath());

Review comment:
       Shouldn't we call initTable only once, when the MDT is instantiated for the first time?

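   A minimal sketch of such a guard, assuming a one-time bootstrap is desired; checking for the .hoodie metafolder is illustrative, not necessarily how this writer should track initialization (needs an org.apache.hadoop.fs.Path import).

```java
// Hypothetical guard: only bootstrap the metadata table if its metafolder is absent.
Path metadataMetaPath = new Path(metadataWriteConfig.getBasePath(), HoodieTableMetaClient.METAFOLDER_NAME);
if (!metadataMetaClient.getFs().exists(metadataMetaPath)) {
  HoodieTableMetaClient.withPropertyBuilder()
      // ... same builder chain as in the diff above ...
      .initTable(hadoopConf.get(), metadataWriteConfig.getBasePath());
}
```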
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -641,12 +663,22 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropIndex(List<MetadataPartitionType> indexesToDrop) throws IOException {
+    // TODO: update table config and do it in a transaction

Review comment:
       Please file a tracking ticket if we don't have one.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataWriter.java
##########
@@ -19,17 +19,28 @@
 package org.apache.hudi.metadata;
 
 import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
 import org.apache.hudi.avro.model.HoodieRestoreMetadata;
 import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
 import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
 
+import java.io.IOException;
 import java.io.Serializable;
+import java.util.List;
 
 /**
  * Interface that supports updating metadata for a given table, as actions complete.
  */
 public interface HoodieTableMetadataWriter extends Serializable, AutoCloseable {
 
+  void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos);

Review comment:
       Please add Javadocs for this method.

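   A possible Javadoc, inferred from how RunIndexActionExecutor calls this method elsewhere in the PR; the wording is a suggestion, not the author's.

```java
/**
 * Builds the given metadata partitions (indexes), each up to the instant
 * recorded in its partition info.
 *
 * @param engineContext       engine context used to parallelize the index build
 * @param indexPartitionInfos metadata partitions to index, with the instant to index up to
 */
void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos);
```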
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ *   1. Fetch last completed instant on data timeline.
+ *   2. Write the index plan to the <instant>.index.requested.
+ *   3. Initialize filegroups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!MetadataPartitionType.allPaths().containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // get last completed instant
+    Option<HoodieInstant> indexUptoInstant = table.getActiveTimeline().filterCompletedInstants().lastInstant();
+    if (indexUptoInstant.isPresent()) {
+      final HoodieInstant indexInstant = HoodieTimeline.getIndexRequestedInstant(instantTime);
+      // for each partitionToIndex add that time to the plan
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = partitionsToIndex.stream()
+          .map(p -> new HoodieIndexPartitionInfo(LATEST_INDEX_PLAN_VERSION, p.getPartitionPath(), indexUptoInstant.get().getTimestamp()))
+          .collect(Collectors.toList());
+      HoodieIndexPlan indexPlan = new HoodieIndexPlan(LATEST_INDEX_PLAN_VERSION, indexPartitionInfos);
+      try {
+        table.getActiveTimeline().saveToPendingIndexCommit(indexInstant, TimelineMetadataUtils.serializeIndexPlan(indexPlan));
+      } catch (IOException e) {
+        LOG.error("Error while saving index requested file", e);
+        throw new HoodieIOException(e.getMessage(), e);
+      }
+      table.getMetaClient().reloadActiveTimeline();
+
+      // start initializing filegroups
+      // 1. get metadata writer
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // 2. take a lock --> begin tx (data table)
+      try {
+        this.txnManager.beginTransaction(Option.of(indexInstant), Option.empty());
+        // 3. initialize filegroups as per plan for the enabled partition types
+        for (MetadataPartitionType partitionType : partitionsToIndex) {
+          metadataWriter.initializeFileGroups(table.getMetaClient(), partitionType, indexInstant.getTimestamp(), 1);
+        }
+      } catch (IOException e) {
+        LOG.error("Could not initialize file groups");
+        throw new HoodieIOException(e.getMessage(), e);
+      } finally {
+        this.txnManager.endTransaction(Option.of(indexInstant));
+      }
+      return Option.of(indexPlan);
+    }
+    return Option.empty();

Review comment:
       If someone triggers this for an empty table, what's the expected behavior? Do we update the table config to say index building is complete?
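   One possible behavior, sketched under the assumption that an empty table should simply be a no-op (this is not the PR's stated decision):

   ```java
   // Hypothetical: nothing has been committed yet, so there is nothing to index;
   // skip writing a plan and leave the table config untouched.
   if (!indexUptoInstant.isPresent()) {
     LOG.warn(String.format("No completed instant to index up to; skipping index scheduling for instant %s", instantTime));
     return Option.empty();
   }
   ```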

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -659,20 +691,100 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private List<String> getMetadataPartitionsToUpdate() {
+    // find last (pending or) completed index instant and get partitions (to be) written
+    Option<HoodieInstant> lastIndexingInstant = dataMetaClient.getActiveTimeline()
+        .getTimelineOfActions(CollectionUtils.createImmutableSet(HoodieTimeline.INDEX_ACTION)).lastInstant();
+    if (lastIndexingInstant.isPresent()) {
+      try {
+        // TODO: handle inflight instant, if it is inflight then read from requested file.
+        HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(
+            dataMetaClient.getActiveTimeline().readIndexPlanAsBytes(lastIndexingInstant.get()).get());
+        return indexPlan.getIndexPartitionInfos().stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toList());
+      } catch (IOException e) {
+        LOG.warn("Could not read index plan. Falling back to FileSystem.exists() check.");
+        return getExistingMetadataPartitions();
+      }
     }
+    // TODO: return only enabled partitions
+    return MetadataPartitionType.allPaths();
+  }
+
+  private List<String> getExistingMetadataPartitions() {
+    return MetadataPartitionType.allPaths().stream()
+        .filter(p -> {
+          try {
+            // TODO: avoid fs.exists() check
+            return metadataMetaClient.getFs().exists(FSUtils.getPartitionPath(metadataWriteConfig.getBasePath(), p));
+          } catch (IOException e) {
+            return false;
+          }
+        })
+        .collect(Collectors.toList());
+  }
+
+  @Override
+  public void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    indexPartitionInfos.forEach(indexPartitionInfo -> {
+      String indexUptoInstantTime = indexPartitionInfo.getIndexUptoInstant();
+      String relativePartitionPath = indexPartitionInfo.getMetadataPartitionPath();
+      LOG.info(String.format("Creating a new metadata index for partition '%s' under path %s upto instant %s",
+          relativePartitionPath, metadataWriteConfig.getBasePath(), indexUptoInstantTime));
+      try {
+        HoodieTableMetaClient.withPropertyBuilder()
+            .setTableType(HoodieTableType.MERGE_ON_READ)
+            .setTableName(tableName)
+            .setArchiveLogFolder(ARCHIVELOG_FOLDER.defaultValue())
+            .setPayloadClassName(HoodieMetadataPayload.class.getName())
+            .setBaseFileFormat(HoodieFileFormat.HFILE.toString())
+            .setRecordKeyFields(RECORD_KEY_FIELD_NAME)
+            .setPopulateMetaFields(dataWriteConfig.getMetadataConfig().populateMetaFields())
+            .setKeyGeneratorClassProp(HoodieTableMetadataKeyGenerator.class.getCanonicalName())
+            .initTable(hadoopConf.get(), metadataWriteConfig.getBasePath());
+        initTableMetadata();
+        // this part now moves to scheduling
+        initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(relativePartitionPath.toUpperCase(Locale.ROOT)), indexUptoInstantTime, 1);
+      } catch (IOException e) {
+        throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, indexUptoInstant: %s",
+            relativePartitionPath, indexUptoInstantTime));
+      }
+
+      // List all partitions in the basePath of the containing dataset
+      LOG.info("Initializing metadata table by using file listings in " + dataWriteConfig.getBasePath());
+      engineContext.setJobStatus(this.getClass().getSimpleName(), "MetadataIndex: initializing metadata table by listing files and partitions");
+      List<DirectoryInfo> dirInfoList = listAllPartitions(dataMetaClient);
+
+      // During bootstrap, the list of files to be committed can be huge. So creating a HoodieCommitMetadata out of these
+      // large number of files and calling the existing update(HoodieCommitMetadata) function does not scale well.
+      // Hence, we have a special commit just for the bootstrap scenario.
+      initialCommit(indexUptoInstantTime);

Review comment:
       Is this applicable only when the first partition of the metadata table is being initialized?
   If not, for subsequent partitions, shouldn't initialCommit take in the list of metadata partitions to be initialized?
   Sorry, I may be missing something here.
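   A hypothetical signature for that (the parameter name is mine, not from the PR):

   ```java
   // Scope the bootstrap commit to just the partitions being initialized,
   // rather than implicitly covering all of them.
   private void initialCommit(String createInstantTime, List<MetadataPartitionType> partitionsToInit) {
     // ... generate and commit records only for partitionsToInit ...
   }
   ```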

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {

Review comment:
       Did we add any additional/explicit metrics for the async metadata indexer, e.g. time for base file initialization, time for catch-up, etc.?
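   A rough sketch of per-phase timing with the existing `HoodieTimer` (the metric names are made up; not in the PR):

   ```java
   HoodieTimer timer = new HoodieTimer();
   timer.startTimer();
   metadataWriter.index(context, indexPartitionInfos);   // base file initialization
   long baseFileInitMs = timer.endTimer();
   timer.startTimer();
   // ... catch-up of remaining instants ...
   long catchupMs = timer.endTimer();
   LOG.info(String.format("Indexing timings: baseFileInitMs=%d, catchupMs=%d", baseFileInitMs, catchupMs));
   ```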
   

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.INFLIGHT.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had same instant upto which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();

Review comment:
       This might need some thought. Let's think through all the different scenarios:
   MDT partition1 was already built out.
   MDT partition2 has index building triggered.
   In this case, would compaction kick in just for partition1 in the MDT, or do we block compaction in general?
   Also, how do we guard archival on the MDT timeline in this case? One conservative option is sketched below.
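   Sketch of that conservative option (this is an assumption about the desired behavior, not something in the PR): defer MDT compaction/archival while any INDEX action is pending on the data timeline.

   ```java
   // Hypothetical guard: skip MDT compaction/archival while indexing is pending.
   boolean indexingInProgress = dataMetaClient.reloadActiveTimeline()
       .filterPendingIndexTimeline()
       .firstInstant()
       .isPresent();
   if (indexingInProgress) {
     LOG.info("Async indexing in progress; deferring metadata table compaction and archival");
     return;
   }
   ```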
   

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--schedule", "-sc"}, description = "Schedule indexing")
+    public Boolean runSchedule = false;
+    @Parameter(names = {"--strategy", "-st"}, description = "Comma-separated index types to be built, e.g. BLOOM,FILES,COLSTATS")
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleAndExecute\" to generate an indexing plan first and execute that plan immediately")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for "
+        + "hoodie client for compacting")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
+    HoodieIndexer indexer = new HoodieIndexer(jsc, cfg);
+    int result = indexer.start(cfg.retry);
+    String resultMsg = String.format("Indexing with basePath: %s, tableName: %s, runningMode: %s",
+        cfg.basePath, cfg.tableName, cfg.runningMode);
+    if (result == -1) {
+      LOG.error(resultMsg + " failed");
+    } else {
+      LOG.info(resultMsg + " success");
+    }
+    jsc.stop();
+  }
+
+  private int start(int retry) {
+    return UtilHelpers.retry(retry, () -> {
+      switch (cfg.runningMode.toLowerCase()) {
+        case SCHEDULE: {
+          LOG.info("Running Mode: [" + SCHEDULE + "]; Do schedule");
+          Option<String> instantTime = scheduleIndexing(jsc);
+          int result = instantTime.isPresent() ? 0 : -1;
+          if (result == 0) {
+            LOG.info("The schedule instant time is " + instantTime.get());
+          }
+          return result;
+        }
+        case SCHEDULE_AND_EXECUTE: {
+          LOG.info("Running Mode: [" + SCHEDULE_AND_EXECUTE + "]");
+          return scheduleAndRunIndexing(jsc);
+        }
+        case EXECUTE: {
+          LOG.info("Running Mode: [" + EXECUTE + "];");
+          return runIndexing(jsc);
+        }
+        default: {
+          LOG.info("Unsupported running mode [" + cfg.runningMode + "], quit the job directly");
+          return -1;
+        }
+      }
+    }, "Indexer failed");
+  }
+
+  private Option<String> scheduleIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      return doSchedule(client);
+    }
+  }
+
+  private Option<String> doSchedule(SparkRDDWriteClient<HoodieRecordPayload> client) {
+    List<String> partitionsToIndex = Arrays.asList(cfg.indexTypes.split(","));
+    List<MetadataPartitionType> partitionTypes = partitionsToIndex.stream()
+        .map(MetadataPartitionType::valueOf).collect(Collectors.toList());
+    Option<String> indexingInstant = client.scheduleIndexing(partitionTypes);
+    if (!indexingInstant.isPresent()) {
+      LOG.error("Scheduling of index action did not return any instant.");
+    }
+    return indexingInstant;
+  }
+
+  private int runIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      if (StringUtils.isNullOrEmpty(cfg.indexInstantTime)) {
+        // Instant time is not specified
+        // Find the earliest scheduled indexing instant for execution
+        Option<HoodieInstant> earliestPendingIndexInstant = metaClient.getActiveTimeline()
+            .filterPendingIndexTimeline()
+            .filter(i -> !(i.isCompleted() || INFLIGHT.equals(i.getState())))
+            .firstInstant();
+        if (earliestPendingIndexInstant.isPresent()) {
+          cfg.indexInstantTime = earliestPendingIndexInstant.get().getTimestamp();
+          LOG.info("Found the earliest scheduled indexing instant which will be executed: "
+              + cfg.indexInstantTime);
+        } else {
+          throw new HoodieIndexException("There is no scheduled indexing in the table.");
+        }
+      }
+      return handleError(client.index(cfg.indexInstantTime));
+    }
+  }
+
+  private int scheduleAndRunIndexing(JavaSparkContext jsc) throws Exception {
+    String schemaStr = UtilHelpers.getSchemaFromLatestInstant(metaClient);
+    try (SparkRDDWriteClient<HoodieRecordPayload> client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+      Option<String> indexingInstantTime = doSchedule(client);
+      if (indexingInstantTime.isPresent()) {
+        return handleError(client.index(indexingInstantTime.get()));

Review comment:
       handleResponse may be a better name.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -608,7 +624,7 @@ private void initializeEnabledFileGroups(HoodieTableMetaClient dataMetaClient, S
    * File groups will be named as :
    *    record-index-bucket-0000, .... -> ..., record-index-bucket-0009
    */
-  private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, MetadataPartitionType metadataPartition, String instantTime,
+  public void initializeFileGroups(HoodieTableMetaClient dataMetaClient, MetadataPartitionType metadataPartition, String instantTime,

Review comment:
       Can we check the bootstrapping code snippet? E.g., we check the latest synced instant in the metadata table and whether it's already archived in the data table. 
   With multiple partitions, each partition could be instantiated at a different point in time. Can we check all such guards/conditions and ensure they're all intact with the latest state of the metadata table? 
   

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--schedule", "-sc"}, description = "Schedule indexing")
+    public Boolean runSchedule = false;
+    @Parameter(names = {"--strategy", "-st"}, description = "Comma-separated index types to be built, e.g. BLOOM,FILES,COLSTATS")
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "

Review comment:
       How is this different from the runSchedule param? It's a bit confusing.
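   One way to reconcile the two (hypothetical; not in the PR) would be to treat `--schedule` as a deprecated alias for `--mode schedule`:

   ```java
   // Fall back to the legacy flag only when --mode was not supplied.
   if (StringUtils.isNullOrEmpty(cfg.runningMode) && cfg.runSchedule) {
     cfg.runningMode = SCHEDULE;
   }
   ```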

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -889,6 +890,33 @@ public static HoodieTableFileSystemView getFileSystemView(HoodieTableMetaClient
     }
   }
 
+  /**
+   * Get the column names for the table for column stats indexing
+   *
+   * @param recordsGenerationParams - all parameters required to generate metadata index for enabled index types
+   * @return List of column names for which column stats index is enabled
+   */
+  private static List<String> getColumnsToIndex(MetadataRecordsGenerationParams recordsGenerationParams) {
+    if (!recordsGenerationParams.isAllColumnStatsIndexEnabled()
+        || recordsGenerationParams.getDataMetaClient().getCommitsTimeline().filterCompletedInstants().countInstants() < 1) {
+      return Arrays.asList(recordsGenerationParams.getDataMetaClient().getTableConfig().getRecordKeyFieldProp().split(","));
+    }
+
+    if (!recordsGenerationParams.getColumnsToIndex().isEmpty()) {
+      return recordsGenerationParams.getColumnsToIndex();
+    }
+
+    TableSchemaResolver schemaResolver = new TableSchemaResolver(recordsGenerationParams.getDataMetaClient());
+    // consider nested fields as well. if column stats is enabled only for a subset of columns,

Review comment:
       I guess part of this comment can be removed.

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--schedule", "-sc"}, description = "Schedule indexing")
+    public Boolean runSchedule = false;
+    @Parameter(names = {"--strategy", "-st"}, description = "Comma-separated index types to be built, e.g. BLOOM,FILES,COLSTATS")
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleAndExecute\" to generate an indexing plan first and execute that plan immediately")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for "
+        + "hoodie client for compacting")

Review comment:
       Minor: "compacting" -> "indexing".

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.INFLIGHT.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had same instant upto which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<HoodieInstant> metadataCompletedTimeline = metadataMetaClient.getActiveTimeline()
+          .getCommitsTimeline().filterCompletedInstants().getInstants().collect(Collectors.toSet());
+      List<HoodieInstant> finalRemainingInstantsToIndex = remainingInstantsToIndex.map(
+          instant -> new HoodieInstant(HoodieInstant.State.COMPLETED, HoodieTimeline.DELTA_COMMIT_ACTION, instant.getTimestamp())
+      ).filter(instant -> !metadataCompletedTimeline.contains(instant)).collect(Collectors.toList());
+
+      // index all remaining instants with a timeout
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(new PostRequestIndexingTask(metadataWriter, finalRemainingInstantsToIndex));
+      try {
+        // TODO: configure timeout
+        postRequestIndexingTaskFuture.get(60, TimeUnit.SECONDS);

Review comment:
       60 seconds is too short. If there are 100+ instants to catch up on, would we complete within 60 seconds?
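   A config-driven timeout might be safer, for example (the config key is made up, not from the PR):

   ```java
   // Hypothetical: make the catch-up timeout tunable instead of hardcoding 60s.
   long catchupTimeoutSecs = config.getProps().getLong("hoodie.index.catchup.timeout.seconds", 300L);
   postRequestIndexingTaskFuture.get(catchupTimeoutSecs, TimeUnit.SECONDS);
   ```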

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -641,12 +663,22 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropIndex(List<MetadataPartitionType> indexesToDrop) throws IOException {
+    // TODO: update table config and do it in a transaction

Review comment:
       If a writer is holding onto an instance of hoodieTableConfig, it may not refresh from time to time, right? So if a partition was deleted midway, when the writer tries to apply a commit to the metadata table, won't hoodieTableConfig.getMetadataPartitionsToUpdate() return stale values? 
   Do we ensure such a flow succeeds even if there are partitions to update but the actual MDT partition has been deleted? 
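   A possible guard (an assumption on my part, not in the PR) is to re-read table state from storage before choosing partitions to update, so a concurrent dropIndex is observed:

   ```java
   // Reload the meta client (and hence table config) before each metadata commit.
   dataMetaClient = HoodieTableMetaClient.reload(dataMetaClient);
   List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
   ```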

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ *   1. Fetch last completed instant on data timeline.
+ *   2. Write the index plan to the <instant>.index.requested.
+ *   3. Initialize filegroups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!MetadataPartitionType.allPaths().containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // get last completed instant
+    Option<HoodieInstant> indexUptoInstant = table.getActiveTimeline().filterCompletedInstants().lastInstant();
+    if (indexUptoInstant.isPresent()) {
+      final HoodieInstant indexInstant = HoodieTimeline.getIndexRequestedInstant(instantTime);
+      // for each partitionToIndex add that time to the plan
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = partitionsToIndex.stream()
+          .map(p -> new HoodieIndexPartitionInfo(LATEST_INDEX_PLAN_VERSION, p.getPartitionPath(), indexUptoInstant.get().getTimestamp()))
+          .collect(Collectors.toList());
+      HoodieIndexPlan indexPlan = new HoodieIndexPlan(LATEST_INDEX_PLAN_VERSION, indexPartitionInfos);
+      try {
+        table.getActiveTimeline().saveToPendingIndexCommit(indexInstant, TimelineMetadataUtils.serializeIndexPlan(indexPlan));
+      } catch (IOException e) {
+        LOG.error("Error while saving index requested file", e);
+        throw new HoodieIOException(e.getMessage(), e);
+      }
+      table.getMetaClient().reloadActiveTimeline();
+
+      // start initializing filegroups
+      // 1. get metadata writer
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // 2. take a lock --> begin tx (data table)
+      try {
+        this.txnManager.beginTransaction(Option.of(indexInstant), Option.empty());
+        // 3. initialize filegroups as per plan for the enabled partition types
+        for (MetadataPartitionType partitionType : partitionsToIndex) {
+          metadataWriter.initializeFileGroups(table.getMetaClient(), partitionType, indexInstant.getTimestamp(), 1);

Review comment:
       I guess the last arg should be partitionType.getFileGroupCount().
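       i.e. something like this (a rough sketch, assuming MetadataPartitionType exposes a per-partition file group count):

           for (MetadataPartitionType partitionType : partitionsToIndex) {
             // size each metadata partition with its own configured file group count
             metadataWriter.initializeFileGroups(table.getMetaClient(), partitionType,
                 indexInstant.getTimestamp(), partitionType.getFileGroupCount());
           }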
   

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--schedule", "-sc"}, description = "Schedule indexing")
+    public Boolean runSchedule = false;
+    @Parameter(names = {"--strategy", "-st"}, description = "Comma-separated index types to be built, e.g. BLOOM,FILES,COLSTATS")

Review comment:
       +1
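       For the record, the kind of invocation this enables (illustrative only; the bundle jar name may differ per build):

           spark-submit \
             --class org.apache.hudi.utilities.HoodieIndexer \
             hudi-utilities-bundle.jar \
             --base-path /tmp/hudi_table \
             --table-name hudi_table \
             --mode scheduleAndExecute \
             --strategy BLOOM,FILES,COLSTATS \
             --parallelism 1 \
             --spark-memory 2g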

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.REQUESTED.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()

Review comment:
       There could be some instants in the data table timeline that got archived. Did we consider that scenario?
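       Something like this is what I have in mind (rough sketch; assumes scanning the archived timeline here is acceptable):

           // abort the catch-up if any instant after indexUptoInstant was already archived,
           // since those commits can no longer be replayed onto the metadata table
           boolean archivedAfterIndexUpto = table.getMetaClient().getArchivedTimeline()
               .getInstants()
               .anyMatch(i -> HoodieActiveTimeline.GREATER_THAN.test(i.getTimestamp(), indexUptoInstant));
           if (archivedAfterIndexUpto) {
             throw new HoodieIndexException(
                 String.format("Instants after %s were archived; cannot reconcile the index", indexUptoInstant));
           }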

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.REQUESTED.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<HoodieInstant> metadataCompletedTimeline = metadataMetaClient.getActiveTimeline()
+          .getCommitsTimeline().filterCompletedInstants().getInstants().collect(Collectors.toSet());
+      List<HoodieInstant> finalRemainingInstantsToIndex = remainingInstantsToIndex.map(

Review comment:
       I see we fetch all instants (pending, complete) at L106, so I assume finalRemainingInstantsToIndex could contain inflight commits as well. That means that by the time PostRequestIndexingTask executes, the actual writer may have already applied the commit to the MDT. Have we considered this scenario?
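       One way to close that gap (rough sketch): re-check against a freshly reloaded MDT timeline right before applying each instant, e.g.

           // inside PostRequestIndexingTask, before applying `instant`
           Set<String> alreadySynced = metadataMetaClient.reloadActiveTimeline()
               .getCommitsTimeline().filterCompletedInstants().getInstants()
               .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
           if (alreadySynced.contains(instant.getTimestamp())) {
             continue;   // the regular writer already applied this commit to the MDT
           }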
   

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--schedule", "-sc"}, description = "Schedule indexing")
+    public Boolean runSchedule = false;
+    @Parameter(names = {"--strategy", "-st"}, description = "Comma-separated index types to be built, e.g. BLOOM,FILES,COLSTATS")
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleAndExecute\" to generate an indexing plan first and execute that plan immediately")

Review comment:
       Is there a need to add a cancelIndexing operation?
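       If we do add one, it could be just another mode here (purely hypothetical sketch; neither this mode nor a revert API exists in this PR):

           case "cancel":
             // look up the pending index instant; the exact revert/cleanup API is TBD
             Option<HoodieInstant> pendingIndexInstant = metaClient.getActiveTimeline()
                 .filterPendingIndexTimeline().lastInstant();
             if (pendingIndexInstant.isPresent()) {
               LOG.info("Cancelling pending indexing at instant " + pendingIndexInstant.get().getTimestamp());
               return 0;
             }
             return -1;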

##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metadata/SparkHoodieBackedTableMetadataWriter.java
##########
@@ -121,6 +121,15 @@ protected void initRegistry() {
     }
   }
 
+  @Override
+  protected void scheduleIndex(List<String> partitions) {
+    ValidationUtils.checkState(metadataMetaClient != null, "Metadata table is not fully initialized yet.");

Review comment:
       Can you confirm this: for "files", do we always do synchronous initialization? What happens if, during synchronous initialization of the metadata table, someone schedules "col_stats" partition indexing via the tool? Do we guard the writes/critical section with a lock?
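       If not, a sketch of what I would expect (assumes both paths can share the data table lock via the TransactionManager used in ScheduleIndexActionExecutor):

           try {
             txnManager.beginTransaction(Option.of(indexInstant), Option.empty());
             // synchronous bootstrap of FILES and scheduled indexing of COL_STATS would both
             // contend on this lock, so file group initialization cannot interleave
             metadataWriter.initializeFileGroups(dataMetaClient, partitionType, instantTime, fileGroupCount);
           } finally {
             txnManager.endTransaction();
           }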

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.REQUESTED.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<HoodieInstant> metadataCompletedTimeline = metadataMetaClient.getActiveTimeline()
+          .getCommitsTimeline().filterCompletedInstants().getInstants().collect(Collectors.toSet());
+      List<HoodieInstant> finalRemainingInstantsToIndex = remainingInstantsToIndex.map(
+          instant -> new HoodieInstant(HoodieInstant.State.COMPLETED, HoodieTimeline.DELTA_COMMIT_ACTION, instant.getTimestamp())
+      ).filter(instant -> !metadataCompletedTimeline.contains(instant)).collect(Collectors.toList());
+
+      // index all remaining instants with a timeout
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(new PostRequestIndexingTask(metadataWriter, finalRemainingInstantsToIndex));
+      try {
+        // TODO: configure timeout
+        postRequestIndexingTaskFuture.get(60, TimeUnit.SECONDS);
+      } catch (TimeoutException | InterruptedException | ExecutionException e) {
+        postRequestIndexingTaskFuture.cancel(true);
+      } finally {
+        executorService.shutdownNow();
+      }
+      Option<HoodieInstant> lastMetadataInstant = metadataMetaClient.reloadActiveTimeline().getCommitsTimeline().filterCompletedInstants().lastInstant();
+      if (lastMetadataInstant.isPresent() && indexUptoInstant.equals(lastMetadataInstant.get().getTimestamp())) {
+        return Option.of(HoodieIndexCommitMetadata.newBuilder()
+            .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(indexPartitionInfos).build());
+      }
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              lastMetadataInstant.get().getTimestamp())).collect(Collectors.toList());
+      return Option.of(HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build());
+    } catch (IOException e) {
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  class PostRequestIndexingTask implements Runnable {
+
+    private final HoodieTableMetadataWriter metadataWriter;
+    private final List<HoodieInstant> instantsToIndex;
+
+    PostRequestIndexingTask(HoodieTableMetadataWriter metadataWriter, List<HoodieInstant> instantsToIndex) {
+      this.metadataWriter = metadataWriter;
+      this.instantsToIndex = instantsToIndex;
+    }
+
+    @Override
+    public void run() {
+      while (!Thread.interrupted()) {
+        for (HoodieInstant instant : instantsToIndex) {

Review comment:
       Don't we need to take a lock here?
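       Something along these lines, maybe (sketch only; applyCommitToMetadataTable is a hypothetical stand-in for whatever metadataWriter call ends up here):

           for (HoodieInstant instant : instantsToIndex) {
             try {
               txnManager.beginTransaction(Option.of(instant), Option.empty());
               // apply this data-table commit to the MDT under the lock, so the regular
               // writer cannot apply the same commit concurrently
               applyCommitToMetadataTable(metadataWriter, instant);
             } finally {
               txnManager.endTransaction();
             }
           }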

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.REQUESTED.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<HoodieInstant> metadataCompletedTimeline = metadataMetaClient.getActiveTimeline()
+          .getCommitsTimeline().filterCompletedInstants().getInstants().collect(Collectors.toSet());
+      List<HoodieInstant> finalRemainingInstantsToIndex = remainingInstantsToIndex.map(
+          instant -> new HoodieInstant(HoodieInstant.State.COMPLETED, HoodieTimeline.DELTA_COMMIT_ACTION, instant.getTimestamp())
+      ).filter(instant -> !metadataCompletedTimeline.contains(instant)).collect(Collectors.toList());
+
+      // index all remaining instants with a timeout
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(new PostRequestIndexingTask(metadataWriter, finalRemainingInstantsToIndex));
+      try {
+        // TODO: configure timeout
+        postRequestIndexingTaskFuture.get(60, TimeUnit.SECONDS);
+      } catch (TimeoutException | InterruptedException | ExecutionException e) {
+        postRequestIndexingTaskFuture.cancel(true);
+      } finally {
+        executorService.shutdownNow();
+      }
+      Option<HoodieInstant> lastMetadataInstant = metadataMetaClient.reloadActiveTimeline().getCommitsTimeline().filterCompletedInstants().lastInstant();
+      if (lastMetadataInstant.isPresent() && indexUptoInstant.equals(lastMetadataInstant.get().getTimestamp())) {
+        return Option.of(HoodieIndexCommitMetadata.newBuilder()
+            .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(indexPartitionInfos).build());
+      }
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              lastMetadataInstant.get().getTimestamp())).collect(Collectors.toList());
+      return Option.of(HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build());
+    } catch (IOException e) {

Review comment:
       Sorry, where are we checking for holes in the timeline and aborting the index build?
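       For reference, the kind of guard I had in mind (rough sketch):

           // a "hole": a still-pending data instant older than indexUptoInstant; if one exists,
           // the index would silently miss that commit once it eventually completes
           boolean hasHole = table.getActiveTimeline().getWriteTimeline()
               .filterInflightsAndRequested().getInstants()
               .anyMatch(i -> HoodieActiveTimeline.GREATER_THAN.test(indexUptoInstant, i.getTimestamp()));
           if (hasHole) {
             throw new HoodieIndexException(
                 String.format("Found pending instants before %s; aborting index build", indexUptoInstant));
           }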

##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkCopyOnWriteTable.java
##########
@@ -343,6 +347,16 @@ public HoodieRollbackMetadata rollback(HoodieEngineContext context, String rollb
         deleteInstants, skipLocking).execute();
   }
 
+  @Override
+  public Option<HoodieIndexPlan> scheduleIndex(HoodieEngineContext context, String indexInstantTime, List<String> partitionsToIndex) {
+    return new ScheduleIndexActionExecutor<>(context, config, this, indexInstantTime, partitionsToIndex).execute();

Review comment:
       This is just one line, so I don't think it's a must to move it to the base class; I'll leave it to you though. We already have similar code across all engines.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.REQUESTED.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);

Review comment:
       Don't we need locking here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1068757919


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d6ad6e1d8767d66b15b31bb06d1318fb08e582c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835762079



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -645,12 +669,36 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  public void dropIndex(List<MetadataPartitionType> indexesToDrop) throws IOException {
+    Set<String> completedIndexes = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    Set<String> inflightIndexes = Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    for (MetadataPartitionType partitionType : indexesToDrop) {
+      String partitionPath = partitionType.getPartitionPath();
+      if (inflightIndexes.contains(partitionPath)) {
+        LOG.error("Metadata indexing in progress: " + partitionPath);
+        return;
+      }
+      LOG.warn("Deleting Metadata Table partitions: " + partitionPath);
+      dataMetaClient.getFs().delete(new Path(metadataWriteConfig.getBasePath(), partitionPath), true);
+      completedIndexes.remove(partitionPath);
+    }
+    // update table config
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), String.join(",", completedIndexes));

Review comment:
       > should we not first update the table config and then delete the partitions
   
   Yes, good catch! I did fix this; not sure if I missed it while rebasing.
   
   > Other writes who are holding on to an in memory table property are not going to get an updated value if we update here.
   
   Your idea is good, but waiting for a minute only reduces the probability of failure.
   Also note that the index is being dropped within a lock, and I think dropping an index is not something a user would do very frequently.
   
   To support fully concurrent writes: I know MySQL lazily drops indexes, i.e. it simply marks the current index as deleted and physically deletes it later, whenever no other writer is referencing the index. We can do something similar. Tracking here: https://issues.apache.org/jira/browse/HUDI-3718
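   For the record, a sketch of that lazy-drop ordering (based on the dropIndex code above; persisting the config change is elided here):

       // 1. flip the table config first, so new writers/readers stop seeing the index
       completedIndexes.remove(partitionPath);
       dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(),
           String.join(",", completedIndexes));
       // 2. physically delete the partition only later (e.g. from the cleaner), once no
       //    active writer can still be referencing the dropped index
       dataMetaClient.getFs().delete(new Path(metadataWriteConfig.getBasePath(), partitionPath), true);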




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835767080



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.REQUESTED.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had same instant upto which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<HoodieInstant> metadataCompletedTimeline = metadataMetaClient.getActiveTimeline()
+          .getCommitsTimeline().filterCompletedInstants().getInstants().collect(Collectors.toSet());
+      List<HoodieInstant> finalRemainingInstantsToIndex = remainingInstantsToIndex.map(
+          instant -> new HoodieInstant(HoodieInstant.State.COMPLETED, HoodieTimeline.DELTA_COMMIT_ACTION, instant.getTimestamp())
+      ).filter(instant -> !metadataCompletedTimeline.contains(instant)).collect(Collectors.toList());
+
+      // index all remaining instants with a timeout
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(new PostRequestIndexingTask(metadataWriter, finalRemainingInstantsToIndex));
+      try {
+        // TODO: configure timeout
+        postRequestIndexingTaskFuture.get(60, TimeUnit.SECONDS);
+      } catch (TimeoutException | InterruptedException | ExecutionException e) {
+        postRequestIndexingTaskFuture.cancel(true);
+      } finally {
+        executorService.shutdownNow();
+      }
+      Option<HoodieInstant> lastMetadataInstant = metadataMetaClient.reloadActiveTimeline().getCommitsTimeline().filterCompletedInstants().lastInstant();
+      if (lastMetadataInstant.isPresent() && indexUptoInstant.equals(lastMetadataInstant.get().getTimestamp())) {
+        return Option.of(HoodieIndexCommitMetadata.newBuilder()
+            .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(indexPartitionInfos).build());
+      }
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              lastMetadataInstant.get().getTimestamp())).collect(Collectors.toList());
+      return Option.of(HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build());
+    } catch (IOException e) {
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant), e);
+    }
+  }
+
+  class PostRequestIndexingTask implements Runnable {
+
+    private final HoodieTableMetadataWriter metadataWriter;
+    private final List<HoodieInstant> instantsToIndex;
+
+    PostRequestIndexingTask(HoodieTableMetadataWriter metadataWriter, List<HoodieInstant> instantsToIndex) {
+      this.metadataWriter = metadataWriter;
+      this.instantsToIndex = instantsToIndex;
+    }
+
+    @Override
+    public void run() {
+      while (!Thread.interrupted()) {
+        for (HoodieInstant instant : instantsToIndex) {

Review comment:
       Yes, added a lock.







[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835775436



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/bloom/BloomFilter.java
##########
@@ -30,6 +34,13 @@
    */
   void add(String key);
 
+  /**
+   * Add secondary key to the {@link BloomFilter}.
+   *
+   * @param keys list of secondary keys to add to the {@link BloomFilter}
+   */
+  void add(@Nonnull List<String> keys);

Review comment:
       It's just a wrapper that calls the original `add(Key)` for each element; the membership test will still happen per key.
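
       For reference, a minimal sketch of such a wrapper, e.g. as a default method on the interface (the actual implementation may differ):

   ```java
   /**
    * Adds all keys to the {@link BloomFilter}. Convenience loop only;
    * the membership test still happens one key at a time.
    */
   default void add(@Nonnull List<String> keys) {
     keys.forEach(this::add);
   }
   ```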







[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835776000



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -208,6 +208,18 @@
       .sinceVersion("0.11.0")
       .withDocumentation("Table checksum is used to guard against partial writes in HDFS. It is added as the last entry in hoodie.properties and then used to validate while reading table config.");
 
+  public static final ConfigProperty<String> TABLE_METADATA_INDEX_INFLIGHT = ConfigProperty
+      .key("hoodie.table.metadata.index.inflight")

Review comment:
       Will do; renaming it to `indexes`.







[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835778181



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;
+    @Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set \"schedule\" to generate an indexing plan; "
+        + "Set \"execute\" to execute the indexing plan at the given instant, which means --instant-time is required here; "
+        + "Set \"scheduleAndExecute\" to generate an indexing plan first and execute that plan immediately;"
+        + "Set \"dropindex\" to drop the index types specified in --index-types;")
+    public String runningMode = null;
+    @Parameter(names = {"--help", "-h"}, help = true)
+    public Boolean help = false;
+
+    @Parameter(names = {"--props"}, description = "path to properties file on localfs or dfs, with configurations for hoodie client for indexing")
+    public String propsFilePath = null;
+
+    @Parameter(names = {"--hoodie-conf"}, description = "Any configuration that can be set in the properties file "
+        + "(using the CLI parameter \"--props\") can also be passed command line using this parameter. This can be repeated",
+        splitter = IdentitySplitter.class)
+    public List<String> configs = new ArrayList<>();
+  }
+
+  public static void main(String[] args) {
+    final HoodieIndexer.Config cfg = new HoodieIndexer.Config();
+    JCommander cmd = new JCommander(cfg, null, args);
+
+    if (cfg.help || args.length == 0) {
+      cmd.usage();
+      System.exit(1);
+    }
+
+    final JavaSparkContext jsc = UtilHelpers.buildSparkContext("indexing-" + cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);

Review comment:
       Can we validate that `hoodie.metadata.enable` is set to true? If not, let's throw an exception. 
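
       Something along these lines could work (a sketch, assuming the standard `HoodieMetadataConfig.ENABLE` key):

   ```java
   // Sketch: fail fast if the metadata table is disabled.
   if (!props.getBoolean(HoodieMetadataConfig.ENABLE.key(),
       HoodieMetadataConfig.ENABLE.defaultValue())) {
     throw new HoodieException(String.format(
         "Async metadata indexing requires %s=true", HoodieMetadataConfig.ENABLE.key()));
   }
   ```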







[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835778602



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.MetadataPartitionType;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.jetbrains.annotations.TestOnly;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Locale;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+/**
+ * A tool to run metadata indexing asynchronously.
+ * <p>
+ * Example command (assuming indexer.properties contains related index configs, see {@link org.apache.hudi.common.config.HoodieMetadataConfig} for configs):
+ * <p>
+ * spark-submit \
+ * --class org.apache.hudi.utilities.HoodieIndexer \
+ * /path/to/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar \
+ * --props /path/to/indexer.properties \
+ * --mode scheduleAndExecute \
+ * --base-path /tmp/hudi_trips_cow \
+ * --table-name hudi_trips_cow \
+ * --index-types COLUMN_STATS \
+ * --parallelism 1 \
+ * --spark-memory 1g
+ * <p>
+ * A sample indexer.properties file:
+ * <p>
+ * hoodie.metadata.index.async=true
+ * hoodie.metadata.index.column.stats.enable=true
+ * hoodie.metadata.index.check.timeout.seconds=60
+ * hoodie.write.concurrency.mode=optimistic_concurrency_control
+ * hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
+ */
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+  private static final String DROP_INDEX = "dropindex";
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)
+    public String sparkMemory = null;
+    @Parameter(names = {"--retry", "-rt"}, description = "number of retries")
+    public int retry = 0;
+    @Parameter(names = {"--index-types", "-ixt"}, description = "Comma-separated index types to be built, e.g. BLOOM_FILTERS,COLUMN_STATS", required = true)
+    public String indexTypes = null;

Review comment:
       Can we remove the FILES partition if someone adds it to this list? 
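
       For instance, something like this when parsing `--index-types` (a sketch; warn-and-filter rather than fail):

   ```java
   // Sketch: drop FILES if the user passes it in --index-types, since the
   // files partition is always built through the regular writer path.
   List<MetadataPartitionType> requestedTypes = Arrays.stream(cfg.indexTypes.split(","))
       .map(type -> MetadataPartitionType.valueOf(type.trim().toUpperCase(Locale.ROOT)))
       .filter(type -> {
         if (MetadataPartitionType.FILES.equals(type)) {
           LOG.warn("FILES partition is built by the writer itself; ignoring it for indexing.");
           return false;
         }
         return true;
       })
       .collect(Collectors.toList());
   ```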







[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835779230



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -392,6 +398,12 @@ public void initTableMetadata() {
     }
 
     if (!exists) {
+      if (metadataWriteConfig.isMetadataAsyncIndex()) {

Review comment:
       I guess we missed one flow here. 
   Let's say someone brings down every writer and starts HoodieIndexer for the first time with all partitions enabled, intending that everything gets built out. I see we come to this code path and eventually call scheduleIndex in L404, including the FILES partition. I guess the plan is to always follow the synchronous code path for the FILES partition.
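   
   In other words, something like the following in the init path (a pure sketch; `bootstrapFiles` and `scheduleIndexFor` are made-up helper names):
   
   ```java
   // Sketch: FILES is the source of truth for the other partitions, so it is
   // always built synchronously, even when async indexing is enabled.
   if (!exists) {
     bootstrapFiles(engineContext, dataMetaClient);          // synchronous FILES bootstrap
     if (metadataWriteConfig.isMetadataAsyncIndex()) {
       enabledPartitionTypes.stream()
           .filter(type -> !MetadataPartitionType.FILES.equals(type))
           .forEach(type -> scheduleIndexFor(type));         // deferred to the async indexer
     }
   }
   ```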
   







[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835780906



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);

Review comment:
       Add a line of doc here noting that this call builds out the base files for the metadata partitions. 







[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835785960



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.REQUESTED.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had same instant upto which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();

Review comment:
       We could do something similar for archival as well: if some partition is being built out, we should pause archival in the MDT.
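
       For example, the metadata-table archival trigger could bail out while an index plan is still pending on the data timeline (a sketch; where exactly this hook lives is an open question):

   ```java
   // Sketch: skip archiving the metadata table while any index action is
   // pending on the data timeline, so the instants an indexer still needs
   // to catch up on are not archived from under it.
   boolean indexingInProgress = dataMetaClient.getActiveTimeline()
       .filterPendingIndexTimeline()
       .countInstants() > 0;
   if (indexingInProgress) {
     LOG.info("Deferring metadata table archival: async indexing in progress");
     return;
   }
   ```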







[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077540636


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e58990e296aa5125807a4b96269fa7a06c885e69 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282) 
   * 32cfdbf4524384a7fb8220be6e822dc510cf173b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1032499400


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 06c6dd9db383efa291c999d5f0140e5d2493eeaf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709) 
   * 7920cb15d99cd92ea2a3e6bd515249eb63040772 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1022409891


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 238b128260cab3ad11c8e00bd20871b45e112c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025540372


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ca12a7818b2a799fb57ee04376dfcb14d628cdb2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618) 
   * c5c563ffa6625d610c9c6bd252457129ce5ccddc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025537756


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ca12a7818b2a799fb57ee04376dfcb14d628cdb2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618) 
   * c5c563ffa6625d610c9c6bd252457129ce5ccddc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1067120688


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a9f8c1316b55b72c57d18fbe8d0c8103948a30bc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r838101454



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
##########
@@ -925,6 +928,53 @@ public boolean scheduleCompactionAtInstant(String instantTime, Option<Map<String
     return scheduleTableService(instantTime, extraMetadata, TableServiceType.COMPACT).isPresent();
   }
 
+
+  /**
+   * Schedules INDEX action.
+   *
+   * @param partitionTypes - list of {@link MetadataPartitionType} which need to be indexed
+   * @return instant time for the requested INDEX action
+   */
+  public Option<String> scheduleIndexing(List<MetadataPartitionType> partitionTypes) {
+    String instantTime = HoodieActiveTimeline.createNewInstantTime();
+    Option<HoodieIndexPlan> indexPlan = createTable(config, hadoopConf, config.isMetadataTableEnabled())
+        .scheduleIndex(context, instantTime, partitionTypes);
+    return indexPlan.isPresent() ? Option.of(instantTime) : Option.empty();
+  }
+
+  /**
+   * Runs INDEX action to build out the metadata partitions as planned for the given instant time.
+   *
+   * @param indexInstantTime - instant time for the requested INDEX action
+   * @return {@link Option<HoodieIndexCommitMetadata>} after successful indexing.
+   */
+  public Option<HoodieIndexCommitMetadata> index(String indexInstantTime) {
+    return createTable(config, hadoopConf, config.isMetadataTableEnabled()).index(context, indexInstantTime);
+  }
+
+  /**
+   * Drops the index and removes the metadata partitions.
+   *
+   * @param partitionTypes - list of {@link MetadataPartitionType} whose metadata partitions need to be dropped
+   */
+  public void dropIndex(List<MetadataPartitionType> partitionTypes) {

Review comment:
       Will add a test for dropIndex. The scheduleIndex and buildIndex APIs are covered in a deltastreamer test; I'll add more failure scenarios in TestHoodieIndexer.
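
    As an illustration, here is a minimal sketch of how a caller might drive these APIs end to end, assuming an already-initialized write client (the config wiring is elided; only scheduleIndexing/index come from this PR, and BLOOM_FILTERS is just an example partition type):

    import java.util.Collections;

    import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
    import org.apache.hudi.common.util.Option;
    import org.apache.hudi.metadata.MetadataPartitionType;

    // schedule an INDEX action for the bloom filter metadata partition
    Option<String> instant = writeClient.scheduleIndexing(
        Collections.singletonList(MetadataPartitionType.BLOOM_FILTERS));
    if (instant.isPresent()) {
      // execute the plan; returns index commit metadata on success
      Option<HoodieIndexCommitMetadata> result = writeClient.index(instant.get());
      result.ifPresent(m -> System.out.println("Indexed up to instant: "
          + m.getIndexPartitionInfos().get(0).getIndexUptoInstant()));
    }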

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -123,6 +123,22 @@ public static void deleteMetadataTable(String basePath, HoodieEngineContext cont
     }
   }
 
+  /**
+   * Check if the given metadata partition exists.
+   *
+   * @param basePath base path of the dataset
+   * @param context  instance of {@link HoodieEngineContext}.
+   */
+  public static boolean metadataPartitionExists(String basePath, HoodieEngineContext context, MetadataPartitionType partitionType) {

Review comment:
       This is called only in the table upgrade path.
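
    For intuition, such a check can reduce to a filesystem probe of the metadata partition path. A hedged sketch, not necessarily the PR's exact implementation (the <basePath>/.hoodie/metadata layout is assumed here):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // returns true if the metadata partition directory exists under the
    // metadata table base path (layout assumed for illustration)
    static boolean partitionDirExists(String basePath, String partitionPath, Configuration conf) {
      try {
        Path metadataPartitionPath = new Path(basePath + "/.hoodie/metadata/" + partitionPath);
        FileSystem fs = metadataPartitionPath.getFileSystem(conf);
        return fs.exists(metadataPartitionPath);
      } catch (IOException e) {
        throw new RuntimeException("Failed to check metadata partition existence", e);
      }
    }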




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r838143398



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.EnumSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Schedules INDEX action.
+ * <ol>
+ *   <li>Fetch the last completed instant on the data timeline.</li>
+ *   <li>Write the index plan to <instant>.index.requested.</li>
+ *   <li>Initialize file groups for the enabled partition types within a transaction.</li>
+ * </ol>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {

Review comment:
       Yes, we can deduce it from the config. Basically, all that users should be concerned about is: a) what they want to do (create/drop), b) which indexes, c) which columns. We then populate the config and write it to the plan. I can take that up as a follow-up.
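
    To make the scheduling steps from the class javadoc concrete, a condensed sketch (imports as in the class above; saveToPendingIndexAction and serializeIndexPlan are assumed helper names, mirroring the read path shown elsewhere in this PR, not verified signatures):

    // 1. index only up to the last completed write instant on the data timeline
    String indexUptoInstant = table.getActiveTimeline().getWriteTimeline()
        .filterCompletedInstants().lastInstant()
        .map(HoodieInstant::getTimestamp)
        .orElseThrow(() -> new HoodieIndexException("No completed instant to index up to"));

    // 2. build a HoodieIndexPartitionInfo per requested partition and persist the
    //    plan as <instantTime>.index.requested on the timeline
    List<HoodieIndexPartitionInfo> infos = partitionsToIndex.stream()
        .map(p -> new HoodieIndexPartitionInfo(LATEST_INDEX_PLAN_VERSION, p.getPartitionPath(), indexUptoInstant))
        .collect(Collectors.toList());
    HoodieIndexPlan plan = HoodieIndexPlan.newBuilder()
        .setVersion(LATEST_INDEX_PLAN_VERSION).setIndexPartitionInfos(infos).build();
    table.getActiveTimeline().saveToPendingIndexAction( // assumed helper name
        new HoodieInstant(HoodieInstant.State.REQUESTED, HoodieTimeline.INDEXING_ACTION, instantTime),
        TimelineMetadataUtils.serializeIndexPlan(plan)); // assumed, mirrors deserializeIndexPlan

    // 3. within a transaction, initialize file groups for the enabled partition
    //    types via the metadata writer (elided)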




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1082347853


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be08ba499bb88d8a00f20695b360336853be708e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7511",
       "triggerID" : "be08ba499bb88d8a00f20695b360336853be708e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520",
       "triggerID" : "010de76ddd6c0201db746a13a5b04fc5e94125d4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 010de76ddd6c0201db746a13a5b04fc5e94125d4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7520) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1068757919






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1070988022


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d6ad6e1d8767d66b15b31bb06d1318fb08e582c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990) 
   * 680a99a669d9e2c2e81465efe8e491812e6c3012 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066939072


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6a577410251d17a1f2b9e782ded4908fec9977a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925) 
   * 5c1c7e91b5f530907cda50135fef8286ee8a8e38 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066934524


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6a577410251d17a1f2b9e782ded4908fec9977a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925) 
   * 5c1c7e91b5f530907cda50135fef8286ee8a8e38 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066754388


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4a036d809018043ed0d99adccbe0efdfd920284a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918) 
   * 6a577410251d17a1f2b9e782ded4908fec9977a7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066666446


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4a036d809018043ed0d99adccbe0efdfd920284a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r839834098



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.CLEAN_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN_OR_EQUALS;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEXING_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.RESTORE_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.ROLLBACK_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadata.getMetadataTableBasePath;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+  private static final int TIMELINE_RELOAD_INTERVAL_MILLIS = 5000;
+
+  // we use this to track the latest instant on the data timeline that has been indexed in the metadata table.
+  // this needs to be volatile as it can be updated in the IndexingCatchupTask spawned by this executor;
+  // the assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = validateAndGetIndexInstant();
+    // read HoodieIndexPlan
+    HoodieIndexPlan indexPlan;
+    try {
+      indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+    } catch (IOException e) {
+      throw new HoodieIndexException("Failed to read the index plan for instant: " + indexInstant);
+    }
+    List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+    try {
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // ensure the metadata partitions for the requested indexes are not already available (or inflight)
+      HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+      Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+      indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+      Set<String> requestedPartitions = indexPartitionInfos.stream()
+          .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet());
+      requestedPartitions.retainAll(indexesInflightOrCompleted);
+      if (!requestedPartitions.isEmpty()) {
+        throw new HoodieIndexException(String.format("Following partitions already exist or inflight: %s", requestedPartitions));
+      }
+
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build the index up to the base instant generated by the plan; we will do catchup later
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      LOG.info("Starting Index Building with base instant: " + indexUptoInstant);
+      metadataWriter.buildMetadataPartitions(context, indexPartitionInfos);
+
+      // get remaining instants to catchup
+      List<HoodieInstant> instantsToCatchup = getInstantsToCatchup(indexUptoInstant);
+      LOG.info("Total remaining instants to index: " + instantsToCatchup.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      catchupWithInflightWriters(metadataWriter, instantsToCatchup, metadataMetaClient, metadataCompletedTimestamps);
+      // save index commit metadata and update table config
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      updateTableConfigAndTimeline(indexInstant, finalIndexPartitionInfos, indexCommitMetadata);
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      // abort gracefully
+      abort(indexInstant, indexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet()));
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private void abort(HoodieInstant indexInstant, Set<String> requestedPartitions) {
+    Set<String> inflightPartitions = getInflightMetadataPartitions(table.getMetaClient().getTableConfig());
+    Set<String> completedPartitions = getCompletedMetadataPartitions(table.getMetaClient().getTableConfig());
+    // delete metadata partition
+    requestedPartitions.forEach(partition -> {
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(partition.toUpperCase(Locale.ROOT));
+      if (metadataPartitionExists(table.getMetaClient().getBasePath(), context, partitionType)) {
+        deleteMetadataPartition(table.getMetaClient().getBasePath(), context, partitionType);
+      }
+      inflightPartitions.remove(partition);
+      completedPartitions.remove(partition);
+    });
+    // update table config
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS_INFLIGHT.key(), String.join(",", inflightPartitions));
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS.key(), String.join(",", completedPartitions));
+    HoodieTableConfig.update(table.getMetaClient().getFs(), new Path(table.getMetaClient().getMetaPath()), table.getMetaClient().getTableConfig().getProps());
+    // delete inflight instant
+    table.getMetaClient().reloadActiveTimeline().deleteInstantFileIfExists(HoodieTimeline.getIndexInflightInstant(indexInstant.getTimestamp()));
+  }
+
+  private List<HoodieInstant> getInstantsToCatchup(String indexUptoInstant) {
+    // since only the write timeline was considered while scheduling the index (which gives us the indexUpto instant),
+    // here we consider other valid actions as well to pick the catchupStart instant
+    Set<String> validActions = CollectionUtils.createSet(CLEAN_ACTION, RESTORE_ACTION, ROLLBACK_ACTION);
+    HoodieInstant catchupStartInstant = table.getMetaClient().reloadActiveTimeline()
+        .getTimelineOfActions(validActions)
+        .filterInflightsAndRequested()
+        .findInstantsBefore(indexUptoInstant)
+        .firstInstant().orElse(null);
+    // get all instants since the plan completed (both from active timeline and archived timeline)
+    List<HoodieInstant> instantsToIndex;
+    if (catchupStartInstant != null) {
+      instantsToIndex = getRemainingArchivedAndActiveInstantsSince(catchupStartInstant.getTimestamp(), table.getMetaClient());
+    } else {
+      instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+    }
+    return instantsToIndex;
+  }
+
+  private HoodieInstant validateAndGetIndexInstant() {
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    return table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+  }
+
+  private void updateTableConfigAndTimeline(HoodieInstant indexInstant,
+                                            List<HoodieIndexPartitionInfo> finalIndexPartitionInfos,
+                                            HoodieIndexCommitMetadata indexCommitMetadata) throws IOException {
+    try {
+      // update the table config and timeline in a lock as there could be another indexer running
+      txnManager.beginTransaction();
+      updateMetadataPartitionsTableConfig(table.getMetaClient(),
+          finalIndexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet()));
+      table.getActiveTimeline().saveAsComplete(
+          new HoodieInstant(true, INDEXING_ACTION, indexInstant.getTimestamp()),
+          TimelineMetadataUtils.serializeIndexCommitMetadata(indexCommitMetadata));
+    } finally {
+      txnManager.endTransaction();
+    }
+  }
+
+  private void catchupWithInflightWriters(HoodieTableMetadataWriter metadataWriter, List<HoodieInstant> instantsToIndex,
+                                          HoodieTableMetaClient metadataMetaClient, Set<String> metadataCompletedTimestamps) {
+    ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+    Future<?> indexingCatchupTaskFuture = executorService.submit(
+        new IndexingCatchupTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient(), metadataMetaClient));
+    try {
+      LOG.info("Starting index catchup task");
+      indexingCatchupTaskFuture.get(config.getIndexingCheckTimeoutSeconds(), TimeUnit.SECONDS);
+    } catch (Exception e) {
+      indexingCatchupTaskFuture.cancel(true);
+      throw new HoodieIndexException(String.format("Index catchup failed. Current indexed instant = %s. Aborting!", currentIndexedInstant), e);
+    } finally {
+      executorService.shutdownNow();
+    }
+  }
+
+  private static List<HoodieInstant> getRemainingArchivedAndActiveInstantsSince(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> remainingInstantsToIndex = metaClient.getArchivedTimeline().getInstants()
+        .filter(i -> HoodieTimeline.compareTimestamps(i.getTimestamp(), GREATER_THAN_OR_EQUALS, instant))
+        .filter(i -> !INDEXING_ACTION.equals(i.getAction()))
+        .collect(Collectors.toList());
+    remainingInstantsToIndex.addAll(metaClient.getActiveTimeline().findInstantsAfter(instant).getInstants()
+        .filter(i -> HoodieTimeline.compareTimestamps(i.getTimestamp(), GREATER_THAN_OR_EQUALS, instant))
+        .filter(i -> !INDEXING_ACTION.equals(i.getAction()))
+        .collect(Collectors.toList()));
+    return remainingInstantsToIndex;
+  }
+
+  private static List<HoodieInstant> getCompletedArchivedAndActiveInstantsAfter(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> completedInstants = metaClient.getArchivedTimeline().filterCompletedInstants().findInstantsAfter(instant)
+        .getInstants().filter(i -> !INDEXING_ACTION.equals(i.getAction())).collect(Collectors.toList());
+    completedInstants.addAll(metaClient.reloadActiveTimeline().filterCompletedInstants().findInstantsAfter(instant)
+        .getInstants().filter(i -> !INDEXING_ACTION.equals(i.getAction())).collect(Collectors.toList()));
+    return completedInstants;
+  }
+
+  private void updateMetadataPartitionsTableConfig(HoodieTableMetaClient metaClient, Set<String> metadataPartitions) {
+    // remove from inflight and update completed indexes
+    Set<String> inflightPartitions = getInflightMetadataPartitions(metaClient.getTableConfig());
+    Set<String> completedPartitions = getCompletedMetadataPartitions(metaClient.getTableConfig());
+    inflightPartitions.removeAll(metadataPartitions);
+    completedPartitions.addAll(metadataPartitions);
+    // update table config
+    metaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS_INFLIGHT.key(), String.join(",", inflightPartitions));
+    metaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS.key(), String.join(",", completedPartitions));
+    HoodieTableConfig.update(metaClient.getFs(), new Path(metaClient.getMetaPath()), metaClient.getTableConfig().getProps());
+  }
+
+  /**
+   * Indexing check runs for instants that completed after the base instant (in the index plan).
+ * It will check whether these later instants have logged their updates to the metadata table.
+ * If not, it will do the update itself. If a later instant is inflight, it will wait until it completes or the task times out.
+   */
+  class IndexingCatchupTask implements Runnable {
+
+    private final HoodieTableMetadataWriter metadataWriter;
+    private final List<HoodieInstant> instantsToIndex;
+    private final Set<String> metadataCompletedInstants;
+    private final HoodieTableMetaClient metaClient;
+    private final HoodieTableMetaClient metadataMetaClient;
+
+    IndexingCatchupTask(HoodieTableMetadataWriter metadataWriter,
+                        List<HoodieInstant> instantsToIndex,
+                        Set<String> metadataCompletedInstants,
+                        HoodieTableMetaClient metaClient,
+                        HoodieTableMetaClient metadataMetaClient) {
+      this.metadataWriter = metadataWriter;
+      this.instantsToIndex = instantsToIndex;
+      this.metadataCompletedInstants = metadataCompletedInstants;
+      this.metaClient = metaClient;
+      this.metadataMetaClient = metadataMetaClient;
+    }
+
+    @Override
+    public void run() {
+      for (HoodieInstant instant : instantsToIndex) {
+        // metadata index already updated for this instant
+        if (!metadataCompletedInstants.isEmpty() && metadataCompletedInstants.contains(instant.getTimestamp())) {
+          currentIndexedInstant = instant.getTimestamp();
+          continue;
+        }
+        while (!instant.isCompleted()) {
+          try {
+            LOG.warn("instant not completed, reloading timeline " + instant);
+            // reload timeline and fetch instant details again wait until timeout
+            String instantTime = instant.getTimestamp();
+            Option<HoodieInstant> currentInstant = metaClient.reloadActiveTimeline()
+                .filterCompletedInstants().filter(i -> i.getTimestamp().equals(instantTime)).firstInstant();
+            instant = currentInstant.orElse(instant);
+            // sleep so that the timeline is not reloaded too frequently
+            Thread.sleep(TIMELINE_RELOAD_INTERVAL_MILLIS);
+          } catch (InterruptedException e) {
+            throw new HoodieIndexException(String.format("Thread interrupted while running indexing check for instant: %s", instant), e);
+          }
+        }
+        // if instant completed, ensure that there was metadata commit, else update metadata for this completed instant
+        if (COMPLETED.equals(instant.getState())) {
+          String instantTime = instant.getTimestamp();
+          Option<HoodieInstant> metadataInstant = metadataMetaClient.reloadActiveTimeline()
+              .filterCompletedInstants().filter(i -> i.getTimestamp().equals(instantTime)).firstInstant();
+          if (metadataInstant.isPresent()) {
+            currentIndexedInstant = instantTime;
+            continue;
+          }
+          try {
+            // we need to take a lock here as an inflight writer could also try to update the timeline
+            txnManager.beginTransaction(Option.of(instant), Option.empty());
+            LOG.info("Updating metadata table for instant: " + instant);
+            switch (instant.getAction()) {

Review comment:
       No, will take up this refactoring in a follow-up task; it needs a little more than extracting a method.
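
       For readers following along: the refactoring under discussion is to pull the per-action
       metadata update out of the catchup loop into a helper. A minimal sketch of what mere
       extraction would look like (the helper name updateMetadataForInstant is hypothetical and
       not part of the patch; it reuses only APIs that already appear in the diffs below):

           // Hypothetical helper, not part of the patch: deserializes the action-specific
           // metadata for a completed instant and forwards it to the metadata writer.
           private void updateMetadataForInstant(HoodieTableMetadataWriter metadataWriter, HoodieInstant instant) throws IOException {
             switch (instant.getAction()) {
               case HoodieTimeline.COMMIT_ACTION:
               case HoodieTimeline.DELTA_COMMIT_ACTION:
               case HoodieTimeline.REPLACE_COMMIT_ACTION:
                 HoodieCommitMetadata commitMetadata = HoodieCommitMetadata.fromBytes(
                     table.getActiveTimeline().getInstantDetails(instant).get(), HoodieCommitMetadata.class);
                 metadataWriter.update(commitMetadata, instant.getTimestamp(), false);
                 break;
               case HoodieTimeline.CLEAN_ACTION:
                 metadataWriter.update(CleanerUtils.getCleanerMetadata(table.getMetaClient(), instant), instant.getTimestamp());
                 break;
               case HoodieTimeline.RESTORE_ACTION:
                 metadataWriter.update(TimelineMetadataUtils.deserializeHoodieRestoreMetadata(
                     table.getActiveTimeline().getInstantDetails(instant).get()), instant.getTimestamp());
                 break;
               case HoodieTimeline.ROLLBACK_ACTION:
                 metadataWriter.update(TimelineMetadataUtils.deserializeHoodieRollbackMetadata(
                     table.getActiveTimeline().getInstantDetails(instant).get()), instant.getTimestamp());
                 break;
               default:
                 throw new IllegalStateException("Unexpected action: " + instant.getAction());
             }
           }

       As the comment notes, the actual cleanup also has to deal with the locking and
       error handling around this switch, hence the follow-up task.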







[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r839715062



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.CLEAN_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN_OR_EQUALS;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEXING_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.RESTORE_ACTION;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.ROLLBACK_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+import static org.apache.hudi.metadata.HoodieTableMetadata.getMetadataTableBasePath;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataPartition;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getCompletedMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.getInflightMetadataPartitions;
+import static org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartitionExists;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+  private static final int TIMELINE_RELOAD_INTERVAL_MILLIS = 5000;
+
+  // we use this to update the latest instant in the data timeline that has been indexed in the metadata table
+  // this needs to be volatile as it can be updated in the IndexingCatchupTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = validateAndGetIndexInstant();
+    // read HoodieIndexPlan
+    HoodieIndexPlan indexPlan;
+    try {
+      indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+    } catch (IOException e) {
+      throw new HoodieIndexException("Failed to read the index plan for instant: " + indexInstant);
+    }
+    List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+    try {
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // ensure the metadata partitions for the requested indexes are not already available (or inflight)
+      HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+      Set<String> indexesInflightOrCompleted = getInflightMetadataPartitions(tableConfig);
+      indexesInflightOrCompleted.addAll(getCompletedMetadataPartitions(tableConfig));
+      Set<String> requestedPartitions = indexPartitionInfos.stream()
+          .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet());
+      requestedPartitions.retainAll(indexesInflightOrCompleted);
+      if (!requestedPartitions.isEmpty()) {
+        throw new HoodieIndexException(String.format("Following partitions already exist or inflight: %s", requestedPartitions));
+      }
+
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // this will only build the index up to the base instant generated by the plan; we will be doing catchup later
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      LOG.info("Starting index building with base instant: " + indexUptoInstant);
+      metadataWriter.buildMetadataPartitions(context, indexPartitionInfos);
+
+      // get remaining instants to catchup
+      List<HoodieInstant> instantsToCatchup = getInstantsToCatchup(indexUptoInstant);
+      LOG.info("Total remaining instants to index: " + instantsToCatchup.size());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index catchup for all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      catchupWithInflightWriters(metadataWriter, instantsToCatchup, metadataMetaClient, metadataCompletedTimestamps);
+      // save index commit metadata and update table config
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      updateTableConfigAndTimeline(indexInstant, finalIndexPartitionInfos, indexCommitMetadata);
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      // abort gracefully
+      abort(indexInstant, indexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet()));
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private void abort(HoodieInstant indexInstant, Set<String> requestedPartitions) {
+    Set<String> inflightPartitions = getInflightMetadataPartitions(table.getMetaClient().getTableConfig());
+    Set<String> completedPartitions = getCompletedMetadataPartitions(table.getMetaClient().getTableConfig());
+    // delete metadata partition
+    requestedPartitions.forEach(partition -> {
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(partition.toUpperCase(Locale.ROOT));
+      if (metadataPartitionExists(table.getMetaClient().getBasePath(), context, partitionType)) {
+        deleteMetadataPartition(table.getMetaClient().getBasePath(), context, partitionType);
+      }
+      inflightPartitions.remove(partition);
+      completedPartitions.remove(partition);
+    });
+    // update table config
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS_INFLIGHT.key(), String.join(",", inflightPartitions));
+    table.getMetaClient().getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS.key(), String.join(",", completedPartitions));
+    HoodieTableConfig.update(table.getMetaClient().getFs(), new Path(table.getMetaClient().getMetaPath()), table.getMetaClient().getTableConfig().getProps());
+    // delete inflight instant
+    table.getMetaClient().reloadActiveTimeline().deleteInstantFileIfExists(HoodieTimeline.getIndexInflightInstant(indexInstant.getTimestamp()));
+  }
+
+  private List<HoodieInstant> getInstantsToCatchup(String indexUptoInstant) {
+    // since only the write timeline was considered while scheduling indexing (which gives us the indexUpto instant),
+    // we consider other valid actions here to pick the catchupStart instant
+    Set<String> validActions = CollectionUtils.createSet(CLEAN_ACTION, RESTORE_ACTION, ROLLBACK_ACTION);
+    HoodieInstant catchupStartInstant = table.getMetaClient().reloadActiveTimeline()
+        .getTimelineOfActions(validActions)
+        .filterInflightsAndRequested()
+        .findInstantsBefore(indexUptoInstant)
+        .firstInstant().orElse(null);
+    // get all instants since the plan completed (both from active timeline and archived timeline)
+    List<HoodieInstant> instantsToIndex;
+    if (catchupStartInstant != null) {
+      instantsToIndex = getRemainingArchivedAndActiveInstantsSince(catchupStartInstant.getTimestamp(), table.getMetaClient());
+    } else {
+      instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+    }
+    return instantsToIndex;
+  }
+
+  private HoodieInstant validateAndGetIndexInstant() {
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    return table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+  }
+
+  private void updateTableConfigAndTimeline(HoodieInstant indexInstant,
+                                            List<HoodieIndexPartitionInfo> finalIndexPartitionInfos,
+                                            HoodieIndexCommitMetadata indexCommitMetadata) throws IOException {
+    try {
+      // update the table config and timeline in a lock as there could be another indexer running
+      txnManager.beginTransaction();
+      updateMetadataPartitionsTableConfig(table.getMetaClient(),
+          finalIndexPartitionInfos.stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toSet()));
+      table.getActiveTimeline().saveAsComplete(
+          new HoodieInstant(true, INDEXING_ACTION, indexInstant.getTimestamp()),
+          TimelineMetadataUtils.serializeIndexCommitMetadata(indexCommitMetadata));
+    } finally {
+      txnManager.endTransaction();
+    }
+  }
+
+  private void catchupWithInflightWriters(HoodieTableMetadataWriter metadataWriter, List<HoodieInstant> instantsToIndex,
+                                          HoodieTableMetaClient metadataMetaClient, Set<String> metadataCompletedTimestamps) {
+    ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+    Future<?> indexingCatchupTaskFuture = executorService.submit(
+        new IndexingCatchupTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient(), metadataMetaClient));
+    try {
+      LOG.info("Starting index catchup task");
+      indexingCatchupTaskFuture.get(config.getIndexingCheckTimeoutSeconds(), TimeUnit.SECONDS);
+    } catch (Exception e) {
+      indexingCatchupTaskFuture.cancel(true);
+      throw new HoodieIndexException(String.format("Index catchup failed. Current indexed instant = %s. Aborting!", currentIndexedInstant), e);
+    } finally {
+      executorService.shutdownNow();
+    }
+  }
+
+  private static List<HoodieInstant> getRemainingArchivedAndActiveInstantsSince(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> remainingInstantsToIndex = metaClient.getArchivedTimeline().getInstants()
+        .filter(i -> HoodieTimeline.compareTimestamps(i.getTimestamp(), GREATER_THAN_OR_EQUALS, instant))
+        .filter(i -> !INDEXING_ACTION.equals(i.getAction()))
+        .collect(Collectors.toList());
+    remainingInstantsToIndex.addAll(metaClient.getActiveTimeline().findInstantsAfter(instant).getInstants()
+        .filter(i -> HoodieTimeline.compareTimestamps(i.getTimestamp(), GREATER_THAN_OR_EQUALS, instant))
+        .filter(i -> !INDEXING_ACTION.equals(i.getAction()))
+        .collect(Collectors.toList()));
+    return remainingInstantsToIndex;
+  }
+
+  private static List<HoodieInstant> getCompletedArchivedAndActiveInstantsAfter(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> completedInstants = metaClient.getArchivedTimeline().filterCompletedInstants().findInstantsAfter(instant)
+        .getInstants().filter(i -> !INDEXING_ACTION.equals(i.getAction())).collect(Collectors.toList());
+    completedInstants.addAll(metaClient.reloadActiveTimeline().filterCompletedInstants().findInstantsAfter(instant)
+        .getInstants().filter(i -> !INDEXING_ACTION.equals(i.getAction())).collect(Collectors.toList()));
+    return completedInstants;
+  }
+
+  private void updateMetadataPartitionsTableConfig(HoodieTableMetaClient metaClient, Set<String> metadataPartitions) {
+    // remove from inflight and update completed indexes
+    Set<String> inflightPartitions = getInflightMetadataPartitions(metaClient.getTableConfig());
+    Set<String> completedPartitions = getCompletedMetadataPartitions(metaClient.getTableConfig());
+    inflightPartitions.removeAll(metadataPartitions);
+    completedPartitions.addAll(metadataPartitions);
+    // update table config
+    metaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS_INFLIGHT.key(), String.join(",", inflightPartitions));
+    metaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_PARTITIONS.key(), String.join(",", completedPartitions));
+    HoodieTableConfig.update(metaClient.getFs(), new Path(metaClient.getMetaPath()), metaClient.getTableConfig().getProps());
+  }
+
+  /**
+   * Indexing check runs for instants that completed after the base instant (in the index plan).
+   * It will check if these later instants have logged updates to metadata table or not.
+   * If not, then it will do the update. If a later instant is inflight, it will wait until it is completed or the task times out.
+   */
+  class IndexingCatchupTask implements Runnable {
+
+    private final HoodieTableMetadataWriter metadataWriter;
+    private final List<HoodieInstant> instantsToIndex;
+    private final Set<String> metadataCompletedInstants;
+    private final HoodieTableMetaClient metaClient;
+    private final HoodieTableMetaClient metadataMetaClient;
+
+    IndexingCatchupTask(HoodieTableMetadataWriter metadataWriter,
+                        List<HoodieInstant> instantsToIndex,
+                        Set<String> metadataCompletedInstants,
+                        HoodieTableMetaClient metaClient,
+                        HoodieTableMetaClient metadataMetaClient) {
+      this.metadataWriter = metadataWriter;
+      this.instantsToIndex = instantsToIndex;
+      this.metadataCompletedInstants = metadataCompletedInstants;
+      this.metaClient = metaClient;
+      this.metadataMetaClient = metadataMetaClient;
+    }
+
+    @Override
+    public void run() {
+      for (HoodieInstant instant : instantsToIndex) {
+        // metadata index already updated for this instant
+        if (!metadataCompletedInstants.isEmpty() && metadataCompletedInstants.contains(instant.getTimestamp())) {
+          currentIndexedInstant = instant.getTimestamp();
+          continue;
+        }
+        while (!instant.isCompleted()) {
+          try {
+            LOG.warn("instant not completed, reloading timeline " + instant);
+            // reload timeline and fetch instant details again wait until timeout
+            String instantTime = instant.getTimestamp();
+            Option<HoodieInstant> currentInstant = metaClient.reloadActiveTimeline()
+                .filterCompletedInstants().filter(i -> i.getTimestamp().equals(instantTime)).firstInstant();
+            instant = currentInstant.orElse(instant);
+            // so that timeline is not reloaded very frequently
+            Thread.sleep(TIMELINE_RELOAD_INTERVAL_MILLIS);
+          } catch (InterruptedException e) {
+            throw new HoodieIndexException(String.format("Thread interrupted while running indexing check for instant: %s", instant), e);
+          }
+        }
+        // if the instant completed, ensure that there was a metadata commit; else update metadata for this completed instant

Review comment:
       yes right..
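
       The check being agreed on is the one visible earlier in this diff: before updating the
       metadata table for a completed data-timeline instant, look for a completed metadata-table
       instant with the same timestamp and skip if one exists. Roughly (copied from the patch
       above, with metadataMetaClient pointing at the metadata table):

           String instantTime = instant.getTimestamp();
           Option<HoodieInstant> metadataInstant = metadataMetaClient.reloadActiveTimeline()
               .filterCompletedInstants().filter(i -> i.getTimestamp().equals(instantTime)).firstInstant();
           if (metadataInstant.isPresent()) {
             currentIndexedInstant = instantTime; // already synced to metadata table; skip this instant
             continue;
           }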







[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077556653


   ## CI report:
   
   * e58990e296aa5125807a4b96269fa7a06c885e69 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282) 
   * 32cfdbf4524384a7fb8220be6e822dc510cf173b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077706166


   ## CI report:
   
   * ca6f4c73d40497413fd38b6edd7fbf1de9b50cac Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296) 
   * c9295eeaffb5e804ee6c636b8617f754af1492d8 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835767254



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in the data timeline that has been indexed in the metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+
+      // get all instants since the plan completed (both from active timeline and archived timeline)
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      table.getMetaClient().reloadActiveTimeline();
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      List<HoodieInstant> instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(
+          new IndexingCheckTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient()));
+      try {
+        postRequestIndexingTaskFuture.get(config.getIndexingCheckTimeout(), TimeUnit.SECONDS);
+      } catch (TimeoutException | InterruptedException | ExecutionException e) {
+        postRequestIndexingTaskFuture.cancel(true);
+      } finally {
+        executorService.shutdownNow();
+      }
+      // save index commit metadata and return
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      try {
+        txnManager.beginTransaction();
+        table.getActiveTimeline().saveAsComplete(
+            new HoodieInstant(true, INDEX_ACTION, indexInstant.getTimestamp()),
+            TimelineMetadataUtils.serializeIndexCommitMetadata(indexCommitMetadata));
+      } finally {
+        txnManager.endTransaction();
+      }
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private static List<HoodieInstant> getRemainingArchivedAndActiveInstantsSince(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> remainingInstantsToIndex = metaClient.getArchivedTimeline()
+        .getWriteTimeline()
+        .findInstantsAfter(instant)
+        .getInstants().collect(Collectors.toList());
+    remainingInstantsToIndex.addAll(metaClient.getActiveTimeline().getWriteTimeline().findInstantsAfter(instant).getInstants().collect(Collectors.toList()));
+    return remainingInstantsToIndex;
+  }
+
+  private static List<HoodieInstant> getCompletedArchivedAndActiveInstantsAfter(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> completedInstants = metaClient.getArchivedTimeline()
+        .filterCompletedInstants()
+        .findInstantsAfter(instant)
+        .getInstants().collect(Collectors.toList());
+    completedInstants.addAll(metaClient.getActiveTimeline().filterCompletedInstants().findInstantsAfter(instant).getInstants().collect(Collectors.toList()));
+    return completedInstants;
+  }
+
+  /**
+   * Indexing check runs for instants that completed after the base instant (in the index plan).
+   * It will check if these later instants have logged updates to metadata table or not.
+   * If not, then it will do the update. If a later instant is inflight, it will wait until it is completed or the task times out.
+   */
+  class IndexingCheckTask implements Runnable {
+
+    private final HoodieTableMetadataWriter metadataWriter;
+    private final List<HoodieInstant> instantsToIndex;
+    private final Set<String> metadataCompletedInstants;
+    private final HoodieTableMetaClient metaClient;
+
+    IndexingCheckTask(HoodieTableMetadataWriter metadataWriter,
+                      List<HoodieInstant> instantsToIndex,
+                      Set<String> metadataCompletedInstants,
+                      HoodieTableMetaClient metaClient) {
+      this.metadataWriter = metadataWriter;
+      this.instantsToIndex = instantsToIndex;
+      this.metadataCompletedInstants = metadataCompletedInstants;
+      this.metaClient = metaClient;
+    }
+
+    @Override
+    public void run() {
+      while (!Thread.interrupted()) {
+        for (HoodieInstant instant : instantsToIndex) {
+          // metadata index already updated for this instant
+          if (metadataCompletedInstants.contains(instant.getTimestamp())) {
+            currentIndexedInstant = instant.getTimestamp();
+            continue;
+          }
+          while (!instant.isCompleted()) {
+            // reload the timeline and fetch the instant details again, until it completes or the task times out
+            String instantTime = instant.getTimestamp();
+            Option<HoodieInstant> currentInstant = metaClient.reloadActiveTimeline()
+                .filterCompletedInstants().filter(i -> i.getTimestamp().equals(instantTime)).firstInstant();
+            instant = currentInstant.orElse(instant);
+          }
+          // update metadata for this completed instant
+          if (COMPLETED.equals(instant.getState())) {
+            try {
+              // we need to take a lock here as an inflight writer could also try to update the timeline
+              txnManager.beginTransaction(Option.of(instant), Option.empty());
+              switch (instant.getAction()) {
+                case HoodieTimeline.COMMIT_ACTION:
+                case HoodieTimeline.DELTA_COMMIT_ACTION:
+                case HoodieTimeline.REPLACE_COMMIT_ACTION:
+                  HoodieCommitMetadata commitMetadata = HoodieCommitMetadata.fromBytes(
+                      table.getActiveTimeline().getInstantDetails(instant).get(), HoodieCommitMetadata.class);
+                  metadataWriter.update(commitMetadata, instant.getTimestamp(), false);
+                  break;
+                case HoodieTimeline.CLEAN_ACTION:
+                  HoodieCleanMetadata cleanMetadata = CleanerUtils.getCleanerMetadata(table.getMetaClient(), instant);
+                  metadataWriter.update(cleanMetadata, instant.getTimestamp());
+                  break;
+                case HoodieTimeline.RESTORE_ACTION:
+                  HoodieRestoreMetadata restoreMetadata = TimelineMetadataUtils.deserializeHoodieRestoreMetadata(
+                      table.getActiveTimeline().getInstantDetails(instant).get());
+                  metadataWriter.update(restoreMetadata, instant.getTimestamp());
+                  break;
+                case HoodieTimeline.ROLLBACK_ACTION:
+                  HoodieRollbackMetadata rollbackMetadata = TimelineMetadataUtils.deserializeHoodieRollbackMetadata(
+                      table.getActiveTimeline().getInstantDetails(instant).get());
+                  metadataWriter.update(rollbackMetadata, instant.getTimestamp());
+                  break;
+                default:
+                  throw new IllegalStateException("Unexpected value: " + instant.getAction());
+              }
+            } catch (IOException e) {
+              LOG.error("Could not update metadata partition for instant: " + instant);

Review comment:
       hmm.. yeah we should throw here.
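
       The later revision of the patch does exactly that. A sketch of the agreed change,
       replacing the log-and-continue with a rethrow that preserves the cause:

           } catch (IOException e) {
             throw new HoodieIndexException(
                 String.format("Could not update metadata partition for instant: %s", instant), e);
           }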







[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835766943



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.INFLIGHT.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan, assuming indexInstant is in the requested state
+      // TODO: handle inflight instant; if it is inflight then throw an error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all completed instants since the plan completed
+      // assumption is that all metadata partitions had the same instant up to which they were scheduled to be indexed
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();

Review comment:
       We're blocking compaction.
   For archival, we now check the archived timeline while figuring out the remaining instants to index. Essentially, the completed timestamps will include instants from both the active and the archived timelines.
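
   The revised patch implements this by merging completed instants from the archived and the
   active timelines before reconciling, along the lines of getCompletedArchivedAndActiveInstantsAfter
   shown earlier in this thread:

       List<HoodieInstant> completedInstants = metaClient.getArchivedTimeline()
           .filterCompletedInstants().findInstantsAfter(instant)
           .getInstants().filter(i -> !INDEXING_ACTION.equals(i.getAction())).collect(Collectors.toList());
       completedInstants.addAll(metaClient.reloadActiveTimeline()
           .filterCompletedInstants().findInstantsAfter(instant)
           .getInstants().filter(i -> !INDEXING_ACTION.equals(i.getAction())).collect(Collectors.toList()));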







[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835767497



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes it.
+ * It also reconciles updates that landed on the data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.REQUESTED.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all instants that completed after the plan's indexUptoInstant
+      // assumption: all metadata partitions were scheduled to be indexed up to the same instant
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<HoodieInstant> metadataCompletedTimeline = metadataMetaClient.getActiveTimeline()
+          .getCommitsTimeline().filterCompletedInstants().getInstants().collect(Collectors.toSet());
+      List<HoodieInstant> finalRemainingInstantsToIndex = remainingInstantsToIndex.map(
+          instant -> new HoodieInstant(HoodieInstant.State.COMPLETED, HoodieTimeline.DELTA_COMMIT_ACTION, instant.getTimestamp())
+      ).filter(instant -> !metadataCompletedTimeline.contains(instant)).collect(Collectors.toList());
+
+      // index all remaining instants with a timeout
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(new PostRequestIndexingTask(metadataWriter, finalRemainingInstantsToIndex));
+      try {
+        // TODO: configure timeout
+        postRequestIndexingTaskFuture.get(60, TimeUnit.SECONDS);
+      } catch (TimeoutException | InterruptedException | ExecutionException e) {
+        postRequestIndexingTaskFuture.cancel(true);
+      } finally {
+        executorService.shutdownNow();
+      }
+      Option<HoodieInstant> lastMetadataInstant = metadataMetaClient.reloadActiveTimeline().getCommitsTimeline().filterCompletedInstants().lastInstant();
+      if (lastMetadataInstant.isPresent() && indexUptoInstant.equals(lastMetadataInstant.get().getTimestamp())) {
+        return Option.of(HoodieIndexCommitMetadata.newBuilder()
+            .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(indexPartitionInfos).build());
+      }
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              lastMetadataInstant.get().getTimestamp())).collect(Collectors.toList());
+      return Option.of(HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build());
+    } catch (IOException e) {

Review comment:
       It will throw from the task, and the thread will be cancelled; see the sketch below.
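   A hypothetical sketch of that task (the real PostRequestIndexingTask is not part of this snippet; `applyInstantToMetadataTable` is a stand-in for the per-instant catch-up logic):

   ```java
   private static class PostRequestIndexingTask implements Runnable {
     private final HoodieTableMetadataWriter metadataWriter;
     private final List<HoodieInstant> instantsToIndex;

     PostRequestIndexingTask(HoodieTableMetadataWriter metadataWriter, List<HoodieInstant> instantsToIndex) {
       this.metadataWriter = metadataWriter;
       this.instantsToIndex = instantsToIndex;
     }

     @Override
     public void run() {
       for (HoodieInstant instant : instantsToIndex) {
         // any exception thrown here surfaces as an ExecutionException on
         // future.get(), after which the caller cancels the task
         applyInstantToMetadataTable(instant);
       }
     }

     private void applyInstantToMetadataTable(HoodieInstant instant) {
       // elided: read the instant's commit metadata and apply it via metadataWriter
     }
   }
   ```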




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835770392



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Schedules INDEX action.
+ * <li>
+ *   1. Fetch last completed instant on data timeline.
+ *   2. Write the index plan to the <instant>.index.requested.
+ *   3. Initialize filegroups for the enabled partition types within a transaction.
+ * </li>
+ */
+public class ScheduleIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexPlan>> {
+
+  private static final Logger LOG = LogManager.getLogger(ScheduleIndexActionExecutor.class);
+  private static final Integer INDEX_PLAN_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_PLAN_VERSION = INDEX_PLAN_VERSION_1;
+
+  private final List<MetadataPartitionType> partitionsToIndex;
+  private final TransactionManager txnManager;
+
+  public ScheduleIndexActionExecutor(HoodieEngineContext context,
+                                     HoodieWriteConfig config,
+                                     HoodieTable<T, I, K, O> table,
+                                     String instantTime,
+                                     List<MetadataPartitionType> partitionsToIndex) {
+    super(context, config, table, instantTime);
+    this.partitionsToIndex = partitionsToIndex;
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexPlan> execute() {
+    // validate partitionsToIndex
+    if (!MetadataPartitionType.allPaths().containsAll(partitionsToIndex)) {
+      throw new HoodieIndexException("Not all partitions are valid: " + partitionsToIndex);
+    }
+    // get last completed instant
+    Option<HoodieInstant> indexUptoInstant = table.getActiveTimeline().filterCompletedInstants().lastInstant();
+    if (indexUptoInstant.isPresent()) {
+      final HoodieInstant indexInstant = HoodieTimeline.getIndexRequestedInstant(instantTime);
+      // for each partitionToIndex add that time to the plan
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = partitionsToIndex.stream()
+          .map(p -> new HoodieIndexPartitionInfo(LATEST_INDEX_PLAN_VERSION, p.getPartitionPath(), indexUptoInstant.get().getTimestamp()))
+          .collect(Collectors.toList());
+      HoodieIndexPlan indexPlan = new HoodieIndexPlan(LATEST_INDEX_PLAN_VERSION, indexPartitionInfos);
+      try {
+        table.getActiveTimeline().saveToPendingIndexCommit(indexInstant, TimelineMetadataUtils.serializeIndexPlan(indexPlan));
+      } catch (IOException e) {
+        LOG.error("Error while saving index requested file", e);
+        throw new HoodieIOException(e.getMessage(), e);
+      }
+      table.getMetaClient().reloadActiveTimeline();
+
+      // start initializing filegroups
+      // 1. get metadata writer
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      // 2. take a lock --> begin tx (data table)
+      try {
+        this.txnManager.beginTransaction(Option.of(indexInstant), Option.empty());
+        // 3. initialize filegroups as per plan for the enabled partition types
+        for (MetadataPartitionType partitionType : partitionsToIndex) {
+          metadataWriter.initializeFileGroups(table.getMetaClient(), partitionType, indexInstant.getTimestamp(), 1);
+        }
+      } catch (IOException e) {
+        LOG.error("Could not initialize file groups");
+        throw new HoodieIOException(e.getMessage(), e);
+      } finally {
+        this.txnManager.endTransaction(Option.of(indexInstant));
+      }
+      return Option.of(indexPlan);
+    }
+    return Option.empty();

Review comment:
       No, the indexer will error out.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835774040



##########
File path: hudi-common/src/main/avro/HoodieIndexPartitionInfo.avsc
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+  "namespace": "org.apache.hudi.avro.model",
+  "type": "record",
+  "name": "HoodieIndexPartitionInfo",
+  "fields": [
+    {
+      "name": "version",

Review comment:
       I followed the convention... don't we do that to evolve the schema?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835775436



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/bloom/BloomFilter.java
##########
@@ -30,6 +34,13 @@
    */
   void add(String key);
 
+  /**
+   * Add secondary key to the {@link BloomFilter}.
+   *
+   * @param keys list of secondary keys to add to the {@link BloomFilter}
+   */
+  void add(@Nonnull List<String> keys);

Review comment:
       It's just a wrapper that calls the original `add(Key)` multiple times... the membership test will still happen per key (see the sketch below).
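   In sketch form (assuming a default method on the interface is acceptable):

   ```java
   // each secondary key goes through the existing single-key add, so the
   // membership test is still evaluated one key at a time
   default void add(@Nonnull List<String> keys) {
     keys.forEach(this::add);
   }
   ```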




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835779404



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -392,6 +398,12 @@ public void initTableMetadata() {
     }
 
     if (!exists) {
+      if (metadataWriteConfig.isMetadataAsyncIndex()) {
+        // with async metadata indexing enabled, there can be inflight writers
+        MetadataRecordsGenerationParams indexParams = getRecordsGenerationParams();
+        scheduleIndex(indexParams.getEnabledPartitionTypes());

Review comment:
       I don't think we need this code block at all. scheduleIndexer will explicitly call schedule.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835784016



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes it.
+ * It also reconciles updates that landed on the data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No pending index instant found: %s", instantTime)));
+    ValidationUtils.checkArgument(HoodieInstant.State.REQUESTED.equals(indexInstant.getState()),
+        String.format("Index instant %s already inflight", instantTime));
+    try {
+      // read HoodieIndexPlan assuming indexInstant is requested
+      // TODO: handle inflight instant, if it is inflight then throw error.
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+      // get all instants that completed after the plan's indexUptoInstant
+      // assumption: all metadata partitions were scheduled to be indexed up to the same instant
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      Stream<HoodieInstant> remainingInstantsToIndex = table.getActiveTimeline().getWriteTimeline().getReverseOrderedInstants()
+          .filter(instant -> instant.isCompleted() && HoodieActiveTimeline.GREATER_THAN.test(instant.getTimestamp(), indexUptoInstant));
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();

Review comment:
       I don't see any changes to the compaction code path. Synced up f2f; we might need some guards here (one possibility sketched below).
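   One possible guard (sketch only; the exact hook point is still open): skip scheduling metadata table compaction while an index action is pending on the data timeline:

   ```java
   boolean indexPending = dataMetaClient.getActiveTimeline()
       .getTimelineOfActions(CollectionUtils.createImmutableSet(HoodieTimeline.INDEX_ACTION))
       .filterInflightsAndRequested().countInstants() > 0;
   if (indexPending) {
     // defer compaction so the indexer's catch-up reconciles against a stable timeline
     LOG.info("Deferring metadata table compaction: index action in progress");
     return;
   }
   ```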




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836572049



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -392,6 +398,12 @@ public void initTableMetadata() {
     }
 
     if (!exists) {
+      if (metadataWriteConfig.isMetadataAsyncIndex()) {
+        // with async metadata indexing enabled, there can be inflight writers
+        MetadataRecordsGenerationParams indexParams = getRecordsGenerationParams();
+        scheduleIndex(indexParams.getEnabledPartitionTypes());

Review comment:
       Removed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r836579606



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -663,20 +711,82 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    if (!dataWriteConfig.isMetadataTableEnabled()) {
+      return;
+    }
+    Set<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private Set<String> getMetadataPartitionsToUpdate() {
+    // fetch partitions to update from table config
+    Set<String> partitionsToUpdate = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet());
+    partitionsToUpdate.addAll(Stream.of(dataMetaClient.getTableConfig().getInflightMetadataIndexes().split(","))
+        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toSet()));
+    if (!partitionsToUpdate.isEmpty()) {
+      return partitionsToUpdate;
     }
+    // fallback to update files partition only if table config returned no partitions
+    partitionsToUpdate.add(MetadataPartitionType.FILES.getPartitionPath());
+    return partitionsToUpdate;
+  }
+
+  @Override
+  public void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    if (indexPartitionInfos.isEmpty()) {
+      LOG.warn("No partition to index in the plan");
+      return;
+    }
+    String indexUptoInstantTime = indexPartitionInfos.get(0).getIndexUptoInstant();
+    indexPartitionInfos.forEach(indexPartitionInfo -> {
+      String relativePartitionPath = indexPartitionInfo.getMetadataPartitionPath();
+      LOG.info(String.format("Creating a new metadata index for partition '%s' under path %s upto instant %s",
+          relativePartitionPath, metadataWriteConfig.getBasePath(), indexUptoInstantTime));
+      try {
+        // filegroup should have already been initialized while scheduling index for this partition
+        if (!dataMetaClient.getFs().exists(new Path(metadataWriteConfig.getBasePath(), relativePartitionPath))) {
+          throw new HoodieIndexException(String.format("File group not initialized for metadata partition: %s, indexUptoInstant: %s. Looks like index scheduling failed!",
+              relativePartitionPath, indexUptoInstantTime));
+        }
+      } catch (IOException e) {
+        throw new HoodieIndexException(String.format("Unable to check whether file group is initialized for metadata partition: %s, indexUptoInstant: %s",
+            relativePartitionPath, indexUptoInstantTime));
+      }
+
+      // return early and populate enabledPartitionTypes correctly (check in initialCommit)
+      MetadataPartitionType partitionType = MetadataPartitionType.valueOf(relativePartitionPath.toUpperCase(Locale.ROOT));
+      if (!enabledPartitionTypes.contains(partitionType)) {
+        throw new HoodieIndexException(String.format("Indexing for metadata partition: %s is not enabled", partitionType));
+      }
+    });
+    // before initial commit update table config
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), indexPartitionInfos.stream()
+        .map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.joining(",")));
+    HoodieTableConfig.update(dataMetaClient.getFs(), new Path(dataMetaClient.getMetaPath()), dataMetaClient.getTableConfig().getProps());
+    // check here for enabled partition types whether filegroups initialized or not
+    initialCommit(indexUptoInstantTime);
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), "");
+    dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), indexPartitionInfos.stream()

Review comment:
       Changed the logic. Now only the inflight config is updated here; the completed config is updated only after the catch-up finishes in RunIndexActionExecutor (see the sketch below).
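   In sketch form, using only the calls already shown above (`partitionPathsCsv` is shorthand for the joined metadata partition paths):

   ```java
   // in HoodieBackedTableMetadataWriter.index(...): mark inflight, then run
   // the initial commit; the completed marker is intentionally NOT set here
   dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), partitionPathsCsv);
   HoodieTableConfig.update(dataMetaClient.getFs(), new Path(dataMetaClient.getMetaPath()), dataMetaClient.getTableConfig().getProps());
   initialCommit(indexUptoInstantTime);

   // later, in RunIndexActionExecutor, once the catch-up has completed:
   dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_COMPLETED.key(), partitionPathsCsv);
   dataMetaClient.getTableConfig().setValue(HoodieTableConfig.TABLE_METADATA_INDEX_INFLIGHT.key(), "");
   HoodieTableConfig.update(dataMetaClient.getFs(), new Path(dataMetaClient.getMetaPath()), dataMetaClient.getTableConfig().getProps());
   ```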




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1081069550


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     }, {
       "hash" : "69071c6306ce336076aa6daa4337276990572ee4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7368",
       "triggerID" : "69071c6306ce336076aa6daa4337276990572ee4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452",
       "triggerID" : "522a18caff448bcc9b127372d4526ee8f168f085",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471",
       "triggerID" : "ee361b1bf6b9b68e11f84f2af76625b847669ed2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 522a18caff448bcc9b127372d4526ee8f168f085 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7452) 
   * ee361b1bf6b9b68e11f84f2af76625b847669ed2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7471) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] manojpec commented on a change in pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
manojpec commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r797392212



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -588,10 +609,87 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
-      commit(engineContext.parallelize(records, 1), MetadataPartitionType.FILES.partitionPath(), instantTime, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
+        commit(engineContext.parallelize(records, 1), p, instantTime, canTriggerTableService);
+      }
+    });
+  }
+
+  private List<String> getMetadataPartitionsToUpdate() {
+    // find last (pending or) completed index instant and get partitions (to be) written
+    Option<HoodieInstant> lastIndexingInstant = dataMetaClient.getActiveTimeline()
+        .getTimelineOfActions(CollectionUtils.createImmutableSet(HoodieTimeline.INDEX_ACTION)).lastInstant();
+    if (lastIndexingInstant.isPresent()) {
+      try {
+        // TODO: handle inflight instant, if it is inflight then read from requested file.
+        HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(
+            dataMetaClient.getActiveTimeline().readIndexPlanAsBytes(lastIndexingInstant.get()).get());
+        return indexPlan.getIndexPartitionInfos().stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toList());
+      } catch (IOException e) {
+        LOG.warn("Could not read index plan. Falling back to FileSystem.exists() check.");
+        return getExistingMetadataPartitions();
+      }
     }
+    // TODO: return only enabled partitions
+    return MetadataPartitionType.all();
+  }
+
+  private List<String> getExistingMetadataPartitions() {
+    return MetadataPartitionType.all().stream()
+        .filter(p -> {
+          try {
+            // TODO: avoid fs.exists() check
+            return metadataMetaClient.getFs().exists(FSUtils.getPartitionPath(metadataWriteConfig.getBasePath(), p));
+          } catch (IOException e) {
+            return false;
+          }
+        })
+        .collect(Collectors.toList());
+  }
+
+  @Override
+  public void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    indexPartitionInfos.forEach(indexPartitionInfo -> {
+      String indexUptoInstantTime = indexPartitionInfo.getIndexUptoInstant();
+      String relativePartitionPath = indexPartitionInfo.getMetadataPartitionPath();
+      LOG.info(String.format("Creating a new metadata index for partition '%s' under path %s upto instant %s",
+          relativePartitionPath, metadataWriteConfig.getBasePath(), indexUptoInstantTime));
+      try {
+        HoodieTableMetaClient.withPropertyBuilder()
+            .setTableType(HoodieTableType.MERGE_ON_READ)
+            .setTableName(tableName)
+            .setArchiveLogFolder(ARCHIVELOG_FOLDER.defaultValue())
+            .setPayloadClassName(HoodieMetadataPayload.class.getName())
+            .setBaseFileFormat(HoodieFileFormat.HFILE.toString())
+            .setRecordKeyFields(RECORD_KEY_FIELD_NAME)
+            .setPopulateMetaFields(dataWriteConfig.getMetadataConfig().populateMetaFields())
+            .setKeyGeneratorClassProp(HoodieTableMetadataKeyGenerator.class.getCanonicalName())
+            .initTable(hadoopConf.get(), metadataWriteConfig.getBasePath());
+        initTableMetadata();
+        initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(relativePartitionPath.toUpperCase(Locale.ROOT)), indexUptoInstantTime, 1);
+      } catch (IOException e) {
+        throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, indexUptoInstant: %s",
+            relativePartitionPath, indexUptoInstantTime));
+      }
+
+      // List all partitions in the basePath of the containing dataset
+      LOG.info("Initializing metadata table by using file listings in " + dataWriteConfig.getBasePath());
+      engineContext.setJobStatus(this.getClass().getSimpleName(), "MetadataIndex: initializing metadata table by listing files and partitions");
+      List<DirectoryInfo> dirInfoList = listAllPartitions(dataMetaClient);
+
+      // During bootstrap, the list of files to be committed can be huge. So creating a HoodieCommitMetadata out of these
+      // large number of files and calling the existing update(HoodieCommitMetadata) function does not scale well.
+      // Hence, we have a special commit just for the bootstrap scenario.
+      bootstrapCommit(dirInfoList, indexUptoInstantTime, relativePartitionPath);

Review comment:
       We need to decide on the interface for bootstrapCommit() so that it can apply to all metadata partitions; a possible shape is sketched below.
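   A hypothetical shape (names are illustrative, not the PR's API): each metadata partition type supplies its own record conversion, and a single bootstrapCommit() drives all of them for the initial commit:

   ```java
   interface MetadataPartitionBootstrap {
     MetadataPartitionType partitionType();

     // initial records for this partition, covering files up to indexUptoInstant
     HoodieData<HoodieRecord> bootstrapRecords(List<DirectoryInfo> dirInfoList, String indexUptoInstant);
   }
   ```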

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -588,10 +609,87 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
-      commit(engineContext.parallelize(records, 1), MetadataPartitionType.FILES.partitionPath(), instantTime, canTriggerTableService);
+    List<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        try {
+          initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(p.toUpperCase(Locale.ROOT)), instantTime, 1);
+        } catch (IOException e) {
+          throw new HoodieIndexException(String.format("Unable to initialize file groups for metadata partition: %s, instant: %s", p, instantTime));
+        }
+        List<HoodieRecord> records = convertMetadataFunction.convertMetadata();
+        commit(engineContext.parallelize(records, 1), p, instantTime, canTriggerTableService);
+      }
+    });
+  }
+
+  private List<String> getMetadataPartitionsToUpdate() {
+    // find last (pending or) completed index instant and get partitions (to be) written
+    Option<HoodieInstant> lastIndexingInstant = dataMetaClient.getActiveTimeline()
+        .getTimelineOfActions(CollectionUtils.createImmutableSet(HoodieTimeline.INDEX_ACTION)).lastInstant();
+    if (lastIndexingInstant.isPresent()) {
+      try {
+        // TODO: handle inflight instant, if it is inflight then read from requested file.
+        HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(
+            dataMetaClient.getActiveTimeline().readIndexPlanAsBytes(lastIndexingInstant.get()).get());
+        return indexPlan.getIndexPartitionInfos().stream().map(HoodieIndexPartitionInfo::getMetadataPartitionPath).collect(Collectors.toList());
+      } catch (IOException e) {
+        LOG.warn("Could not read index plan. Falling back to FileSystem.exists() check.");
+        return getExistingMetadataPartitions();
+      }
     }
+    // TODO: return only enabled partitions
+    return MetadataPartitionType.all();
+  }
+
+  private List<String> getExistingMetadataPartitions() {
+    return MetadataPartitionType.all().stream()
+        .filter(p -> {
+          try {
+            // TODO: avoid fs.exists() check
+            return metadataMetaClient.getFs().exists(FSUtils.getPartitionPath(metadataWriteConfig.getBasePath(), p));
+          } catch (IOException e) {
+            return false;
+          }
+        })
+        .collect(Collectors.toList());
+  }
+
+  @Override
+  public void index(HoodieEngineContext engineContext, List<HoodieIndexPartitionInfo> indexPartitionInfos) {
+    indexPartitionInfos.forEach(indexPartitionInfo -> {
+      String indexUptoInstantTime = indexPartitionInfo.getIndexUptoInstant();
+      String relativePartitionPath = indexPartitionInfo.getMetadataPartitionPath();
+      LOG.info(String.format("Creating a new metadata index for partition '%s' under path %s upto instant %s",
+          relativePartitionPath, metadataWriteConfig.getBasePath(), indexUptoInstantTime));
+      try {
+        HoodieTableMetaClient.withPropertyBuilder()
+            .setTableType(HoodieTableType.MERGE_ON_READ)
+            .setTableName(tableName)
+            .setArchiveLogFolder(ARCHIVELOG_FOLDER.defaultValue())
+            .setPayloadClassName(HoodieMetadataPayload.class.getName())
+            .setBaseFileFormat(HoodieFileFormat.HFILE.toString())
+            .setRecordKeyFields(RECORD_KEY_FIELD_NAME)
+            .setPopulateMetaFields(dataWriteConfig.getMetadataConfig().populateMetaFields())
+            .setKeyGeneratorClassProp(HoodieTableMetadataKeyGenerator.class.getCanonicalName())
+            .initTable(hadoopConf.get(), metadataWriteConfig.getBasePath());
+        initTableMetadata();
+        initializeFileGroups(dataMetaClient, MetadataPartitionType.valueOf(relativePartitionPath.toUpperCase(Locale.ROOT)), indexUptoInstantTime, 1);

Review comment:
       The file group count for each partition comes from config. Either the counts can be part of the plan, or we need to refer back to the config (see the sketch below).
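   A sketch of resolving the count from config instead of hard-coding 1 (the metadata-config getters here are assumptions):

   ```java
   private int getFileGroupCount(MetadataPartitionType partitionType) {
     switch (partitionType) {
       case BLOOM_FILTERS:
         return dataWriteConfig.getMetadataConfig().getBloomFilterIndexFileGroupCount();
       case COLUMN_STATS:
         return dataWriteConfig.getMetadataConfig().getColumnStatsIndexFileGroupCount();
       case FILES:
       default:
         return 1; // the files partition stays a single file group
     }
   }
   ```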




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025425611


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 238b128260cab3ad11c8e00bd20871b45e112c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533) 
   * ca12a7818b2a799fb57ee04376dfcb14d628cdb2 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1025464066


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ca12a7818b2a799fb57ee04376dfcb14d628cdb2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1065109086


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7920cb15d99cd92ea2a3e6bd515249eb63040772 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801) 
   * e6e3e1612928fb0892d071ec4c3a26e31ce1ff76 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1065194900


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e6e3e1612928fb0892d071ec4c3a26e31ce1ff76 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066502014


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e6e3e1612928fb0892d071ec4c3a26e31ce1ff76 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840) 
   * 4a036d809018043ed0d99adccbe0efdfd920284a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1066502014


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e6e3e1612928fb0892d071ec4c3a26e31ce1ff76 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840) 
   * 4a036d809018043ed0d99adccbe0efdfd920284a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1067036039


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5c1c7e91b5f530907cda50135fef8286ee8a8e38 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929) 
   * a9f8c1316b55b72c57d18fbe8d0c8103948a30bc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835758046



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -620,8 +636,14 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
 
     LOG.info(String.format("Creating %d file groups for partition %s with base fileId %s at instant time %s",
         fileGroupCount, metadataPartition.getPartitionPath(), metadataPartition.getFileIdPrefix(), instantTime));
+    HoodieTableFileSystemView fsView = HoodieTableMetadataUtil.getFileSystemView(metadataMetaClient);
+    List<FileSlice> fileSlices = HoodieTableMetadataUtil.getPartitionLatestFileSlices(metadataMetaClient, Option.ofNullable(fsView), metadataPartition.getPartitionPath());
     for (int i = 0; i < fileGroupCount; ++i) {
       final String fileGroupFileId = String.format("%s%04d", metadataPartition.getFileIdPrefix(), i);
+      // if a writer or async indexer had already initialized the filegroup then continue
+      if (!fileSlices.isEmpty() && fileSlices.stream().anyMatch(fileSlice -> fileGroupFileId.equals(fileSlice.getFileGroupId().getFileId()))) {
+        continue;

Review comment:
        Not handled currently.
   So, first we check whether a particular partition needs to be initialized. If yes, we initialize it; in case of a partially failed file group instantiation, we will clean up all file groups and start from scratch. Will add this logic.
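   For illustration, a minimal sketch of that cleanup path (`deleteFileGroups` and `createAllFileGroups` are hypothetical helpers, and the variables reuse those from the diff above, so this is not the final implementation):

   ```java
   // Sketch only, inside HoodieBackedTableMetadataWriter.initializeFileGroups().
   List<FileSlice> fileSlices = HoodieTableMetadataUtil.getPartitionLatestFileSlices(
       metadataMetaClient, Option.ofNullable(fsView), metadataPartition.getPartitionPath());
   if (!fileSlices.isEmpty() && fileSlices.size() < fileGroupCount) {
     // A previous attempt died mid-way: wipe all file groups for this partition
     // and re-create them from scratch instead of resuming on a partial state.
     deleteFileGroups(metadataMetaClient, metadataPartition.getPartitionPath()); // hypothetical helper
     fileSlices = Collections.emptyList();
   }
   if (fileSlices.isEmpty()) {
     // Safe to (re)create all fileGroupCount file groups for this partition.
     createAllFileGroups(metadataPartition, instantTime, fileGroupCount); // hypothetical helper
   }
   ```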




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835763435



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -663,20 +711,82 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
 
   /**
    * Processes commit metadata from data table and commits to metadata table.
+   *
    * @param instantTime instant time of interest.
    * @param convertMetadataFunction converter function to convert the respective metadata to List of HoodieRecords to be written to metadata table.
    * @param <T> type of commit metadata.
    * @param canTriggerTableService true if table services can be triggered. false otherwise.
    */
   private <T> void processAndCommit(String instantTime, ConvertMetadataFunction convertMetadataFunction, boolean canTriggerTableService) {
-    if (enabled && metadata != null) {
-      Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
-      commit(instantTime, partitionRecordsMap, canTriggerTableService);
+    if (!dataWriteConfig.isMetadataTableEnabled()) {
+      return;
+    }
+    Set<String> partitionsToUpdate = getMetadataPartitionsToUpdate();
+    partitionsToUpdate.forEach(p -> {
+      if (enabled && metadata != null) {
+        Map<MetadataPartitionType, HoodieData<HoodieRecord>> partitionRecordsMap = convertMetadataFunction.convertMetadata();
+        commit(instantTime, partitionRecordsMap, canTriggerTableService);
+      }
+    });
+  }
+
+  private Set<String> getMetadataPartitionsToUpdate() {
+    // fetch partitions to update from table config
+    Set<String> partitionsToUpdate = Stream.of(dataMetaClient.getTableConfig().getCompletedMetadataIndexes().split(","))

Review comment:
       Actually, it makes sense to have `getInflightAndCompleteMetadataIndexes` in addition to the existing getters. At times, we may just want the completed indexes. Will change accordingly.
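   For illustration, a rough sketch of that getter, assuming it sits in `HoodieTableConfig` next to the existing CSV-valued getters (the inflight getter name is an assumption):

   ```java
   // Sketch only; both getters are assumed to return comma-separated metadata
   // partition names, matching getCompletedMetadataIndexes() used in the diff above.
   public Set<String> getInflightAndCompleteMetadataIndexes() {
     Set<String> indexes = new HashSet<>(csvToSet(getCompletedMetadataIndexes()));
     indexes.addAll(csvToSet(getInflightMetadataIndexes())); // assumed getter
     return indexes;
   }

   private static Set<String> csvToSet(String csv) {
     return Arrays.stream(csv.split(","))
         .map(String::trim)
         .filter(s -> !s.isEmpty())
         .collect(Collectors.toSet());
   }
   ```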




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835770935



##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkCopyOnWriteTable.java
##########
@@ -343,6 +347,16 @@ public HoodieRollbackMetadata rollback(HoodieEngineContext context, String rollb
         deleteInstants, skipLocking).execute();
   }
 
+  @Override
+  public Option<HoodieIndexPlan> scheduleIndex(HoodieEngineContext context, String indexInstantTime, List<String> partitionsToIndex) {
+    return new ScheduleIndexActionExecutor<>(context, config, this, indexInstantTime, partitionsToIndex).execute();

Review comment:
       Yeah, I just followed the way it's currently being done. We can take up this refactoring for all such classes later on if needed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1078597374


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c9295eeaffb5e804ee6c636b8617f754af1492d8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298) 
   * d02e0c2ca65038f88ae753484dcb2642ef789f27 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1078598500


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334",
       "triggerID" : "d02e0c2ca65038f88ae753484dcb2642ef789f27",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c9295eeaffb5e804ee6c636b8617f754af1492d8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298) 
   * d02e0c2ca65038f88ae753484dcb2642ef789f27 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7334) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835753092



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
##########
@@ -915,6 +917,39 @@ public boolean scheduleCompactionAtInstant(String instantTime, Option<Map<String
     return scheduleTableService(instantTime, extraMetadata, TableServiceType.COMPACT).isPresent();
   }
 
+  public Option<String> scheduleIndexing(List<MetadataPartitionType> partitionTypes) {
+    String instantTime = HoodieActiveTimeline.createNewInstantTime();
+    return scheduleIndexingAtInstant(partitionTypes, instantTime) ? Option.of(instantTime) : Option.empty();
+  }
+
+  private boolean scheduleIndexingAtInstant(List<MetadataPartitionType> partitionTypes, String instantTime) throws HoodieIOException {
+    Option<HoodieIndexPlan> indexPlan = createTable(config, hadoopConf, config.isMetadataTableEnabled())

Review comment:
       We check the table config for inflight/completed indexes, so this would return false if indexing is triggered twice.
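   A sketch of that guard, for illustration (the inflight getter is an assumption, not an existing API):

   ```java
   // Sketch: skip scheduling when every requested partition is already indexed
   // or has an index build in flight, so a second trigger becomes a no-op.
   Set<String> existing = Stream.of(
           tableConfig.getCompletedMetadataIndexes(),
           tableConfig.getInflightMetadataIndexes()) // assumed getter
       .flatMap(csv -> Arrays.stream(csv.split(",")))
       .map(String::trim)
       .filter(s -> !s.isEmpty())
       .collect(Collectors.toSet());
   boolean anythingToIndex = partitionTypes.stream()
       .map(MetadataPartitionType::getPartitionPath)
       .anyMatch(p -> !existing.contains(p));
   if (!anythingToIndex) {
     return false; // nothing new to schedule
   }
   ```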




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835764478



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {

Review comment:
       No, I'll add it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835786373



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+
+      // get all instants since the plan completed (both from active timeline and archived timeline)
+      // assumption is that all metadata partitions had same instant upto which they were scheduled to be indexed
+      table.getMetaClient().reloadActiveTimeline();
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      List<HoodieInstant> instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(
+          new IndexingCheckTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient()));
+      try {
+        postRequestIndexingTaskFuture.get(config.getIndexingCheckTimeout(), TimeUnit.SECONDS);
+      } catch (TimeoutException | InterruptedException | ExecutionException e) {
+        postRequestIndexingTaskFuture.cancel(true);
+      } finally {
+        executorService.shutdownNow();
+      }
+      // save index commit metadata and return
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      try {
+        txnManager.beginTransaction();
+        table.getActiveTimeline().saveAsComplete(
+            new HoodieInstant(true, INDEX_ACTION, indexInstant.getTimestamp()),
+            TimelineMetadataUtils.serializeIndexCommitMetadata(indexCommitMetadata));
+      } finally {
+        txnManager.endTransaction();
+      }
+      return Option.of(indexCommitMetadata);
+    } catch (IOException e) {
+      throw new HoodieIndexException(String.format("Unable to index instant: %s", indexInstant));
+    }
+  }
+
+  private static List<HoodieInstant> getRemainingArchivedAndActiveInstantsSince(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> remainingInstantsToIndex = metaClient.getArchivedTimeline()
+        .getWriteTimeline()
+        .findInstantsAfter(instant)
+        .getInstants().collect(Collectors.toList());
+    remainingInstantsToIndex.addAll(metaClient.getActiveTimeline().getWriteTimeline().findInstantsAfter(instant).getInstants().collect(Collectors.toList()));
+    return remainingInstantsToIndex;
+  }
+
+  private static List<HoodieInstant> getCompletedArchivedAndActiveInstantsAfter(String instant, HoodieTableMetaClient metaClient) {
+    List<HoodieInstant> completedInstants = metaClient.getArchivedTimeline()
+        .filterCompletedInstants()
+        .findInstantsAfter(instant)
+        .getInstants().collect(Collectors.toList());
+    completedInstants.addAll(metaClient.getActiveTimeline().filterCompletedInstants().findInstantsAfter(instant).getInstants().collect(Collectors.toList()));
+    return completedInstants;
+  }
+
+  /**
+   * Indexing check runs for instants that completed after the base instant (in the index plan).
+   * It will check if these later instants have logged updates to metadata table or not.
+   * If not, then it will do the update. If a later instant is inflight, it will wait until it is completed or the task times out.
+   */
+  class IndexingCheckTask implements Runnable {
+
+    private final HoodieTableMetadataWriter metadataWriter;
+    private final List<HoodieInstant> instantsToIndex;
+    private final Set<String> metadataCompletedInstants;
+    private final HoodieTableMetaClient metaClient;
+
+    IndexingCheckTask(HoodieTableMetadataWriter metadataWriter,
+                      List<HoodieInstant> instantsToIndex,
+                      Set<String> metadataCompletedInstants,
+                      HoodieTableMetaClient metaClient) {
+      this.metadataWriter = metadataWriter;
+      this.instantsToIndex = instantsToIndex;
+      this.metadataCompletedInstants = metadataCompletedInstants;
+      this.metaClient = metaClient;
+    }
+
+    @Override
+    public void run() {
+      while (!Thread.interrupted()) {
+        for (HoodieInstant instant : instantsToIndex) {
+          // metadata index already updated for this instant
+          if (metadataCompletedInstants.contains(instant.getTimestamp())) {
+            currentIndexedInstant = instant.getTimestamp();
+            continue;
+          }
+          while (!instant.isCompleted()) {
+            // reload timeline and fetch instant details again wait until timeout
+            String instantTime = instant.getTimestamp();
+            Option<HoodieInstant> currentInstant = metaClient.reloadActiveTimeline()

Review comment:
       Looks like we are reloading the timeline in rapid succession here. Can we add a 5-second delay between each check?
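   Something along these lines, as a sketch (the 5-second interval is the suggestion above, not a tuned value):

   ```java
   // Sketch: back off between timeline reloads instead of tight-looping.
   while (!instant.isCompleted()) {
     try {
       TimeUnit.SECONDS.sleep(5); // delay between successive reloads
     } catch (InterruptedException e) {
       Thread.currentThread().interrupt();
       return;
     }
     String instantTime = instant.getTimestamp();
     Option<HoodieInstant> completed = metaClient.reloadActiveTimeline()
         .filterCompletedInstants()
         .filter(i -> i.getTimestamp().equals(instantTime))
         .lastInstant();
     if (completed.isPresent()) {
       instant = completed.get(); // pick up the completed version of the instant
     }
   }
   ```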




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1077725510


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7282",
       "triggerID" : "e58990e296aa5125807a4b96269fa7a06c885e69",
       "triggerType" : "PUSH"
     }, {
       "hash" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7283",
       "triggerID" : "32cfdbf4524384a7fb8220be6e822dc510cf173b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296",
       "triggerID" : "ca6f4c73d40497413fd38b6edd7fbf1de9b50cac",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298",
       "triggerID" : "c9295eeaffb5e804ee6c636b8617f754af1492d8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ca6f4c73d40497413fd38b6edd7fbf1de9b50cac Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7296) 
   * c9295eeaffb5e804ee6c636b8617f754af1492d8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7298) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835776580



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -608,7 +624,7 @@ private void initializeEnabledFileGroups(HoodieTableMetaClient dataMetaClient, S
    * File groups will be named as :
    *    record-index-bucket-0000, .... -> ..., record-index-bucket-0009
    */
-  private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, MetadataPartitionType metadataPartition, String instantTime,
+  public void initializeFileGroups(HoodieTableMetaClient dataMetaClient, MetadataPartitionType metadataPartition, String instantTime,

Review comment:
       Think through what happens in case someone also sets the FILES partition to be indexed.
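
A minimal illustrative sketch of the kind of guard this comment asks about — the class, the method name and the decision to reject the FILES partition are assumptions for illustration, not part of this PR; only `MetadataPartitionType` and `HoodieIndexException` are types the PR already uses:

```java
import org.apache.hudi.exception.HoodieIndexException;
import org.apache.hudi.metadata.MetadataPartitionType;

public class IndexPartitionGuard {

  /**
   * Hypothetical validation: the FILES partition is bootstrapped together with
   * the metadata table itself, so letting it be scheduled through the async
   * indexer could collide with that initialization; reject it up front.
   */
  public static void validatePartitionForAsyncIndexing(MetadataPartitionType partitionType) {
    if (MetadataPartitionType.FILES.equals(partitionType)) {
      throw new HoodieIndexException("FILES partition is initialized with the metadata table "
          + "and should not be scheduled for async indexing");
    }
  }
}
```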







[GitHub] [hudi] nsivabalan commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835781695



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.table.action.index;
+
+import org.apache.hudi.avro.model.HoodieCleanMetadata;
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.avro.model.HoodieIndexPlan;
+import org.apache.hudi.avro.model.HoodieRestoreMetadata;
+import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.transaction.TransactionManager;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieCommitMetadata;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.common.util.CleanerUtils;
+import org.apache.hudi.common.util.HoodieTimer;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.metadata.HoodieTableMetadata;
+import org.apache.hudi.metadata.HoodieTableMetadataWriter;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.BaseActionExecutor;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.model.WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.COMPLETED;
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.REQUESTED;
+import static org.apache.hudi.common.table.timeline.HoodieTimeline.INDEX_ACTION;
+import static org.apache.hudi.config.HoodieWriteConfig.WRITE_CONCURRENCY_MODE;
+
+/**
+ * Reads the index plan and executes the plan.
+ * It also reconciles updates on data timeline while indexing was in progress.
+ */
+public class RunIndexActionExecutor<T extends HoodieRecordPayload, I, K, O> extends BaseActionExecutor<T, I, K, O, Option<HoodieIndexCommitMetadata>> {
+
+  private static final Logger LOG = LogManager.getLogger(RunIndexActionExecutor.class);
+  private static final Integer INDEX_COMMIT_METADATA_VERSION_1 = 1;
+  private static final Integer LATEST_INDEX_COMMIT_METADATA_VERSION = INDEX_COMMIT_METADATA_VERSION_1;
+  private static final int MAX_CONCURRENT_INDEXING = 1;
+
+  // we use this to update the latest instant in data timeline that has been indexed in metadata table
+  // this needs to be volatile as it can be updated in the IndexingCheckTask spawned by this executor
+  // assumption is that only one indexer can execute at a time
+  private volatile String currentIndexedInstant;
+
+  private final TransactionManager txnManager;
+
+  public RunIndexActionExecutor(HoodieEngineContext context, HoodieWriteConfig config, HoodieTable<T, I, K, O> table, String instantTime) {
+    super(context, config, table, instantTime);
+    this.txnManager = new TransactionManager(config, table.getMetaClient().getFs());
+  }
+
+  @Override
+  public Option<HoodieIndexCommitMetadata> execute() {
+    HoodieTimer indexTimer = new HoodieTimer();
+    indexTimer.startTimer();
+
+    // ensure lock provider configured
+    if (!config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() || StringUtils.isNullOrEmpty(config.getLockProviderClass())) {
+      throw new HoodieIndexException(String.format("Need to set %s as %s and configure lock provider class",
+          WRITE_CONCURRENCY_MODE.key(), OPTIMISTIC_CONCURRENCY_CONTROL.name()));
+    }
+
+    HoodieInstant indexInstant = table.getActiveTimeline()
+        .filterPendingIndexTimeline()
+        .filter(instant -> instant.getTimestamp().equals(instantTime) && REQUESTED.equals(instant.getState()))
+        .lastInstant()
+        .orElseThrow(() -> new HoodieIndexException(String.format("No requested index instant found: %s", instantTime)));
+    try {
+      // read HoodieIndexPlan
+      HoodieIndexPlan indexPlan = TimelineMetadataUtils.deserializeIndexPlan(table.getActiveTimeline().readIndexPlanAsBytes(indexInstant).get());
+      List<HoodieIndexPartitionInfo> indexPartitionInfos = indexPlan.getIndexPartitionInfos();
+      if (indexPartitionInfos == null || indexPartitionInfos.isEmpty()) {
+        throw new HoodieIndexException(String.format("No partitions to index for instant: %s", instantTime));
+      }
+      // transition requested indexInstant to inflight
+      table.getActiveTimeline().transitionIndexRequestedToInflight(indexInstant, Option.empty());
+      // start indexing for each partition
+      HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
+          .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to run index action for instant: %s", instantTime)));
+      metadataWriter.index(context, indexPartitionInfos);
+
+      // get all instants since the plan completed (both from active timeline and archived timeline)
+      // assumption is that all metadata partitions had same instant upto which they were scheduled to be indexed
+      table.getMetaClient().reloadActiveTimeline();
+      String indexUptoInstant = indexPartitionInfos.get(0).getIndexUptoInstant();
+      List<HoodieInstant> instantsToIndex = getRemainingArchivedAndActiveInstantsSince(indexUptoInstant, table.getMetaClient());
+
+      // reconcile with metadata table timeline
+      String metadataBasePath = HoodieTableMetadata.getMetadataTableBasePath(table.getMetaClient().getBasePath());
+      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      Set<String> metadataCompletedTimestamps = getCompletedArchivedAndActiveInstantsAfter(indexUptoInstant, metadataMetaClient).stream()
+          .map(HoodieInstant::getTimestamp).collect(Collectors.toSet());
+
+      // index all remaining instants with a timeout
+      currentIndexedInstant = indexUptoInstant;
+      ExecutorService executorService = Executors.newFixedThreadPool(MAX_CONCURRENT_INDEXING);
+      Future<?> postRequestIndexingTaskFuture = executorService.submit(
+          new IndexingCheckTask(metadataWriter, instantsToIndex, metadataCompletedTimestamps, table.getMetaClient()));
+      try {
+        postRequestIndexingTaskFuture.get(config.getIndexingCheckTimeout(), TimeUnit.SECONDS);
+      } catch (TimeoutException | InterruptedException | ExecutionException e) {
+        postRequestIndexingTaskFuture.cancel(true);
+      } finally {
+        executorService.shutdownNow();
+      }
+      // save index commit metadata and return
+      List<HoodieIndexPartitionInfo> finalIndexPartitionInfos = indexPartitionInfos.stream()
+          .map(info -> new HoodieIndexPartitionInfo(
+              info.getVersion(),
+              info.getMetadataPartitionPath(),
+              currentIndexedInstant))
+          .collect(Collectors.toList());
+      HoodieIndexCommitMetadata indexCommitMetadata = HoodieIndexCommitMetadata.newBuilder()
+          .setVersion(LATEST_INDEX_COMMIT_METADATA_VERSION).setIndexPartitionInfos(finalIndexPartitionInfos).build();
+      try {
+        txnManager.beginTransaction();
+        table.getActiveTimeline().saveAsComplete(

Review comment:
       From what we discussed, we should update the tableConfig here.
   
       Also, with regard to updating the table config, I am not sure whether we need a lock or not; please validate it.
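
For reference, a minimal sketch of how the table config update could be folded into the same transaction that completes the instant. Everything below is illustrative: the class, the `Runnable` parameter and the `updateTableConfigWithIndexedPartitions` helper are hypothetical, and only `TransactionManager`, `HoodieTableMetaClient` and `HoodieIndexPartitionInfo` come from the code under review.

```java
import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
import org.apache.hudi.client.transaction.TransactionManager;
import org.apache.hudi.common.table.HoodieTableMetaClient;

import java.util.List;

public class IndexCompletionSketch {

  /**
   * Hypothetical flow: complete the index instant and record the indexed
   * partitions in the table config under one lock, so a concurrent writer
   * never observes a completed index instant without the matching config entry.
   */
  public static void completeIndexUnderLock(TransactionManager txnManager,
                                            HoodieTableMetaClient metaClient,
                                            Runnable saveInstantAsComplete,
                                            List<HoodieIndexPartitionInfo> indexedPartitions) {
    try {
      txnManager.beginTransaction();
      // 1. transition the indexing instant to COMPLETED on the data timeline
      saveInstantAsComplete.run();
      // 2. while the lock is still held, persist the indexed partitions into
      //    hoodie.properties via the (hypothetical) helper below
      updateTableConfigWithIndexedPartitions(metaClient, indexedPartitions);
    } finally {
      txnManager.endTransaction();
    }
  }

  // Hypothetical helper: would set a property listing the indexed partitions
  // on metaClient.getTableConfig() and rewrite hoodie.properties.
  private static void updateTableConfigWithIndexedPartitions(HoodieTableMetaClient metaClient,
                                                             List<HoodieIndexPartitionInfo> indexedPartitions) {
    // intentionally left as a stub in this sketch
  }
}
```

Whether a lock is strictly required depends on whether any concurrent process also rewrites hoodie.properties; doing both updates inside one transaction sidesteps that question at the cost of holding the lock slightly longer.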







[GitHub] [hudi] hudi-bot removed a comment on pull request #4693: [WIP][HUDI-3175][RFC-45] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#issuecomment-1073679460


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5533",
       "triggerID" : "238b128260cab3ad11c8e00bd20871b45e112c83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5618",
       "triggerID" : "ca12a7818b2a799fb57ee04376dfcb14d628cdb2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5619",
       "triggerID" : "c5c563ffa6625d610c9c6bd252457129ce5ccddc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5709",
       "triggerID" : "06c6dd9db383efa291c999d5f0140e5d2493eeaf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5801",
       "triggerID" : "7920cb15d99cd92ea2a3e6bd515249eb63040772",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6840",
       "triggerID" : "e6e3e1612928fb0892d071ec4c3a26e31ce1ff76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6918",
       "triggerID" : "4a036d809018043ed0d99adccbe0efdfd920284a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6925",
       "triggerID" : "6a577410251d17a1f2b9e782ded4908fec9977a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6929",
       "triggerID" : "5c1c7e91b5f530907cda50135fef8286ee8a8e38",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6930",
       "triggerID" : "a9f8c1316b55b72c57d18fbe8d0c8103948a30bc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6990",
       "triggerID" : "0d6ad6e1d8767d66b15b31bb06d1318fb08e582c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038",
       "triggerID" : "680a99a669d9e2c2e81465efe8e491812e6c3012",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124",
       "triggerID" : "1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 680a99a669d9e2c2e81465efe8e491812e6c3012 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7038) 
   * 1f9e535ef629c8b35c5edc2cfd5687b2d55c29f2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7124) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4693: [HUDI-2488][HUDI-3175] Implement async metadata indexing

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4693:
URL: https://github.com/apache/hudi/pull/4693#discussion_r835845988



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieIndexer.java
##########
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.avro.model.HoodieIndexCommitMetadata;
+import org.apache.hudi.avro.model.HoodieIndexPartitionInfo;
+import org.apache.hudi.client.SparkRDDWriteClient;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieIndexException;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.common.table.timeline.HoodieInstant.State.INFLIGHT;
+import static org.apache.hudi.utilities.UtilHelpers.EXECUTE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE;
+import static org.apache.hudi.utilities.UtilHelpers.SCHEDULE_AND_EXECUTE;
+
+public class HoodieIndexer {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieIndexer.class);
+
+  private final HoodieIndexer.Config cfg;
+  private TypedProperties props;
+  private final JavaSparkContext jsc;
+  private final HoodieTableMetaClient metaClient;
+
+  public HoodieIndexer(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    this.cfg = cfg;
+    this.jsc = jsc;
+    this.props = StringUtils.isNullOrEmpty(cfg.propsFilePath)
+        ? UtilHelpers.buildProperties(cfg.configs)
+        : readConfigFromFileSystem(jsc, cfg);
+    this.metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
+  }
+
+  private TypedProperties readConfigFromFileSystem(JavaSparkContext jsc, HoodieIndexer.Config cfg) {
+    return UtilHelpers.readConfig(jsc.hadoopConfiguration(), new Path(cfg.propsFilePath), cfg.configs)
+        .getProps(true);
+  }
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
+    public String basePath = null;
+    @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
+    public String tableName = null;
+    @Parameter(names = {"--instant-time", "-it"}, description = "Indexing Instant time")
+    public String indexInstantTime = null;
+    @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism for hoodie insert", required = true)
+    public int parallelism = 1;
+    @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
+    public String sparkMaster = null;
+    @Parameter(names = {"--spark-memory", "-sm"}, description = "spark memory to use", required = true)

Review comment:
       All tools, including HoodieCompactor and HoodieClusteringJob, take Spark parameters as input to run the job. Do you mean we should abstract this out and refactor it at a higher level?
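
For reference, a sketch of what one such higher-level refactoring could look like — the class names and field placement here are hypothetical, not proposed code; it relies only on the documented JCommander behavior of picking up `@Parameter` fields declared in superclasses:

```java
import com.beust.jcommander.Parameter;

import java.io.Serializable;

// Hypothetical shared base for the Spark options that HoodieCompactor,
// HoodieClusteringJob and HoodieIndexer currently each declare on their own.
abstract class SparkJobConfig implements Serializable {

  @Parameter(names = {"--spark-master", "-ms"}, description = "Spark master")
  public String sparkMaster = null;

  @Parameter(names = {"--spark-memory", "-sm"}, description = "Spark memory to use", required = true)
  public String sparkMemory = null;
}

// Each tool would then declare only its own options; JCommander parses the
// inherited @Parameter fields along with these.
class IndexerConfig extends SparkJobConfig {

  @Parameter(names = {"--base-path", "-sp"}, description = "Base path for the table", required = true)
  public String basePath = null;

  @Parameter(names = {"--table-name", "-tn"}, description = "Table name", required = true)
  public String tableName = null;
}
```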



