Posted to commits@hudi.apache.org by "jonvex (via GitHub)" <gi...@apache.org> on 2023/03/28 02:36:10 UTC

[GitHub] [hudi] jonvex opened a new pull request, #8303: use hadoopfsrelation for bootstrap

jonvex opened a new pull request, #8303:
URL: https://github.com/apache/hudi/pull/8303

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   ### Risk level (write none, low, medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] jonvex commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1489003102

   @hudi-bot run azure
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1533899798

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715",
       "triggerID" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16743",
       "triggerID" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16798",
       "triggerID" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e4144fb95b764a96f71b125bd02fd62bac9f00ba",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e4144fb95b764a96f71b125bd02fd62bac9f00ba",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * 3ad5ae580928952bb601cf90f09abb53d1d436e4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16798) 
   * e4144fb95b764a96f71b125bd02fd62bac9f00ba UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1553959688

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b8772a74388873c35b1a13ba6ef99ecda9246646 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1522728741

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9cda89b23cbf8514e1c2e0049eea4624f3b49f10 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324) 
   * 732fbf0bf522d987baff2e6831fa85a0b5597c88 UNKNOWN
   




[GitHub] [hudi] bvaradar commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1165049844


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -270,6 +271,21 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    if (!enableFileIndex || globPaths.nonEmpty || parameters.getOrElse(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key(), "true") != "true") {

Review Comment:
    Can you elaborate on which optimizations applied to HadoopFsRelation produce the 100% speedup? I could not find this information in the PR description.
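
The guard in the hunk above can be restated as a small standalone predicate, which makes the three conditions easier to reason about. This is a minimal sketch and not the PR's code: `BootstrapReadGuard` and `guardHolds` are hypothetical names, glob paths are modelled as plain strings instead of Hadoop `Path` values, and the key string is the one quoted for `hoodie.bootstrap.data.queries.only` later in this thread. The hunk does not show which relation each branch builds, so the sketch only reports whether the condition holds.

```scala
// Sketch only: a dependency-free restatement of the `if` condition in the hunk.
object BootstrapReadGuard {
  // Key string assumed to correspond to HoodieBootstrapConfig.DATA_QUERIES_ONLY.
  val DataQueriesOnlyKey: String = "hoodie.bootstrap.data.queries.only"

  // Returns true when any of the three conditions in the `if` holds:
  // the file index is disabled, glob paths were supplied, or the
  // data-queries-only option is set to something other than "true".
  // The "true" fallback mirrors the default used in the hunk.
  def guardHolds(enableFileIndex: Boolean,
                 globPaths: Seq[String],
                 parameters: Map[String, String]): Boolean =
    !enableFileIndex ||
      globPaths.nonEmpty ||
      parameters.getOrElse(DataQueriesOnlyKey, "true") != "true"
}
```

With the file index enabled, no glob paths, and the option left unset, the condition is false; changing any one of the three inputs makes it true.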





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1532141550

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715",
       "triggerID" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16743",
       "triggerID" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * e7795634f222d6d27363dc4900c9fb458105ffce Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16743) 
   * 3ad5ae580928952bb601cf90f09abb53d1d436e4 UNKNOWN
   




[GitHub] [hudi] bvaradar commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1201546844


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBootstrapRelation.scala:
##########
@@ -188,11 +188,23 @@ case class HoodieBootstrapRelation(override val sqlContext: SQLContext,
 
   override def updatePrunedDataSchema(prunedSchema: StructType): HoodieBootstrapRelation =
     this.copy(prunedDataSchema = Some(prunedSchema))
+
+  def toHadoopFsRelation: HadoopFsRelation = {
+      HadoopFsRelation(
+        location = fileIndex,
+        partitionSchema = fileIndex.partitionSchema,
+        dataSchema = fileIndex.dataSchema,
+        bucketSpec = None,
+        fileFormat = fileFormat,
+        optParams)(sparkSession)
+  }
 }
 
 
 object HoodieBootstrapRelation {
 
+  val USE_FAST_BOOTSTRAP_READ = "hoodie.bootstrap.relation.use.fast.bootstrap.read"

Review Comment:
   @jonvex: Can we use just one config, hoodie.bootstrap.data.queries.only, and do away with hoodie.bootstrap.relation.use.fast.bootstrap.read?
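
If the single-config route were taken, the choice of read path could hinge on one option lookup. The following is a hedged sketch of that idea, not code from the PR: `SingleConfigSketch` and `fastReadRequested` are hypothetical names, and only the key string comes from this thread. The default is passed in by the caller because the thread does not settle what it should be.

```scala
// Sketch only: one key decides whether the fast bootstrap read is requested.
object SingleConfigSketch {
  val DataQueriesOnlyKey: String = "hoodie.bootstrap.data.queries.only"

  // Hypothetical helper: reads the single option, compares it
  // case-insensitively to "true", and falls back to the caller's default
  // when the option is absent.
  def fastReadRequested(options: Map[String, String], default: Boolean): Boolean =
    options.get(DataQueriesOnlyKey)
      .map(_.trim.toLowerCase == "true")
      .getOrElse(default)
}
```

A caller would pass the reader options it already has, for example `SingleConfigSketch.fastReadRequested(Map("hoodie.bootstrap.data.queries.only" -> "true"), default = false)`.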





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1559844835

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165",
       "triggerID" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17203",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "f361b40cba23c728338a5163b0c00c50ac6c60b8",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f361b40cba23c728338a5163b0c00c50ac6c60b8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "27375abd2d676eb530d0ee2d2803efddce0bb92c",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17304",
       "triggerID" : "27375abd2d676eb530d0ee2d2803efddce0bb92c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b8772a74388873c35b1a13ba6ef99ecda9246646 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17203) 
   * f361b40cba23c728338a5163b0c00c50ac6c60b8 UNKNOWN
   * 27375abd2d676eb530d0ee2d2803efddce0bb92c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17304) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: use hadoopfsrelation for bootstrap

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1487558217

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 221af924fdc9941787af0355360a5399c0386167 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948) 
   * 7f9a12f98ef6555b75617aeee8eac57760c28c04 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1551839058

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715",
       "triggerID" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16743",
       "triggerID" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16798",
       "triggerID" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e4144fb95b764a96f71b125bd02fd62bac9f00ba",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16811",
       "triggerID" : "e4144fb95b764a96f71b125bd02fd62bac9f00ba",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ed2644fd1a2a6f6eec727a77251e7a9908fabd0",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17162",
       "triggerID" : "0ed2644fd1a2a6f6eec727a77251e7a9908fabd0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * e4144fb95b764a96f71b125bd02fd62bac9f00ba Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16811) 
   * 0ed2644fd1a2a6f6eec727a77251e7a9908fabd0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17162) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #8303: use hadoopfsrelation for bootstrap

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1486150405

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 221af924fdc9941787af0355360a5399c0386167 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1490532138

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 498f23ea7365f6b6aa781f8fee34f5c6b6b63314 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976) 
   * 78befd976f513cad8901e3c0e5fc97eabef19da2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1522763024

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9cda89b23cbf8514e1c2e0049eea4624f3b49f10 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324) 
   * 732fbf0bf522d987baff2e6831fa85a0b5597c88 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665) 
   * c6908a16bf2f1fb46735781f8d969177eadc23a4 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1505378442

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5bca709bbf2690b2cd3a077f8214bd6e627d8420 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127) 
   * 6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1527823426

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715",
       "triggerID" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16743",
       "triggerID" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * 76394b7ed5df01286d5085e5f1b43a47e52baa5d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715) 
   * e7795634f222d6d27363dc4900c9fb458105ffce Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16743) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1532254235

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715",
       "triggerID" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16743",
       "triggerID" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16798",
       "triggerID" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * 3ad5ae580928952bb601cf90f09abb53d1d436e4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16798) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1533904674

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715",
       "triggerID" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16743",
       "triggerID" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16798",
       "triggerID" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e4144fb95b764a96f71b125bd02fd62bac9f00ba",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16811",
       "triggerID" : "e4144fb95b764a96f71b125bd02fd62bac9f00ba",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * 3ad5ae580928952bb601cf90f09abb53d1d436e4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16798) 
   * e4144fb95b764a96f71b125bd02fd62bac9f00ba Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16811) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1490913875

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 78befd976f513cad8901e3c0e5fc97eabef19da2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1496134699

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 78befd976f513cad8901e3c0e5fc97eabef19da2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997) 
   * 5bca709bbf2690b2cd3a077f8214bd6e627d8420 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] bvaradar commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1167644211


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -270,6 +271,21 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    if (!enableFileIndex || globPaths.nonEmpty || parameters.getOrElse(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key(), "true") != "true") {

Review Comment:
   @jonvex: Can we make HoodieBootstrapRelation/HoodieBaseRelation extend HadoopFsRelation to get this behavior?
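
For readers following the hunk above, here is a minimal sketch of the three-way fallback condition it introduces. It uses plain Scala values instead of Hudi or Spark types. The object name, the `Choice` values, and the `DataQueriesOnlyKey` string are hypothetical. The hunk is truncated, so which relation each branch returns is an assumption based on the PR title, not something the diff confirms.

```scala
// Hypothetical stand-in for the condition in resolveHoodieBootstrapRelation.
// Only the three checks come from the diff; every name here is illustrative.
object BootstrapRelationChoice {
  sealed trait Choice
  // Assumption: the "if" branch keeps the existing bootstrap relation.
  case object FullBootstrapRelation extends Choice
  // Assumption: the other branch is the faster, HadoopFsRelation-style read.
  case object FastFileIndexRelation extends Choice

  // Placeholder key; the real code reads HoodieBootstrapConfig.DATA_QUERIES_ONLY.key().
  val DataQueriesOnlyKey = "example.data.queries.only"

  def choose(enableFileIndex: Boolean,
             globPaths: Seq[String],
             parameters: Map[String, String]): Choice = {
    // Mirrors the diff: a missing option is treated as "true".
    val dataQueriesOnly = parameters.getOrElse(DataQueriesOnlyKey, "true") == "true"
    if (!enableFileIndex || globPaths.nonEmpty || !dataQueriesOnly) FullBootstrapRelation
    else FastFileIndexRelation
  }
}
```

Under these assumptions, the fast path is taken only when the file index is enabled, no glob paths are supplied, and the data-queries-only option is unset or "true".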





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1489042916

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7f9a12f98ef6555b75617aeee8eac57760c28c04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962) 
   * 4a200747aaf17a8889519e51b2da5fcda2496da5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: use hadoopfsrelation for bootstrap

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1487679360

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7f9a12f98ef6555b75617aeee8eac57760c28c04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1489254360

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4a200747aaf17a8889519e51b2da5fcda2496da5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1523417964

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c6908a16bf2f1fb46735781f8d969177eadc23a4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1551901276

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715",
       "triggerID" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16743",
       "triggerID" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16798",
       "triggerID" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e4144fb95b764a96f71b125bd02fd62bac9f00ba",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16811",
       "triggerID" : "e4144fb95b764a96f71b125bd02fd62bac9f00ba",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ed2644fd1a2a6f6eec727a77251e7a9908fabd0",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17162",
       "triggerID" : "0ed2644fd1a2a6f6eec727a77251e7a9908fabd0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * e4144fb95b764a96f71b125bd02fd62bac9f00ba Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16811) 
   * 0ed2644fd1a2a6f6eec727a77251e7a9908fabd0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17162) 
   * b8772a74388873c35b1a13ba6ef99ecda9246646 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1533985556

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715",
       "triggerID" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16743",
       "triggerID" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16798",
       "triggerID" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e4144fb95b764a96f71b125bd02fd62bac9f00ba",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16811",
       "triggerID" : "e4144fb95b764a96f71b125bd02fd62bac9f00ba",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * e4144fb95b764a96f71b125bd02fd62bac9f00ba Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16811) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] jonvex commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1157363856


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala:
##########
@@ -180,6 +180,22 @@ case class HoodieFileIndex(spark: SparkSession,
     }
   }
 
+  /**
+   * In the fast bootstrap read code path, it gets the file status for the bootstrap base files instead of
+   * skeleton files.
+   */
+  private def getBaseFileStatus(baseFiles: mutable.Buffer[HoodieBaseFile]): mutable.Buffer[FileStatus] = {
+    if (shouldFastBootstrap) {
+     return baseFiles.map(f =>
+        if (f.getBootstrapBaseFile.isPresent) {
+         f.getBootstrapBaseFile.get().getFileStatus

Review Comment:
   Not sure I understand the question





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1559774438

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165",
       "triggerID" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17203",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "f361b40cba23c728338a5163b0c00c50ac6c60b8",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f361b40cba23c728338a5163b0c00c50ac6c60b8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b8772a74388873c35b1a13ba6ef99ecda9246646 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17203) 
   * f361b40cba23c728338a5163b0c00c50ac6c60b8 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1562243166

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165",
       "triggerID" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17203",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "f361b40cba23c728338a5163b0c00c50ac6c60b8",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f361b40cba23c728338a5163b0c00c50ac6c60b8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "27375abd2d676eb530d0ee2d2803efddce0bb92c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17304",
       "triggerID" : "27375abd2d676eb530d0ee2d2803efddce0bb92c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "551c52dd6cbaaa4b48e08c9388fd1fd67cb1d9c5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17345",
       "triggerID" : "551c52dd6cbaaa4b48e08c9388fd1fd67cb1d9c5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f361b40cba23c728338a5163b0c00c50ac6c60b8 UNKNOWN
   * 551c52dd6cbaaa4b48e08c9388fd1fd67cb1d9c5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17345) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1551828727

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715",
       "triggerID" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16743",
       "triggerID" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16798",
       "triggerID" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e4144fb95b764a96f71b125bd02fd62bac9f00ba",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16811",
       "triggerID" : "e4144fb95b764a96f71b125bd02fd62bac9f00ba",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ed2644fd1a2a6f6eec727a77251e7a9908fabd0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0ed2644fd1a2a6f6eec727a77251e7a9908fabd0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * e4144fb95b764a96f71b125bd02fd62bac9f00ba Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16811) 
   * 0ed2644fd1a2a6f6eec727a77251e7a9908fabd0 UNKNOWN
   




[GitHub] [hudi] jonvex commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1157360595


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -270,6 +271,21 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    if (!enableFileIndex || globPaths.nonEmpty || parameters.getOrElse(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key(), "true") != "true") {

Review Comment:
   If you want to read the metadata columns, you need to disable it (set `hoodie.bootstrap.data.queries.only` to `false`). I found a few tests that use the metadata columns, and I would assume that some users do too.
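
A minimal sketch of the guard being discussed, written as a pure function so it can be read in isolation. It mirrors the condition in the diff above: the fast path applies only when the file index is enabled, no glob paths are given, and the data-queries-only option is left at `true`. The names `BootstrapReadPath` and `useFastPath` are hypothetical, and the key string is assumed to be the value of `HoodieBootstrapConfig.DATA_QUERIES_ONLY.key()`, based on the docker demo change in this PR.

```scala
// Hypothetical stand-in for the check in resolveHoodieBootstrapRelation;
// not the actual Hudi implementation.
object BootstrapReadPath {
  // Assumed key, taken from the docker demo change in this PR.
  val DataQueriesOnlyKey = "hoodie.bootstrap.data.queries.only"

  /** Returns true when the fast, data-columns-only read path may be used. */
  def useFastPath(enableFileIndex: Boolean,
                  globPaths: Seq[String],
                  parameters: Map[String, String]): Boolean = {
    // The option defaults to "true", as in the diff above.
    val dataQueriesOnly = parameters.getOrElse(DataQueriesOnlyKey, "true") == "true"
    enableFileIndex && globPaths.isEmpty && dataQueriesOnly
  }
}
```

Under these assumptions, a reader that needs `_hoodie_commit_time` or the other meta fields would pass `hoodie.bootstrap.data.queries.only=false`, which makes `useFastPath` return `false` and falls back to the regular bootstrap relation.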





[GitHub] [hudi] codope commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1162307091


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala:
##########
@@ -180,6 +180,22 @@ case class HoodieFileIndex(spark: SparkSession,
     }
   }
 
+  /**
+   * In the fast bootstrap read code path, it gets the file status for the bootstrap base files instead of
+   * skeleton files.
+   */
+  private def getBaseFileStatus(baseFiles: mutable.Buffer[HoodieBaseFile]): mutable.Buffer[FileStatus] = {
+    if (shouldFastBootstrap) {
+     return baseFiles.map(f =>
+        if (f.getBootstrapBaseFile.isPresent) {
+         f.getBootstrapBaseFile.get().getFileStatus

Review Comment:
   Discussed offline. This needs to be guarded because, in the fast bootstrap path, we only scan source files and won't have meta columns to stitch.
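
A rough sketch of the guarded selection, using simplified stand-in types rather than the real `HoodieBaseFile` and `FileStatus`. The quoted hunk is cut off before the `else` branch, so falling back to the file's own (skeleton) status when no bootstrap base file is present is an assumption here, as are the names `FileStatusStub`, `BaseFileStub`, and `baseFileStatus`.

```scala
// Simplified, hypothetical stand-ins for FileStatus and HoodieBaseFile.
final case class FileStatusStub(path: String)
final case class BaseFileStub(skeleton: FileStatusStub,
                              bootstrapBase: Option[FileStatusStub])

object BaseFileStatusSketch {
  /**
   * With the fast bootstrap path enabled, prefer the bootstrap source file so
   * that only source data is scanned; otherwise return the skeleton file.
   * The fallback for files without a bootstrap base file is assumed.
   */
  def baseFileStatus(files: Seq[BaseFileStub],
                     shouldFastBootstrap: Boolean): Seq[FileStatusStub] =
    if (shouldFastBootstrap) files.map(f => f.bootstrapBase.getOrElse(f.skeleton))
    else files.map(_.skeleton)
}
```

Because the fast path returns source files only, no skeleton file is read alongside them, which is why meta columns are unavailable on that path and the flag has to guard it.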





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1522768292

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 732fbf0bf522d987baff2e6831fa85a0b5597c88 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665) 
   * c6908a16bf2f1fb46735781f8d969177eadc23a4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1526782528

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c6908a16bf2f1fb46735781f8d969177eadc23a4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668) 
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * 76394b7ed5df01286d5085e5f1b43a47e52baa5d UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1526812836

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715",
       "triggerID" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * 76394b7ed5df01286d5085e5f1b43a47e52baa5d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715) 
   




[GitHub] [hudi] jonvex commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1164217702


##########
docker/demo/sparksql-batch2.commands:
##########
@@ -26,7 +26,8 @@ spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from s
 spark.sql("select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG'").show(100, false)
 spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_rt where  symbol = 'GOOG'").show(100, false)
 
- // Copy-On-Write Bootstrapped table
+// Copy-On-Write Bootstrapped table
+spark.sql("set hoodie.bootstrap.data.queries.only=false")

Review Comment:
   I updated it so that this test now uses the feature for the queries that don't use the meta fields.





[GitHub] [hudi] hudi-bot commented on pull request #8303: use hadoopfsrelation for bootstrap

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1487549402

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 221af924fdc9941787af0355360a5399c0386167 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948) 
   * 7f9a12f98ef6555b75617aeee8eac57760c28c04 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1561362515

   ## CI report:
   
   * f361b40cba23c728338a5163b0c00c50ac6c60b8 UNKNOWN
   * 27375abd2d676eb530d0ee2d2803efddce0bb92c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17304) 
   * 551c52dd6cbaaa4b48e08c9388fd1fd67cb1d9c5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1554395490

   ## CI report:
   
   * b8772a74388873c35b1a13ba6ef99ecda9246646 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17203) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1551912333

   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * 0ed2644fd1a2a6f6eec727a77251e7a9908fabd0 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17162) 
   * b8772a74388873c35b1a13ba6ef99ecda9246646 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1507082083

   ## CI report:
   
   * 6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289) 
   * 9cda89b23cbf8514e1c2e0049eea4624f3b49f10 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] codope commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1156740671


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -270,6 +271,21 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    if (!enableFileIndex || globPaths.nonEmpty || parameters.getOrElse(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key(), "true") != "true") {

Review Comment:
   I think we should do away with the config and rely on the condition here to decide whether or not to use the fast read path (which should be done by default). Wdyt?



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala:
##########
@@ -807,7 +807,9 @@ class TestHoodieSparkSqlWriter {
         .option("hoodie.insert.shuffle.parallelism", "4")
         .mode(SaveMode.Append).save(tempBasePath)
 
-      val currentCommits = spark.read.format("hudi").load(tempBasePath).select("_hoodie_commit_time").take(1).map(_.getString(0))
+      val currentCommits = spark.read.format("hudi")
+        .option(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key, "false")

Review Comment:
   Need more tests. Setting it to `false` does not test the changed code path.



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala:
##########
@@ -180,6 +180,22 @@ case class HoodieFileIndex(spark: SparkSession,
     }
   }
 
+  /**
+   * In the fast bootstrap read code path, it gets the file status for the bootstrap base files instead of
+   * skeleton files.
+   */
+  private def getBaseFileStatus(baseFiles: mutable.Buffer[HoodieBaseFile]): mutable.Buffer[FileStatus] = {
+    if (shouldFastBootstrap) {
+     return baseFiles.map(f =>
+        if (f.getBootstrapBaseFile.isPresent) {
+         f.getBootstrapBaseFile.get().getFileStatus

Review Comment:
   Why do we need to guard this by `shouldFastBootstrap` conditional? Shouldn't we always return the source file status if it's present?



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala:
##########
@@ -83,10 +83,18 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
   /**
    * Get the schema of the table.
    */
-  lazy val schema: StructType = schemaSpec.getOrElse({
-    val schemaUtil = new TableSchemaResolver(metaClient)
-    AvroConversionUtils.convertAvroSchemaToStructType(schemaUtil.getTableAvroSchema)
-  })
+  lazy val schema: StructType = if (shouldFastBootstrap) {
+      StructType(rawSchema.fields.filterNot(f => HoodieRecord.HOODIE_META_COLUMNS_WITH_OPERATION.contains(f.name)))

Review Comment:
   Just import the static member `HOODIE_META_COLUMNS_WITH_OPERATION` instead of importing the full `HoodieRecord`.
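
For illustration, the schema change in the diff above amounts to dropping Hudi's meta columns from the table schema by name. The following is a minimal sketch in plain Scala, not the PR's code: `Field`, `metaColumns`, and `dataSchema` are hypothetical stand-ins for Spark's `StructField`, `HOODIE_META_COLUMNS_WITH_OPERATION`, and the `schema` member, and the listed column names are assumed to match Hudi's meta fields.

```scala
object SchemaFilterSketch {
  // Hypothetical stand-in for Spark's StructField; only the name matters here.
  final case class Field(name: String)

  // Assumed contents of HOODIE_META_COLUMNS_WITH_OPERATION, for illustration only.
  val metaColumns: Set[String] = Set(
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
    "_hoodie_operation")

  // Keep only the data columns, in their original order.
  def dataSchema(rawSchema: Seq[Field]): Seq[Field] =
    rawSchema.filterNot(f => metaColumns.contains(f.name))
}
```

Filtering by name keeps the remaining data columns in their original order.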





[GitHub] [hudi] codope commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1156744308


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala:
##########
@@ -180,6 +180,22 @@ case class HoodieFileIndex(spark: SparkSession,
     }
   }
 
+  /**
+   * In the fast bootstrap read code path, it gets the file status for the bootstrap base files instead of
+   * skeleton files.
+   */
+  private def getBaseFileStatus(baseFiles: mutable.Buffer[HoodieBaseFile]): mutable.Buffer[FileStatus] = {
+    if (shouldFastBootstrap) {
+     return baseFiles.map(f =>
+        if (f.getBootstrapBaseFile.isPresent) {
+         f.getBootstrapBaseFile.get().getFileStatus

Review Comment:
   Why do we need to guard this by `shouldFastBootstrap` conditional? Shouldn't we always return the source file status if it's present?
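
To make the question concrete, the lookup in the quoted diff can be read as an optional value with a fallback. This is a minimal sketch in plain Scala, not the PR's code: `FileStatus` and `BaseFile` are hypothetical stand-ins for Hadoop's `FileStatus` and Hudi's `HoodieBaseFile`, and falling back to the file's own status when no bootstrap base file is present is an assumption, because the quoted diff is cut off before that branch.

```scala
object BaseFileStatusSketch {
  // Hypothetical stand-ins for Hadoop's FileStatus and Hudi's HoodieBaseFile.
  final case class FileStatus(path: String)
  final case class BaseFile(status: FileStatus, bootstrapBaseFile: Option[FileStatus])

  // Guarded variant, following the shape of the quoted diff: only the fast
  // bootstrap path swaps in the bootstrap (source) file status.
  def baseFileStatus(baseFiles: Seq[BaseFile], shouldFastBootstrap: Boolean): Seq[FileStatus] =
    if (shouldFastBootstrap) {
      // Assumed fallback: use the file's own status when no bootstrap file exists.
      baseFiles.map(f => f.bootstrapBaseFile.getOrElse(f.status))
    } else {
      baseFiles.map(_.status)
    }
}
```

Dropping the guard, as the comment suggests, would reduce the function to the first branch alone.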





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1496150469

   ## CI report:
   
   * 78befd976f513cad8901e3c0e5fc97eabef19da2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997) 
   * 5bca709bbf2690b2cd3a077f8214bd6e627d8420 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] jonvex commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1180550309


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -271,6 +273,25 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    val isSchemaEvolutionEnabledOnRead = HoodieSparkConfUtils.getConfigValue(parameters,
+      sqlContext.sparkSession.sessionState.conf, DataSourceReadOptions.SCHEMA_EVOLUTION_ENABLED.key,
+      DataSourceReadOptions.SCHEMA_EVOLUTION_ENABLED.defaultValue.toString).toBoolean
+    if (!enableFileIndex || isSchemaEvolutionEnabledOnRead
+      || globPaths.nonEmpty || !parameters.getOrElse(DATA_QUERIES_ONLY.key, DATA_QUERIES_ONLY.defaultValue).toBoolean) {

Review Comment:
   To answer your first question: I got that condition from `BaseFileOnlyRelation.toHadoopFsRelation`.
   For the second question, I need to go through the existing bootstrap tests today and update them.
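
As a reading aid, the condition in the diff above can be modelled as a pure function over four flags. This is a minimal sketch in plain Scala, not the PR's code: `ReadFlags` and `conditionHolds` are hypothetical names, the flags stand in for the values resolved from the read options and the session conf, and which relation each branch returns is left out because the quoted diff is cut off after the `if`.

```scala
object BootstrapReadPathSketch {
  // Hypothetical container for the resolved option values.
  final case class ReadFlags(
    enableFileIndex: Boolean,
    schemaEvolutionOnRead: Boolean,
    hasGlobPaths: Boolean,
    dataQueriesOnly: Boolean)

  // Mirrors the boolean expression in the quoted diff:
  // !enableFileIndex || isSchemaEvolutionEnabledOnRead || globPaths.nonEmpty || !dataQueriesOnly
  def conditionHolds(f: ReadFlags): Boolean =
    !f.enableFileIndex || f.schemaEvolutionOnRead || f.hasGlobPaths || !f.dataQueriesOnly
}
```

Only one combination makes the condition false: file index enabled, schema evolution on read disabled, no glob paths, and data-queries-only enabled.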





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1532151375

   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * e7795634f222d6d27363dc4900c9fb458105ffce Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16743) 
   * 3ad5ae580928952bb601cf90f09abb53d1d436e4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16798) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] jonvex commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1161793098


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -270,6 +271,21 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    if (!enableFileIndex || globPaths.nonEmpty || parameters.getOrElse(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key(), "true") != "true") {

Review Comment:
   We need to know this at the point where the relation is created, so I don't think this can be done.





[GitHub] [hudi] bvaradar commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1164524745


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -270,6 +271,21 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    if (!enableFileIndex || globPaths.nonEmpty || parameters.getOrElse(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key(), "true") != "true") {

Review Comment:
   Hmm, @jonvex: if you look at HoodieBootstrapRelation.composeRDD (the relation is instantiated in the line below), we segregate the skeleton schema and the base file schema. Can we move the optimization logic inside that? My main concern is that this would break existing functionality: bootstrap queries that include Hudi meta fields would fail unless the user turns off the feature.
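
   To make the condition in the hunk above easier to reason about, here is a minimal, Spark-free sketch of the three checks it combines. The hunk is cut off at the `if`, so what each branch returns is an assumption here: the sketch assumes the regular bootstrap read is the fallback and the faster read is used otherwise. `BootstrapReadChoice`, `ReadPath`, and `choose` are hypothetical names for illustration, not Hudi APIs; the key string is the one quoted elsewhere in this thread.

```scala
object BootstrapReadChoice {
  // Hypothetical labels for the two read paths discussed in the review.
  sealed trait ReadPath
  case object RegularBootstrapRead extends ReadPath
  case object FastBootstrapRead extends ReadPath

  // Config key as quoted in this thread; the default of "true" mirrors the hunk.
  val DataQueriesOnlyKey = "hoodie.bootstrap.data.queries.only"

  // Assumption: the regular read is chosen when the file index is disabled,
  // when glob paths are supplied, or when data-only queries are switched off.
  def choose(enableFileIndex: Boolean,
             globPaths: Seq[String],
             parameters: Map[String, String]): ReadPath = {
    val dataQueriesOnly = parameters.getOrElse(DataQueriesOnlyKey, "true") == "true"
    if (!enableFileIndex || globPaths.nonEmpty || !dataQueriesOnly) RegularBootstrapRead
    else FastBootstrapRead
  }
}
```

   Under these assumptions, a query with no explicit options would take the faster path by default, which is the situation the concern about meta-field queries refers to.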





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1507573458

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9cda89b23cbf8514e1c2e0049eea4624f3b49f10 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: use hadoopfsrelation for bootstrap

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1486230955

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 221af924fdc9941787af0355360a5399c0386167 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1489016351

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * 7f9a12f98ef6555b75617aeee8eac57760c28c04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1489554462

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 498f23ea7365f6b6aa781f8fee34f5c6b6b63314 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1489414600

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4a200747aaf17a8889519e51b2da5fcda2496da5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975) 
   * 498f23ea7365f6b6aa781f8fee34f5c6b6b63314 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] jonvex commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1554725754

   @bvaradar Yes, it is ready for review. I wrote a lot of tests to ensure that this matches the functionality of the regular bootstrap read. However, while doing so I discovered some issues with bootstrap, such as https://github.com/apache/hudi/pull/8666 and https://issues.apache.org/jira/browse/HUDI-6201 (which is still unsolved).




[GitHub] [hudi] codope merged pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope merged PR #8303:
URL: https://github.com/apache/hudi/pull/8303




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1552307320

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715",
       "triggerID" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16743",
       "triggerID" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16798",
       "triggerID" : "3ad5ae580928952bb601cf90f09abb53d1d436e4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e4144fb95b764a96f71b125bd02fd62bac9f00ba",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16811",
       "triggerID" : "e4144fb95b764a96f71b125bd02fd62bac9f00ba",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ed2644fd1a2a6f6eec727a77251e7a9908fabd0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17162",
       "triggerID" : "0ed2644fd1a2a6f6eec727a77251e7a9908fabd0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165",
       "triggerID" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * b8772a74388873c35b1a13ba6ef99ecda9246646 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] jonvex commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1202649672


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBootstrapRelation.scala:
##########
@@ -188,11 +188,23 @@ case class HoodieBootstrapRelation(override val sqlContext: SQLContext,
 
   override def updatePrunedDataSchema(prunedSchema: StructType): HoodieBootstrapRelation =
     this.copy(prunedDataSchema = Some(prunedSchema))
+
+  def toHadoopFsRelation: HadoopFsRelation = {
+      HadoopFsRelation(
+        location = fileIndex,
+        partitionSchema = fileIndex.partitionSchema,
+        dataSchema = fileIndex.dataSchema,
+        bucketSpec = None,
+        fileFormat = fileFormat,
+        optParams)(sparkSession)
+  }
 }
 
 
 object HoodieBootstrapRelation {
 
+  val USE_FAST_BOOTSTRAP_READ = "hoodie.bootstrap.relation.use.fast.bootstrap.read"

Review Comment:
   OK, I was able to reuse the config. I had introduced the separate config because I was worried about the case where the config is enabled but we decide to use the regular bootstrap read. I tested and made sure the config is correct in all places where it is used.





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1496736168

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5bca709bbf2690b2cd3a077f8214bd6e627d8420 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1553964669

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165",
       "triggerID" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17203",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * b8772a74388873c35b1a13ba6ef99ecda9246646 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17203) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] codope commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1563829690

   @bvaradar The changes look good to me. Can you take another pass?




[GitHub] [hudi] jonvex commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1162986794


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -100,7 +101,7 @@ class DefaultSource extends RelationProvider
       )
     } else {
       Map()
-    }) ++ DataSourceOptionsHelper.parametersWithReadDefaults(optParams)
+    }) ++ DataSourceOptionsHelper.parametersWithReadDefaults(sqlContext.getAllConfs.filter(k => k._1.startsWith("hoodie.")) ++ optParams)

Review Comment:
   Currently, read configs can't be set in Spark SQL with syntax like "set hoodie.bootstrap.data.queries.only=false"; that only works for write configs. This is something we wanted to add anyway: https://issues.apache.org/jira/browse/HUDI-5361 
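
The merge in the `+` line of the hunk can be sketched without Spark, as plain `Map` operations. This is a minimal illustration, not the PR's code: `ReadConfMerge` and `effectiveReadParams` are hypothetical names, `sessionConfs` stands in for `sqlContext.getAllConfs`, and the `parametersWithReadDefaults` step is left out.

```scala
object ReadConfMerge {
  // Hypothetical helper: keep only session-level entries whose key starts with
  // "hoodie.", then layer the per-read options on top. With `++`, entries from
  // the right-hand map win when the same key appears on both sides.
  def effectiveReadParams(sessionConfs: Map[String, String],
                          optParams: Map[String, String]): Map[String, String] =
    sessionConfs.filter { case (key, _) => key.startsWith("hoodie.") } ++ optParams

  def main(args: Array[String]): Unit = {
    val sessionConfs = Map(
      "hoodie.bootstrap.data.queries.only" -> "false", // e.g. set through a SQL `set` statement
      "spark.sql.shuffle.partitions" -> "200"          // non-Hudi key, filtered out
    )
    val optParams = Map("hoodie.bootstrap.data.queries.only" -> "true")

    // Session-level value is picked up when no per-read option is given.
    println(effectiveReadParams(sessionConfs, Map.empty))
    // An explicit per-read option overrides the session-level value.
    println(effectiveReadParams(sessionConfs, optParams))
  }
}
```

Putting `optParams` on the right-hand side of `++` means options passed directly to the reader keep precedence over session-wide settings.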





[GitHub] [hudi] codope commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1159880935


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -270,6 +271,21 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    if (!enableFileIndex || globPaths.nonEmpty || parameters.getOrElse(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key(), "true") != "true") {

Review Comment:
   I get it. But does it need to be inferred through a separate config? Can we not infer it from the already available parameters?





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1528434133

   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * e7795634f222d6d27363dc4900c9fb458105ffce Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16743) 
   




[GitHub] [hudi] bvaradar commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1180087583


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -271,6 +273,25 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    val isSchemaEvolutionEnabledOnRead = HoodieSparkConfUtils.getConfigValue(parameters,
+      sqlContext.sparkSession.sessionState.conf, DataSourceReadOptions.SCHEMA_EVOLUTION_ENABLED.key,
+      DataSourceReadOptions.SCHEMA_EVOLUTION_ENABLED.defaultValue.toString).toBoolean
+    if (!enableFileIndex || isSchemaEvolutionEnabledOnRead
+      || globPaths.nonEmpty || !parameters.getOrElse(DATA_QUERIES_ONLY.key, DATA_QUERIES_ONLY.defaultValue).toBoolean) {

Review Comment:
   Also, how are we ensuring that the behavior is unchanged for MOR?



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -271,6 +273,25 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    val isSchemaEvolutionEnabledOnRead = HoodieSparkConfUtils.getConfigValue(parameters,
+      sqlContext.sparkSession.sessionState.conf, DataSourceReadOptions.SCHEMA_EVOLUTION_ENABLED.key,
+      DataSourceReadOptions.SCHEMA_EVOLUTION_ENABLED.defaultValue.toString).toBoolean
+    if (!enableFileIndex || isSchemaEvolutionEnabledOnRead
+      || globPaths.nonEmpty || !parameters.getOrElse(DATA_QUERIES_ONLY.key, DATA_QUERIES_ONLY.defaultValue).toBoolean) {

Review Comment:
   Can you explain why globPaths.nonEmpty is included here? I'm not following it.
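
For readers following the condition under review, it can be restated as a pure predicate over four flags. This is only a sketch: `BootstrapRelationChoice` and `needsFallback` are hypothetical names, and because the hunk does not show the branch bodies, the reading that a true result keeps the existing `HoodieBootstrapRelation` path is an assumption.

```scala
object BootstrapRelationChoice {
  // Hypothetical mirror of the `if` condition in the hunk above.
  // Returns true when any single condition rules out the alternative read path
  // (assumed here to be the HadoopFsRelation-based one named in the PR title).
  def needsFallback(enableFileIndex: Boolean,
                    schemaEvolutionOnRead: Boolean,
                    hasGlobPaths: Boolean,
                    dataQueriesOnly: Boolean): Boolean =
    !enableFileIndex || schemaEvolutionOnRead || hasGlobPaths || !dataQueriesOnly

  def main(args: Array[String]): Unit = {
    // All four flags favorable: no fallback.
    println(needsFallback(enableFileIndex = true, schemaEvolutionOnRead = false,
      hasGlobPaths = false, dataQueriesOnly = true))
    // Glob paths alone are enough to trigger the fallback.
    println(needsFallback(enableFileIndex = true, schemaEvolutionOnRead = false,
      hasGlobPaths = true, dataQueriesOnly = true))
  }
}
```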





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1505392181

   ## CI report:
   
   * 5bca709bbf2690b2cd3a077f8214bd6e627d8420 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127) 
   * 6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289) 
   




[GitHub] [hudi] jonvex commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1165555978


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -270,6 +271,21 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    if (!enableFileIndex || globPaths.nonEmpty || parameters.getOrElse(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key(), "true") != "true") {

Review Comment:
   https://issues.apache.org/jira/browse/HUDI-3896 I am not sure if this is the only optimization, but it is one of them. The query plans for non-bootstrapped and bootstrapped tables look pretty much identical, except that the non-bootstrapped read says "FileScan parquet" and the bootstrapped read says "scan HoodieBootstrapRelation".
   
   I started by comparing the time to run TPC-DS queries on bootstrapped tables vs. non-bootstrapped tables. For a full bootstrap, the runtime ratio was 1.997, and for a metadata-only bootstrap it was 1.638.
   
   I was surprised that the full bootstrap was so slow, so I tried to replicate what is done in BaseFileOnlyRelation in the first commit in [this pr](https://github.com/apache/hudi/pull/8272). We create a HoodieFileScanRDD instead of a HoodieBootstrapRDD. The TPC-DS runtime ratio relative to a non-bootstrapped table was 1.48 for a full bootstrap table and 1.35 for a metadata-only bootstrap.
   
   With the changes in this PR to leverage HadoopFsRelation, the ratio was 1.12 for a metadata-only bootstrap and 1.09 for a full bootstrap.
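
To put the first and last sets of ratios side by side, the arithmetic can be written out as a small sketch. The four numbers are the ones quoted in this comment; the object and method names (`BootstrapRatios`, `speedup`, `overheadPct`) are hypothetical and only serve the illustration.

```scala
object BootstrapRatios {
  // Runtime ratios relative to a non-bootstrapped table, as quoted in the comment.
  val before: Map[String, Double] = Map("full" -> 1.997, "metadataOnly" -> 1.638)
  val after: Map[String, Double] = Map("full" -> 1.09, "metadataOnly" -> 1.12)

  // Speedup of the HadoopFsRelation-based read over the original read,
  // for the same bootstrap mode.
  def speedup(mode: String): Double = before(mode) / after(mode)

  // Extra runtime over the non-bootstrapped baseline, in percent.
  def overheadPct(ratio: Double): Double = (ratio - 1.0) * 100.0

  def main(args: Array[String]): Unit =
    for (mode <- Seq("full", "metadataOnly"))
      println(f"$mode: ${speedup(mode)}%.2fx faster, overhead " +
        f"${overheadPct(before(mode))}%.1f%% -> ${overheadPct(after(mode))}%.1f%%")
}
```

By this arithmetic the full bootstrap read is roughly 1.8 times faster than before and the metadata-only read roughly 1.5 times faster, which leaves an overhead of about 9% and 12% over a non-bootstrapped table.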





[GitHub] [hudi] bvaradar commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1162253186


##########
docker/demo/sparksql-batch2.commands:
##########
@@ -26,7 +26,8 @@ spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from s
 spark.sql("select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG'").show(100, false)
 spark.sql("select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_rt where  symbol = 'GOOG'").show(100, false)
 
- // Copy-On-Write Bootstrapped table
+// Copy-On-Write Bootstrapped table
+spark.sql("set hoodie.bootstrap.data.queries.only=false")

Review Comment:
   Are there any integration tests for bootstrap where we test with this feature on?



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -100,7 +101,7 @@ class DefaultSource extends RelationProvider
       )
     } else {
       Map()
-    }) ++ DataSourceOptionsHelper.parametersWithReadDefaults(optParams)
+    }) ++ DataSourceOptionsHelper.parametersWithReadDefaults(sqlContext.getAllConfs.filter(k => k._1.startsWith("hoodie.")) ++ optParams)

Review Comment:
   Why is this needed?



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -270,6 +271,21 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    if (!enableFileIndex || globPaths.nonEmpty || parameters.getOrElse(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key(), "true") != "true") {

Review Comment:
   @jonvex: Wouldn't this change cause user queries that include Hudi metadata columns to fail? Can't we just use the userSchema being passed here to check whether any Hudi metadata columns are being queried, and then determine the appropriate next steps?
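
The schema-based check suggested here could look roughly like the sketch below. It is an illustration under assumptions, not code from the PR: `MetaColumnCheck` and `queriesMetaColumns` are hypothetical names, the input is a plain list of field names (in Spark it might come from something like `userSchema.map(_.fieldNames.toSeq)`), and whether `userSchema` actually reflects the columns a query projects is not established in this thread.

```scala
object MetaColumnCheck {
  // The five standard Hudi metadata column names.
  val HoodieMetaColumns: Set[String] = Set(
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name"
  )

  // Hypothetical helper: true when any requested field is a Hudi metadata column.
  def queriesMetaColumns(requestedFields: Seq[String]): Boolean =
    requestedFields.exists(HoodieMetaColumns.contains)

  def main(args: Array[String]): Unit = {
    // Data columns only.
    println(queriesMetaColumns(Seq("symbol", "ts", "volume")))
    // Includes a metadata column, as in the docker demo query.
    println(queriesMetaColumns(Seq("_hoodie_commit_time", "symbol", "ts")))
  }
}
```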





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1489423804

   ## CI report:
   
   * 4a200747aaf17a8889519e51b2da5fcda2496da5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975) 
   * 498f23ea7365f6b6aa781f8fee34f5c6b6b63314 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1526770534

   ## CI report:
   
   * c6908a16bf2f1fb46735781f8d969177eadc23a4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668) 
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: use hadoopfsrelation for bootstrap

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1486146528

   ## CI report:
   
   * 221af924fdc9941787af0355360a5399c0386167 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1522757352

   ## CI report:
   
   * 9cda89b23cbf8514e1c2e0049eea4624f3b49f10 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324) 
   * 732fbf0bf522d987baff2e6831fa85a0b5597c88 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] jonvex commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1157379761


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala:
##########
@@ -807,7 +807,9 @@ class TestHoodieSparkSqlWriter {
         .option("hoodie.insert.shuffle.parallelism", "4")
         .mode(SaveMode.Append).save(tempBasePath)
 
-      val currentCommits = spark.read.format("hudi").load(tempBasePath).select("_hoodie_commit_time").take(1).map(_.getString(0))
+      val currentCommits = spark.read.format("hudi")
+        .option(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key, "false")

Review Comment:
   Every other bootstrap test is now using the fast path, though.





[GitHub] [hudi] jonvex commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1163162168


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -270,6 +271,21 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    if (!enableFileIndex || globPaths.nonEmpty || parameters.getOrElse(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key(), "true") != "true") {

Review Comment:
   When I set a breakpoint here, `userSchema` was null.





[GitHub] [hudi] bvaradar commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1554000815

   @jonvex: Is this ready for review?
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1559788925

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165",
       "triggerID" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17203",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "f361b40cba23c728338a5163b0c00c50ac6c60b8",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f361b40cba23c728338a5163b0c00c50ac6c60b8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "27375abd2d676eb530d0ee2d2803efddce0bb92c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "27375abd2d676eb530d0ee2d2803efddce0bb92c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b8772a74388873c35b1a13ba6ef99ecda9246646 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17203) 
   * f361b40cba23c728338a5163b0c00c50ac6c60b8 UNKNOWN
   * 27375abd2d676eb530d0ee2d2803efddce0bb92c UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1560248990

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165",
       "triggerID" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17203",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "f361b40cba23c728338a5163b0c00c50ac6c60b8",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f361b40cba23c728338a5163b0c00c50ac6c60b8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "27375abd2d676eb530d0ee2d2803efddce0bb92c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17304",
       "triggerID" : "27375abd2d676eb530d0ee2d2803efddce0bb92c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f361b40cba23c728338a5163b0c00c50ac6c60b8 UNKNOWN
   * 27375abd2d676eb530d0ee2d2803efddce0bb92c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17304) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1561377578

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17165",
       "triggerID" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8772a74388873c35b1a13ba6ef99ecda9246646",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17203",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "f361b40cba23c728338a5163b0c00c50ac6c60b8",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f361b40cba23c728338a5163b0c00c50ac6c60b8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "27375abd2d676eb530d0ee2d2803efddce0bb92c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17304",
       "triggerID" : "27375abd2d676eb530d0ee2d2803efddce0bb92c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "551c52dd6cbaaa4b48e08c9388fd1fd67cb1d9c5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17345",
       "triggerID" : "551c52dd6cbaaa4b48e08c9388fd1fd67cb1d9c5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f361b40cba23c728338a5163b0c00c50ac6c60b8 UNKNOWN
   * 27375abd2d676eb530d0ee2d2803efddce0bb92c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17304) 
   * 551c52dd6cbaaa4b48e08c9388fd1fd67cb1d9c5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17345) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1489031387

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7f9a12f98ef6555b75617aeee8eac57760c28c04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962) 
   * 4a200747aaf17a8889519e51b2da5fcda2496da5 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1490519189

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 498f23ea7365f6b6aa781f8fee34f5c6b6b63314 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976) 
   * 78befd976f513cad8901e3c0e5fc97eabef19da2 UNKNOWN
   




[GitHub] [hudi] jonvex commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1164683803


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -270,6 +271,21 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    if (!enableFileIndex || globPaths.nonEmpty || parameters.getOrElse(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key(), "true") != "true") {

Review Comment:
   Spark applies special optimizations to `HadoopFsRelation`, so unless we contribute PRs to Spark, this is the only way to do it as far as I can tell.
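
A minimal sketch of the fallback condition quoted in the diff above, reduced to plain booleans so that it runs without Spark or Hudi on the classpath. `BootstrapReadFlags`, `BootstrapPathChoice`, and `usesFallback` are hypothetical names introduced only for illustration. The sketch also assumes that the `if` branch, whose body the truncated diff does not show, is the one that skips `HadoopFsRelation`.

```scala
// Hypothetical model of the three inputs to the `if` in
// resolveHoodieBootstrapRelation; none of these names exist in Hudi.
final case class BootstrapReadFlags(
  enableFileIndex: Boolean, // stands in for ENABLE_HOODIE_FILE_INDEX
  hasGlobPaths: Boolean,    // stands in for globPaths.nonEmpty
  dataQueriesOnly: Boolean  // stands in for DATA_QUERIES_ONLY == "true"
)

object BootstrapPathChoice {
  // Mirrors the boolean condition in the diff:
  //   !enableFileIndex || globPaths.nonEmpty || dataQueriesOnly != "true"
  // Assumption: a true result means the read does not use HadoopFsRelation.
  def usesFallback(flags: BootstrapReadFlags): Boolean =
    !flags.enableFileIndex || flags.hasGlobPaths || !flags.dataQueriesOnly
}
```

Under these assumptions, `usesFallback` returns false only when the file index is enabled, no glob paths are given, and the data-queries-only option is left at `"true"`. Setting the option to `"false"`, as the `TestHoodieSparkSqlWriter` change in this PR does, makes it return true.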





[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1507098386

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289) 
   * 9cda89b23cbf8514e1c2e0049eea4624f3b49f10 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1505803118

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1527815768

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "221af924fdc9941787af0355360a5399c0386167",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15948",
       "triggerID" : "221af924fdc9941787af0355360a5399c0386167",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f9a12f98ef6555b75617aeee8eac57760c28c04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15962",
       "triggerID" : "1489003102",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15975",
       "triggerID" : "4a200747aaf17a8889519e51b2da5fcda2496da5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15976",
       "triggerID" : "498f23ea7365f6b6aa781f8fee34f5c6b6b63314",
       "triggerType" : "PUSH"
     }, {
       "hash" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15997",
       "triggerID" : "78befd976f513cad8901e3c0e5fc97eabef19da2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16127",
       "triggerID" : "5bca709bbf2690b2cd3a077f8214bd6e627d8420",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16289",
       "triggerID" : "6a7ae704e1bc334ebc88ee8ed52b0b6ce07aaaf3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324",
       "triggerID" : "9cda89b23cbf8514e1c2e0049eea4624f3b49f10",
       "triggerType" : "PUSH"
     }, {
       "hash" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16665",
       "triggerID" : "732fbf0bf522d987baff2e6831fa85a0b5597c88",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16668",
       "triggerID" : "c6908a16bf2f1fb46735781f8d969177eadc23a4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3cfef7fc92a6c5ce9bb078a7186e04614c11647f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715",
       "triggerID" : "76394b7ed5df01286d5085e5f1b43a47e52baa5d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e7795634f222d6d27363dc4900c9fb458105ffce",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN
   * 76394b7ed5df01286d5085e5f1b43a47e52baa5d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16715) 
   * e7795634f222d6d27363dc4900c9fb458105ffce UNKNOWN
   




[GitHub] [hudi] jonvex commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

Posted by "jonvex (via GitHub)" <gi...@apache.org>.
jonvex commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1180579947


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -271,6 +273,25 @@ object DefaultSource {
     }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+                                             globPaths: Seq[Path],
+                                             userSchema: Option[StructType],
+                                             metaClient: HoodieTableMetaClient,
+                                             parameters: Map[String, String]): BaseRelation = {
+    val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf,
+      ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+    val isSchemaEvolutionEnabledOnRead = HoodieSparkConfUtils.getConfigValue(parameters,
+      sqlContext.sparkSession.sessionState.conf, DataSourceReadOptions.SCHEMA_EVOLUTION_ENABLED.key,
+      DataSourceReadOptions.SCHEMA_EVOLUTION_ENABLED.defaultValue.toString).toBoolean
+    if (!enableFileIndex || isSchemaEvolutionEnabledOnRead
+      || globPaths.nonEmpty || !parameters.getOrElse(DATA_QUERIES_ONLY.key, DATA_QUERIES_ONLY.defaultValue).toBoolean) {

Review Comment:
   Looking at the existing bootstrap tests, there are probably many cases that we do not currently cover.
   MOR with bootstrap also does not seem to be well supported; see https://issues.apache.org/jira/browse/HUDI-2071.
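
   To make the coverage gap concrete, here is a minimal sketch that models the four-way condition from the diff above as a pure function and enumerates every flag combination. `BootstrapReadFlags`, `BootstrapReadPath`, and `takesFirstBranch` are hypothetical names used only for illustration and are not part of Hudi; the sketch also makes no claim about what either branch returns, since the diff is cut off at the `if`.

```scala
// Hypothetical, simplified model of the `if` condition in resolveHoodieBootstrapRelation.
// These names are illustrative only and do not exist in Hudi.
final case class BootstrapReadFlags(
  enableFileIndex: Boolean,       // ENABLE_HOODIE_FILE_INDEX
  schemaEvolutionOnRead: Boolean, // DataSourceReadOptions.SCHEMA_EVOLUTION_ENABLED
  hasGlobPaths: Boolean,          // globPaths.nonEmpty
  dataQueriesOnly: Boolean        // DATA_QUERIES_ONLY
)

object BootstrapReadPath {
  // True when the condition in the diff holds, i.e. the first branch of the `if` is taken.
  def takesFirstBranch(f: BootstrapReadFlags): Boolean =
    !f.enableFileIndex || f.schemaEvolutionOnRead || f.hasGlobPaths || !f.dataQueriesOnly

  // All 2^4 combinations of the four flags, usable as a test matrix.
  val allCombinations: Seq[BootstrapReadFlags] =
    for {
      fileIndex   <- Seq(true, false)
      schemaEvo   <- Seq(true, false)
      globs       <- Seq(true, false)
      dataQueries <- Seq(true, false)
    } yield BootstrapReadFlags(fileIndex, schemaEvo, globs, dataQueries)

  // Combinations that skip the first branch and reach the `else` side.
  val reachingElse: Seq[BootstrapReadFlags] =
    allCombinations.filterNot(takesFirstBranch)
}
```

   Under this model, only one of the 16 combinations reaches the `else` side: file index enabled, schema evolution on read disabled, no glob paths, and data-queries-only enabled. A parameterized test driven by a matrix like `allCombinations` would be one way to cover the remaining cases.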


