Posted to commits@hudi.apache.org by "codope (via GitHub)" <gi...@apache.org> on 2023/03/31 12:25:03 UTC

[GitHub] [hudi] codope opened a new pull request, #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

codope opened a new pull request, #8342:
URL: https://github.com/apache/hudi/pull/8342

   ### Change Logs
   
   Clustering on a bootstrap table (`METADATA_ONLY` bootstrap mode) with the row writer disabled did not produce correct results: only the meta fields were populated, while the data columns were null. This PR fixes the bug by adding a separate `HoodieBootstrapFileReader` that stitches the meta columns together with the data columns.
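   The stitching idea can be sketched as below. This is an illustrative sketch only: the class and method names are hypothetical, not the actual `HoodieBootstrapFileReader` API, and the real reader operates on Hudi records rather than maps. The core idea is the same, though: a positional zip of the skeleton-file rows (meta columns) with the bootstrap source-file rows (data columns).
   ```java
   import java.util.ArrayList;
   import java.util.Iterator;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   public class StitchSketch {

       // Merge each skeleton row (meta columns) with the matching data row (data columns).
       // Rows are matched positionally, which relies on both files having the same row order.
       static List<Map<String, Object>> stitch(Iterator<Map<String, Object>> skeletonRows,
                                               Iterator<Map<String, Object>> dataRows) {
           List<Map<String, Object>> merged = new ArrayList<>();
           while (skeletonRows.hasNext() && dataRows.hasNext()) {
               Map<String, Object> row = new LinkedHashMap<>(skeletonRows.next());
               row.putAll(dataRows.next()); // data columns follow the meta columns
               merged.add(row);
           }
           return merged;
       }

       public static void main(String[] args) {
           Map<String, Object> meta = new LinkedHashMap<>();
           meta.put("_hoodie_record_key", "trip_0");
           meta.put("_hoodie_partition_path", "datestr=2018");
           Map<String, Object> data = new LinkedHashMap<>();
           data.put("rider", "rider_0");
           data.put("driver", "driver_0");
           List<Map<String, Object>> rows =
               stitch(List.of(meta).iterator(), List.of(data).iterator());
           System.out.println(rows.get(0));
       }
   }
   ```
   Without this stitching, a reader that consumed only the skeleton file would see exactly the symptom above: populated meta fields and null data columns.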
   
   Before this fix, a snapshot query after clustering on a bootstrap table returned incorrect data columns:
   ```
   +-------------------+--------------------+------------------+----------------------+--------------------------------------------------------------------+-------------+--------+--------------+--------+---------+--------------------+-------------------+--------------------+---------------------+-------------------------+---------------------------+------------------+------------+
   |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                   |timestamp    |_row_key|partition_path|rider   |driver   |begin_lat           |begin_lon          |end_lat             |end_lon              |fare                     |tip_history                |_hoodie_is_deleted|datestr     |
   +-------------------+--------------------+------------------+----------------------+--------------------------------------------------------------------+-------------+--------+--------------+--------+---------+--------------------+-------------------+--------------------+---------------------+-------------------------+---------------------------+------------------+------------+
   |00000000000001     |00000000000001_5_0  |trip_0            |datestr=2018          |a80c61fe-89a1-488c-8daa-dac6cea52dfb_5-10-37_00000000000001.parquet |1680265327825|trip_0  |1680265327825 |rider_0 |driver_0 |0.2909073141582583  |0.6713659942455674 |0.3199873855402988  |0.8901008450132192   |[13.409427386679251, USD]|[[76.98430157746769, USD]] |false             |datestr=2018|
   |00000000000001     |00000000000001_7_0  |trip_3            |datestr=2018          |45b13f98-496e-4107-8c66-5ce88ab69940_7-10-39_00000000000001.parquet |1680265327825|trip_3  |1680265327825 |rider_3 |driver_3 |0.13139874521266626 |0.9288890012418678 |0.19960441648570804 |0.028970072867536834 |[3.934944937321838, USD] |[[60.94692580064911, USD]] |false             |datestr=2018|
   |00000000000001     |00000000000001_7_1  |trip_4            |datestr=2018          |45b13f98-496e-4107-8c66-5ce88ab69940_7-10-39_00000000000001.parquet |1680265327825|trip_4  |1680265327825 |rider_4 |driver_4 |0.19148119051373647 |0.3121563466437075 |0.07312220393022284 |0.4623498809657779   |[84.27465303833377, USD] |[[48.54971480008592, USD]] |false             |datestr=2018|
   |00000000000001     |00000000000001_6_0  |trip_2            |datestr=2018          |4b6f5614-cfe9-42cd-bd0c-09667714a6a3_6-10-38_00000000000001.parquet |1680265327825|trip_2  |1680265327825 |rider_2 |driver_2 |0.29293250471488286 |0.8169497077647824 |0.4575395537485407  |0.37034912499009554  |[65.48417669107184, USD] |[[51.323010501226705, USD]]|false             |datestr=2018|
   |00000000000001     |00000000000001_4_0  |trip_1            |datestr=2018          |2f656a57-d3d3-453b-bea3-beb3f86a2cfc_4-10-36_00000000000001.parquet |1680265327825|trip_1  |1680265327825 |rider_1 |driver_1 |0.7593035032651309  |0.4695942868315275 |0.04062310794619961 |0.7483312940246941   |[99.53761667379452, USD] |[[36.68130227843157, USD]] |false             |datestr=2018|
   |00000000000001     |00000000000001_8_0  |trip_6            |datestr=2019          |0e43fd89-9294-4630-8f7b-b782f15377b8_8-10-40_00000000000001.parquet |1680265327825|trip_6  |1680265327825 |rider_6 |driver_6 |0.6576893480206276  |0.20124822123740116|0.5587907101480606  |0.0087676912597352   |[46.3596114051868, USD]  |[[1.4482069738172454, USD]]|false             |datestr=2019|
   |00000000000001     |00000000000001_9_0  |trip_5            |datestr=2019          |1260bd0a-e1b0-469e-9407-c0952a2e5bce_9-10-41_00000000000001.parquet |1680265327825|trip_5  |1680265327825 |rider_5 |driver_5 |0.8780482394034513  |0.45016664520520033|0.1210946590521833  |0.559346262842122    |[3.980544730087332, USD] |[[11.81867856830614, USD]] |false             |datestr=2019|
   |00000000000001     |00000000000001_10_0 |trip_7            |datestr=2019          |ca2423ad-40f9-437a-a009-bf5b14cedb34_10-10-42_00000000000001.parquet|1680265327825|trip_7  |1680265327825 |rider_7 |driver_7 |0.8539282876074638  |0.6288419331027626 |0.1199959028048404  |0.19234888544292428  |[17.28229998461128, USD] |[[77.49172321783067, USD]] |false             |datestr=2019|
   |00000000000001     |00000000000001_11_0 |trip_8            |datestr=2019          |11502732-a705-4f63-9b8e-3ace93d8c9f4_11-10-43_00000000000001.parquet|1680265327825|trip_8  |1680265327825 |rider_8 |driver_8 |0.5247015895548016  |0.09543754441513863|0.1510348079622863  |0.3036501516600335   |[18.50748211199097, USD] |[[80.26618263126355, USD]] |false             |datestr=2019|
   |00000000000001     |00000000000001_11_1 |trip_9            |datestr=2019          |11502732-a705-4f63-9b8e-3ace93d8c9f4_11-10-43_00000000000001.parquet|1680265327825|trip_9  |1680265327825 |rider_9 |driver_9 |0.18732285899232892 |0.419057912375039  |0.9402509062992255  |0.7540875540699798   |[77.90400106882183, USD] |[[89.12865661547804, USD]] |false             |datestr=2019|
   |00000000000001     |00000000000001_3_0  |trip_10           |datestr=2020          |75960f03-0093-438c-b71c-fe5eb02496e4_3-10-35_00000000000001.parquet |1680265327825|trip_10 |1680265327825 |rider_10|driver_10|0.7945595842585961  |0.849250587072739  |0.8016352053998793  |0.6664019129654204   |[68.54476863463951, USD] |[[78.73973533402236, USD]] |false             |datestr=2020|
   |00000000000001     |00000000000001_0_0  |trip_12           |datestr=2020          |920c7f2e-0cc9-46b6-8780-3e3312ef133c_0-10-32_00000000000001.parquet |1680265327825|trip_12 |1680265327825 |rider_12|driver_12|0.26359097652813546 |0.3040963404277949 |0.783608220421833   |0.26773327561669813  |[8.899266098961778, USD] |[[63.19151746906088, USD]] |false             |datestr=2020|
   |00000000000001     |00000000000001_1_0  |trip_13           |datestr=2020          |3cc87619-d56d-4a6d-9023-8af97824bfac_1-10-33_00000000000001.parquet |1680265327825|trip_13 |1680265327825 |rider_13|driver_13|0.037809287288638194|0.20234037038861052|0.7404294591470656  |0.29316985501104065  |[93.45037833211967, USD] |[[50.56012365982448, USD]] |false             |datestr=2020|
   |00000000000001     |00000000000001_1_1  |trip_14           |datestr=2020          |3cc87619-d56d-4a6d-9023-8af97824bfac_1-10-33_00000000000001.parquet |1680265327825|trip_14 |1680265327825 |rider_14|driver_14|0.7519002026514892  |0.9448162986968871 |0.40054933992868635 |0.0038455626793925113|[15.880759811433354, USD]|[[84.44445639423378, USD]] |false             |datestr=2020|
   |00000000000001     |00000000000001_2_0  |trip_11           |datestr=2020          |50b2f640-e284-49ef-a1f8-f5a819a0e7be_2-10-34_00000000000001.parquet |1680265327825|trip_11 |1680265327825 |rider_11|driver_11|0.23032054239540056 |0.9100367991551281 |0.022237439482133525|0.08921895796973023  |[68.27062120012675, USD] |[[39.13358730683697, USD]] |false             |datestr=2020|
   +-------------------+--------------------+------------------+----------------------+--------------------------------------------------------------------+-------------+--------+--------------+--------+---------+--------------------+-------------------+--------------------+---------------------+-------------------------+---------------------------+------------------+------------+
   ```
   After this fix:
   ```
   +-------------------+--------------------+------------------+----------------------+--------------------------------------------------------------------+-------------+--------+--------------+--------+---------+--------------------+-------------------+--------------------+-------------------+-------------------------+---------------------------+------------------+------------+
   |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                   |timestamp    |_row_key|partition_path|rider   |driver   |begin_lat           |begin_lon          |end_lat             |end_lon            |fare                     |tip_history                |_hoodie_is_deleted|datestr     |
   +-------------------+--------------------+------------------+----------------------+--------------------------------------------------------------------+-------------+--------+--------------+--------+---------+--------------------+-------------------+--------------------+-------------------+-------------------------+---------------------------+------------------+------------+
   |00000000000001     |00000000000001_4_0  |trip_3            |datestr=2018          |bf676b9f-2e9b-48fd-a6a1-26d8139aecd0_4-10-36_00000000000001.parquet |1680265436528|trip_3  |1680265436528 |rider_3 |driver_3 |0.5082028544317309  |0.6186035619925132 |0.019487324652589844|0.34244473537926823|[92.41566255606341, USD] |[[67.8016115074926, USD]]  |false             |datestr=2018|
   |00000000000001     |00000000000001_4_1  |trip_4            |datestr=2018          |bf676b9f-2e9b-48fd-a6a1-26d8139aecd0_4-10-36_00000000000001.parquet |1680265436528|trip_4  |1680265436528 |rider_4 |driver_4 |0.5182280625084768  |0.9253109379737152 |0.33233798005862314 |0.7110019996809055 |[4.44409622575439, USD]  |[[33.869898194219516, USD]]|false             |datestr=2018|
   |00000000000001     |00000000000001_6_0  |trip_0            |datestr=2018          |0f4f79d1-c011-4bc9-9f89-8a4a2197ef56_6-10-38_00000000000001.parquet |1680265436528|trip_0  |1680265436528 |rider_0 |driver_0 |0.12176296539745046 |0.382558364451396  |0.0870559794514496  |0.27640429152343515|[92.1024811423022, USD]  |[[67.77835365292796, USD]] |false             |datestr=2018|
   |00000000000001     |00000000000001_5_0  |trip_2            |datestr=2018          |dec41ce1-9fe8-4cf2-99ab-ffa25297a2da_5-10-37_00000000000001.parquet |1680265436528|trip_2  |1680265436528 |rider_2 |driver_2 |0.5522660335262106  |0.7589583434997402 |0.6039198595852253  |0.8361083230362024 |[78.0609254553147, USD]  |[[27.858948192411514, USD]]|false             |datestr=2018|
   |00000000000001     |00000000000001_7_0  |trip_1            |datestr=2018          |df6c3238-a4f2-4796-b582-2e427c0e1dcd_7-10-39_00000000000001.parquet |1680265436528|trip_1  |1680265436528 |rider_1 |driver_1 |0.7389516331004687  |0.28811408775028   |0.7200780424137405  |0.484662130326595  |[1.6077573601573025, USD]|[[10.341913607318709, USD]]|false             |datestr=2018|
   |00000000000001     |00000000000001_8_0  |trip_5            |datestr=2019          |37ffa376-a86d-4706-a785-2711fe13aa78_8-10-40_00000000000001.parquet |1680265436528|trip_5  |1680265436528 |rider_5 |driver_5 |0.1630151212353752  |0.27057428081894186|0.3808059886411259  |0.3692283742910598 |[31.179184715024654, USD]|[[93.96021299492908, USD]] |false             |datestr=2019|
   |00000000000001     |00000000000001_9_0  |trip_6            |datestr=2019          |4b1783be-5cae-4ef9-9030-60cbce595531_9-10-41_00000000000001.parquet |1680265436528|trip_6  |1680265436528 |rider_6 |driver_6 |0.5420218856799521  |0.3717532476763643 |0.7316585090626965  |0.5182677308446296 |[49.210873427144186, USD]|[[2.034155984429642, USD]] |false             |datestr=2019|
   |00000000000001     |00000000000001_10_0 |trip_8            |datestr=2019          |8af5538e-bbe9-4291-8814-bc5e356f90dc_10-10-42_00000000000001.parquet|1680265436528|trip_8  |1680265436528 |rider_8 |driver_8 |0.8253202558194069  |0.8769063071666001 |0.9978855323416493  |0.07003530632543731|[22.31002279951365, USD] |[[30.365612077091576, USD]]|false             |datestr=2019|
   |00000000000001     |00000000000001_10_1 |trip_9            |datestr=2019          |8af5538e-bbe9-4291-8814-bc5e356f90dc_10-10-42_00000000000001.parquet|1680265436528|trip_9  |1680265436528 |rider_9 |driver_9 |0.31560399915225323 |0.496779058144757  |0.6974261081429741  |0.9073312408362796 |[87.04727640702991, USD] |[[96.17579621323826, USD]] |false             |datestr=2019|
   |00000000000001     |00000000000001_11_0 |trip_7            |datestr=2019          |eea5a961-af77-4834-9abf-73cc5bf20eff_11-10-43_00000000000001.parquet|1680265436528|trip_7  |1680265436528 |rider_7 |driver_7 |0.08038761693792418 |0.632904243467236  |0.555660576167659   |0.4872642442124486 |[13.555441426862014, USD]|[[10.544626239374132, USD]]|false             |datestr=2019|
   |00000000000001     |00000000000001_3_0  |trip_12           |datestr=2020          |1cc534c7-1e6f-4412-bd6c-c1855070974a_3-10-35_00000000000001.parquet |1680265436528|trip_12 |1680265436528 |rider_12|driver_12|0.004079088549327037|0.16874021976709552|0.20828594874323636 |0.895462473317559  |[92.18052838420539, USD] |[[44.650703399553215, USD]]|false             |datestr=2020|
   |00000000000001     |00000000000001_1_0  |trip_11           |datestr=2020          |ffc5ecc7-46e9-4fe4-a6ec-7adbf1e31e33_1-10-33_00000000000001.parquet |1680265436528|trip_11 |1680265436528 |rider_11|driver_11|0.16583914122830068 |0.28708446826172784|0.6707401823203576  |0.20113024584157368|[13.875727591686381, USD]|[[52.648852351275025, USD]]|false             |datestr=2020|
   |00000000000001     |00000000000001_0_0  |trip_10           |datestr=2020          |09f09c46-8d3a-420d-a7d5-435c6280b161_0-10-32_00000000000001.parquet |1680265436528|trip_10 |1680265436528 |rider_10|driver_10|0.7531144860222685  |0.9217065388363564 |0.12736143989601045 |0.6846542499163221 |[85.46301785894622, USD] |[[67.94440570686055, USD]] |false             |datestr=2020|
   |00000000000001     |00000000000001_2_0  |trip_13           |datestr=2020          |bb0e66b1-a853-4075-a9ce-f5150d1db17e_2-10-34_00000000000001.parquet |1680265436528|trip_13 |1680265436528 |rider_13|driver_13|0.37536133167833274 |0.13380768426991696|0.7165151686625107  |0.4484507140549401 |[37.18742431963579, USD] |[[29.610616003915634, USD]]|false             |datestr=2020|
   |00000000000001     |00000000000001_2_1  |trip_14           |datestr=2020          |bb0e66b1-a853-4075-a9ce-f5150d1db17e_2-10-34_00000000000001.parquet |1680265436528|trip_14 |1680265436528 |rider_14|driver_14|0.5579756297430776  |0.39976488479239436|0.4722872205073937  |0.10655015779417953|[95.73825510010874, USD] |[[94.31603336355222, USD]] |false             |datestr=2020|
   +-------------------+--------------------+------------------+----------------------+--------------------------------------------------------------------+-------------+--------+--------------+--------+---------+--------------------+-------------------+--------------------+-------------------+-------------------------+---------------------------+------------------+------------+
   ```
   
   ### Impact
   
   A bug fix for bootstrap tables.
   
   ### Risk level (write none, low, medium or high below)
   
   low
   
   The `HoodieBootstrapFileReader` is used during clustering only when the base file has an associated bootstrap file path.
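   The guard can be sketched as follows; the method and variable names mirror those in the diff below, while `nonEmpty` is re-implemented here only so the snippet is self-contained:
   ```java
   public class BootstrapGuard {

       // Stand-in for org.apache.hudi.common.util.StringUtils.nonEmpty
       static boolean nonEmpty(String s) {
           return s != null && !s.isEmpty();
       }

       // The bootstrap reader path is taken only when both the clustering operation's
       // bootstrap file path and the table's bootstrap base path are present.
       static boolean useBootstrapReader(String bootstrapFilePath, String bootstrapBasePath) {
           return nonEmpty(bootstrapFilePath) && nonEmpty(bootstrapBasePath);
       }

       public static void main(String[] args) {
           System.out.println(useBootstrapReader("/bs/part/file.parquet", "/bs")); // true
           System.out.println(useBootstrapReader(null, "/bs"));                    // false
       }
   }
   ```
   Non-bootstrap tables never satisfy this condition, so they keep using the existing reader, which is why the risk is rated low.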
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua merged pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua merged PR #8342:
URL: https://github.com/apache/hudi/pull/8342




[GitHub] [hudi] hudi-bot commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1493402969

   ## CI report:
   
   * d534d8186351c0f9831441eb58229ad068211592 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16022) 
   * e5971f241437cdcf4302fb3bd4aa956d348e103b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16069) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1512721278

   ## CI report:
   
   * 76e5ebe79dc1c1bb53c36d95210a861fbd781ffd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16087) 
   * 46f6599da55859ecfc562908b67c279117c937c0 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1513061330

   ## CI report:
   
   * 46f6599da55859ecfc562908b67c279117c937c0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16419) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1491879439

   ## CI report:
   
   * d534d8186351c0f9831441eb58229ad068211592 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16022) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1493401669

   ## CI report:
   
   * d534d8186351c0f9831441eb58229ad068211592 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16022) 
   * e5971f241437cdcf4302fb3bd4aa956d348e103b UNKNOWN
   




[GitHub] [hudi] codope commented on a diff in pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #8342:
URL: https://github.com/apache/hudi/pull/8342#discussion_r1170136553


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -327,21 +332,52 @@ private HoodieData<HoodieRecord<T>> readRecordsForGroupBaseFiles(JavaSparkContex
 
     // NOTE: It's crucial to make sure that we don't capture whole "this" object into the
     //       closure, as this might lead to issues attempting to serialize its nested fields
+    HoodieTableConfig  tableConfig = getHoodieTable().getMetaClient().getTableConfig();
+    String bootstrapBasePath = tableConfig.getBootstrapBasePath().orElse(null);
+    Option<String[]> partitionFields = tableConfig.getPartitionFields();
+    String timeZoneId = jsc.getConf().get("timeZone", SQLConf.get().sessionLocalTimeZone());
+    boolean shouldValidateColumns = jsc.getConf().getBoolean("spark.sql.sources.validatePartitionColumns", true);
+
     return HoodieJavaRDD.of(jsc.parallelize(clusteringOps, clusteringOps.size())
         .mapPartitions(clusteringOpsPartition -> {
           List<Iterator<HoodieRecord<T>>> iteratorsForPartition = new ArrayList<>();
           clusteringOpsPartition.forEachRemaining(clusteringOp -> {
             try {
               Schema readerSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(writeConfig.getSchema()));
               HoodieFileReader baseFileReader = HoodieFileReaderFactory.getReaderFactory(recordType).getFileReader(hadoopConf.get(), new Path(clusteringOp.getDataFilePath()));
+              // handle bootstrap path
+              if (StringUtils.nonEmpty(clusteringOp.getBootstrapFilePath()) && StringUtils.nonEmpty(bootstrapBasePath)) {

Review Comment:
   Good catch! Refactored and added a test to cover this scenario.





[GitHub] [hudi] hudi-bot commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1493790850

   ## CI report:
   
   * e5971f241437cdcf4302fb3bd4aa956d348e103b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16069) 
   * 76e5ebe79dc1c1bb53c36d95210a861fbd781ffd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16087) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1496120871

   ## CI report:
   
   * 76e5ebe79dc1c1bb53c36d95210a861fbd781ffd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16087) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1492497364

   ## CI report:
   
   * d534d8186351c0f9831441eb58229ad068211592 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16022) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on a diff in pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on code in PR #8342:
URL: https://github.com/apache/hudi/pull/8342#discussion_r1169374327


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -368,6 +404,8 @@ private Dataset<Row> readRecordsForGroupAsRow(JavaSparkContext jsc,
         .stream()
         .map(op -> {
           ArrayList<String> readPaths = new ArrayList<>();
+          // NOTE: for bootstrap tables, only need to handle data file path (ehich is the skeleton file) because

Review Comment:
   ```suggestion
             // NOTE: for bootstrap tables, only need to handle data file path (which is the skeleton file) because
   ```



##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieBootstrapFileReader.java:
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.MetadataValues;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.collection.ClosableIterator;
+
+import org.apache.avro.Schema;
+
+import java.io.IOException;
+import java.util.Set;
+
+public abstract class HoodieBootstrapFileReader<T> implements HoodieFileReader<T> {
+
+  private final HoodieFileReader<T> skeletonFileReader;
+  private final HoodieFileReader<T> dataFileReader;
+
+  private final Option<String[]> partitionFields;
+  private final Object[] partitionValues;
+
+  public HoodieBootstrapFileReader(HoodieFileReader<T> skeletonFileReader, HoodieFileReader<T> dataFileReader, Option<String[]> partitionFields, Object[] partitionValues) {
+    this.skeletonFileReader = skeletonFileReader;
+    this.dataFileReader = dataFileReader;
+    this.partitionFields = partitionFields;
+    this.partitionValues = partitionValues;
+  }
+
+  @Override
+  public String[] readMinMaxRecordKeys() {
+    return skeletonFileReader.readMinMaxRecordKeys();
+  }
+
+  @Override
+  public BloomFilter readBloomFilter() {
+    return skeletonFileReader.readBloomFilter();
+  }
+
+  @Override
+  public Set<String> filterRowKeys(Set<String> candidateRowKeys) {
+    return skeletonFileReader.filterRowKeys(candidateRowKeys);
+  }
+
+  @Override
+  public ClosableIterator<HoodieRecord<T>> getRecordIterator(Schema readerSchema, Schema requestedSchema) throws IOException {
+    ClosableIterator<HoodieRecord<T>> skeletonIterator = skeletonFileReader.getRecordIterator(readerSchema, requestedSchema);
+    ClosableIterator<HoodieRecord<T>> dataFileIterator = dataFileReader.getRecordIterator(HoodieAvroUtils.removeMetadataFields(readerSchema), requestedSchema);
+    return new ClosableIterator<HoodieRecord<T>>() {
+      @Override
+      public void close() {
+        skeletonIterator.close();
+        dataFileIterator.close();
+      }
+
+      @Override
+      public boolean hasNext() {
+        return skeletonIterator.hasNext() && dataFileIterator.hasNext();
+      }
+
+      @Override
+      public HoodieRecord<T> next() {
+        HoodieRecord<T> dataRecord = dataFileIterator.next();
+        HoodieRecord<T> skeletonRecord = skeletonIterator.next();
+        HoodieRecord<T> ret = dataRecord.prependMetaFields(readerSchema, readerSchema,
+            new MetadataValues().setCommitTime(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.COMMIT_TIME_METADATA_FIELD))
+                .setCommitSeqno(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.COMMIT_SEQNO_METADATA_FIELD))
+                .setRecordKey(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.RECORD_KEY_METADATA_FIELD))
+                .setPartitionPath(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.PARTITION_PATH_METADATA_FIELD))
+                .setFileName(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.FILENAME_METADATA_FIELD)), null);
+        if (partitionFields.isPresent()) {
+          for (int i = 0; i < partitionValues.length; i++) {
+            int position = readerSchema.getField(partitionFields.get()[i]).pos();
+            setPartitionField(position, partitionValues[i], ret.getData());
+          }
+        }
+        return ret;
+      }
+    };
+  }
+
+  protected abstract void setPartitionField(int position, Object fieldValue, T row);
+
+  @Override
+  public Schema getSchema() {
+    return skeletonFileReader.getSchema();

Review Comment:
   How is this used?  I assume this only contains meta fields.



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -327,21 +332,52 @@ private HoodieData<HoodieRecord<T>> readRecordsForGroupBaseFiles(JavaSparkContex
 
     // NOTE: It's crucial to make sure that we don't capture whole "this" object into the
     //       closure, as this might lead to issues attempting to serialize its nested fields
+    HoodieTableConfig  tableConfig = getHoodieTable().getMetaClient().getTableConfig();
+    String bootstrapBasePath = tableConfig.getBootstrapBasePath().orElse(null);
+    Option<String[]> partitionFields = tableConfig.getPartitionFields();
+    String timeZoneId = jsc.getConf().get("timeZone", SQLConf.get().sessionLocalTimeZone());
+    boolean shouldValidateColumns = jsc.getConf().getBoolean("spark.sql.sources.validatePartitionColumns", true);

Review Comment:
   Could this be moved to closure and use `hadoopConf.get()` to avoid additional variables passed?





[GitHub] [hudi] codope commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1491878143

   Rebased on top of #8289 with some cleanup and fixes. Tested for all Spark versions supported by Hudi.




[GitHub] [hudi] hudi-bot commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1493436024

   ## CI report:
   
   * e5971f241437cdcf4302fb3bd4aa956d348e103b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16069) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1491871022

   ## CI report:
   
   * d534d8186351c0f9831441eb58229ad068211592 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] codope commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1495845722

   <img width="1147" alt="Screenshot 2023-04-04 at 5 24 58 PM" src="https://user-images.githubusercontent.com/16440354/229783859-9f594dd7-8ff2-4650-a56c-7e91ca68f493.png">
   
   `testHoodieClientBasicMultiWriterWithEarlyConflictDetection` is flaky and it's being tracked in HUDI-5831




[GitHub] [hudi] hudi-bot commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1496105896

   ## CI report:
   
   * 76e5ebe79dc1c1bb53c36d95210a861fbd781ffd UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] codope commented on a diff in pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #8342:
URL: https://github.com/apache/hudi/pull/8342#discussion_r1170145766


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -327,21 +332,52 @@ private HoodieData<HoodieRecord<T>> readRecordsForGroupBaseFiles(JavaSparkContex
 
     // NOTE: It's crucial to make sure that we don't capture whole "this" object into the
     //       closure, as this might lead to issues attempting to serialize its nested fields
+    HoodieTableConfig  tableConfig = getHoodieTable().getMetaClient().getTableConfig();
+    String bootstrapBasePath = tableConfig.getBootstrapBasePath().orElse(null);
+    Option<String[]> partitionFields = tableConfig.getPartitionFields();
+    String timeZoneId = jsc.getConf().get("timeZone", SQLConf.get().sessionLocalTimeZone());
+    boolean shouldValidateColumns = jsc.getConf().getBoolean("spark.sql.sources.validatePartitionColumns", true);

Review Comment:
   Done.
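   (For context on why these values are read into plain local variables, or moved inside the closure, rather than accessed through `this`: referencing an instance field from a lambda implicitly captures the enclosing object, which Spark would then try to serialize. The following is a minimal plain-Java illustration of that capture rule, using a hypothetical `Holder` class rather than Hudi code.)

   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.ObjectOutputStream;
   import java.io.Serializable;
   import java.util.function.Supplier;

   public class ClosureCaptureSketch {
     static class Holder {
       // A field like this would make the enclosing object non-serializable.
       private final Object heavyNonSerializable = new Object();
       private final String value = "cfg";

       Supplier<String> capturesThis() {
         // Reading the field directly captures the whole Holder instance.
         return (Supplier<String> & Serializable) () -> value;
       }

       Supplier<String> capturesLocal() {
         String local = value; // copy to a local first; only the String is captured
         return (Supplier<String> & Serializable) () -> local;
       }
     }

     // Returns true iff the object can be Java-serialized.
     static boolean serializes(Object o) {
       try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
         out.writeObject(o);
         return true;
       } catch (IOException e) {
         return false;
       }
     }

     public static void main(String[] args) {
       Holder h = new Holder();
       System.out.println(serializes(h.capturesThis()));  // false: drags in the Holder
       System.out.println(serializes(h.capturesLocal())); // true: captures only the String
     }
   }
   ```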





[GitHub] [hudi] yihua commented on a diff in pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on code in PR #8342:
URL: https://github.com/apache/hudi/pull/8342#discussion_r1172103490


##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieBootstrapFileReader.java:
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.MetadataValues;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.collection.ClosableIterator;
+
+import org.apache.avro.Schema;
+
+import java.io.IOException;
+import java.util.Set;
+
+public abstract class HoodieBootstrapFileReader<T> implements HoodieFileReader<T> {
+
+  private final HoodieFileReader<T> skeletonFileReader;
+  private final HoodieFileReader<T> dataFileReader;
+
+  private final Option<String[]> partitionFields;
+  private final Object[] partitionValues;
+
+  public HoodieBootstrapFileReader(HoodieFileReader<T> skeletonFileReader, HoodieFileReader<T> dataFileReader, Option<String[]> partitionFields, Object[] partitionValues) {
+    this.skeletonFileReader = skeletonFileReader;
+    this.dataFileReader = dataFileReader;
+    this.partitionFields = partitionFields;
+    this.partitionValues = partitionValues;
+  }
+
+  @Override
+  public String[] readMinMaxRecordKeys() {
+    return skeletonFileReader.readMinMaxRecordKeys();
+  }
+
+  @Override
+  public BloomFilter readBloomFilter() {
+    return skeletonFileReader.readBloomFilter();
+  }
+
+  @Override
+  public Set<String> filterRowKeys(Set<String> candidateRowKeys) {
+    return skeletonFileReader.filterRowKeys(candidateRowKeys);
+  }
+
+  @Override
+  public ClosableIterator<HoodieRecord<T>> getRecordIterator(Schema readerSchema, Schema requestedSchema) throws IOException {
+    ClosableIterator<HoodieRecord<T>> skeletonIterator = skeletonFileReader.getRecordIterator(readerSchema, requestedSchema);
+    ClosableIterator<HoodieRecord<T>> dataFileIterator = dataFileReader.getRecordIterator(HoodieAvroUtils.removeMetadataFields(readerSchema), requestedSchema);
+    return new ClosableIterator<HoodieRecord<T>>() {
+      @Override
+      public void close() {
+        skeletonIterator.close();
+        dataFileIterator.close();
+      }
+
+      @Override
+      public boolean hasNext() {
+        return skeletonIterator.hasNext() && dataFileIterator.hasNext();
+      }
+
+      @Override
+      public HoodieRecord<T> next() {
+        HoodieRecord<T> dataRecord = dataFileIterator.next();
+        HoodieRecord<T> skeletonRecord = skeletonIterator.next();
+        HoodieRecord<T> ret = dataRecord.prependMetaFields(readerSchema, readerSchema,
+            new MetadataValues().setCommitTime(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.COMMIT_TIME_METADATA_FIELD))
+                .setCommitSeqno(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.COMMIT_SEQNO_METADATA_FIELD))
+                .setRecordKey(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.RECORD_KEY_METADATA_FIELD))
+                .setPartitionPath(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.PARTITION_PATH_METADATA_FIELD))
+                .setFileName(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.FILENAME_METADATA_FIELD)), null);
+        if (partitionFields.isPresent()) {
+          for (int i = 0; i < partitionValues.length; i++) {
+            int position = readerSchema.getField(partitionFields.get()[i]).pos();
+            setPartitionField(position, partitionValues[i], ret.getData());
+          }
+        }
+        return ret;
+      }
+    };
+  }
+
+  protected abstract void setPartitionField(int position, Object fieldValue, T row);
+
+  @Override
+  public Schema getSchema() {
+    return skeletonFileReader.getSchema();

Review Comment:
   Sounds good.





[GitHub] [hudi] codope commented on a diff in pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #8342:
URL: https://github.com/apache/hudi/pull/8342#discussion_r1170138227


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -327,21 +332,52 @@ private HoodieData<HoodieRecord<T>> readRecordsForGroupBaseFiles(JavaSparkContex
 
     // NOTE: It's crucial to make sure that we don't capture whole "this" object into the
     //       closure, as this might lead to issues attempting to serialize its nested fields
+    HoodieTableConfig  tableConfig = getHoodieTable().getMetaClient().getTableConfig();
+    String bootstrapBasePath = tableConfig.getBootstrapBasePath().orElse(null);
+    Option<String[]> partitionFields = tableConfig.getPartitionFields();
+    String timeZoneId = jsc.getConf().get("timeZone", SQLConf.get().sessionLocalTimeZone());
+    boolean shouldValidateColumns = jsc.getConf().getBoolean("spark.sql.sources.validatePartitionColumns", true);
+
     return HoodieJavaRDD.of(jsc.parallelize(clusteringOps, clusteringOps.size())
         .mapPartitions(clusteringOpsPartition -> {
           List<Iterator<HoodieRecord<T>>> iteratorsForPartition = new ArrayList<>();
           clusteringOpsPartition.forEachRemaining(clusteringOp -> {
             try {
               Schema readerSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(writeConfig.getSchema()));
               HoodieFileReader baseFileReader = HoodieFileReaderFactory.getReaderFactory(recordType).getFileReader(hadoopConf.get(), new Path(clusteringOp.getDataFilePath()));

Review Comment:
   Actually bootstrap file reader depends on both skeleton and data file reader. So, this becomes the skeleton file reader in the conditional block. 
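   (For readers following the diff: the stitching the bootstrap reader performs is essentially a positional zip of two equally ordered iterators, one over the skeleton file's meta columns and one over the bootstrap source file's data columns. The sketch below shows that pattern with plain `java.util.Iterator` and hypothetical record types, not Hudi's actual `HoodieRecord` API.)

   ```java
   import java.util.Arrays;
   import java.util.Iterator;
   import java.util.List;

   public class BootstrapStitchSketch {
     // Hypothetical stitched record: meta side supplies the key, data side the payload.
     static class FullRecord {
       final String recordKey;
       final String payload;
       FullRecord(String recordKey, String payload) {
         this.recordKey = recordKey;
         this.payload = payload;
       }
       @Override public String toString() { return recordKey + ":" + payload; }
     }

     // Zip the two iterators positionally. Both files are assumed to hold rows in the
     // same order and count, which is the invariant skeleton files are written under.
     static Iterator<FullRecord> stitch(Iterator<String> meta, Iterator<String> data) {
       return new Iterator<FullRecord>() {
         @Override public boolean hasNext() { return meta.hasNext() && data.hasNext(); }
         @Override public FullRecord next() { return new FullRecord(meta.next(), data.next()); }
       };
     }

     public static void main(String[] args) {
       List<String> metaRows = Arrays.asList("trip_0", "trip_1");
       List<String> dataRows = Arrays.asList("rider_0,driver_0", "rider_1,driver_1");
       stitch(metaRows.iterator(), dataRows.iterator()).forEachRemaining(System.out::println);
     }
   }
   ```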





[GitHub] [hudi] hudi-bot commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1493779929

   ## CI report:
   
   * e5971f241437cdcf4302fb3bd4aa956d348e103b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16069) 
   * 76e5ebe79dc1c1bb53c36d95210a861fbd781ffd UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1494458115

   ## CI report:
   
   * 76e5ebe79dc1c1bb53c36d95210a861fbd781ffd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16087) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on a diff in pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on code in PR #8342:
URL: https://github.com/apache/hudi/pull/8342#discussion_r1169386885


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -327,21 +332,52 @@ private HoodieData<HoodieRecord<T>> readRecordsForGroupBaseFiles(JavaSparkContex
 
     // NOTE: It's crucial to make sure that we don't capture whole "this" object into the
     //       closure, as this might lead to issues attempting to serialize its nested fields
+    HoodieTableConfig  tableConfig = getHoodieTable().getMetaClient().getTableConfig();
+    String bootstrapBasePath = tableConfig.getBootstrapBasePath().orElse(null);
+    Option<String[]> partitionFields = tableConfig.getPartitionFields();
+    String timeZoneId = jsc.getConf().get("timeZone", SQLConf.get().sessionLocalTimeZone());
+    boolean shouldValidateColumns = jsc.getConf().getBoolean("spark.sql.sources.validatePartitionColumns", true);
+
     return HoodieJavaRDD.of(jsc.parallelize(clusteringOps, clusteringOps.size())
         .mapPartitions(clusteringOpsPartition -> {
           List<Iterator<HoodieRecord<T>>> iteratorsForPartition = new ArrayList<>();
           clusteringOpsPartition.forEachRemaining(clusteringOp -> {
             try {
               Schema readerSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(writeConfig.getSchema()));
               HoodieFileReader baseFileReader = HoodieFileReaderFactory.getReaderFactory(recordType).getFileReader(hadoopConf.get(), new Path(clusteringOp.getDataFilePath()));

Review Comment:
   We should skip this for bootstrap file group.



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -327,21 +332,52 @@ private HoodieData<HoodieRecord<T>> readRecordsForGroupBaseFiles(JavaSparkContex
 
     // NOTE: It's crucial to make sure that we don't capture whole "this" object into the
     //       closure, as this might lead to issues attempting to serialize its nested fields
+    HoodieTableConfig  tableConfig = getHoodieTable().getMetaClient().getTableConfig();
+    String bootstrapBasePath = tableConfig.getBootstrapBasePath().orElse(null);
+    Option<String[]> partitionFields = tableConfig.getPartitionFields();
+    String timeZoneId = jsc.getConf().get("timeZone", SQLConf.get().sessionLocalTimeZone());
+    boolean shouldValidateColumns = jsc.getConf().getBoolean("spark.sql.sources.validatePartitionColumns", true);
+
     return HoodieJavaRDD.of(jsc.parallelize(clusteringOps, clusteringOps.size())
         .mapPartitions(clusteringOpsPartition -> {
           List<Iterator<HoodieRecord<T>>> iteratorsForPartition = new ArrayList<>();
           clusteringOpsPartition.forEachRemaining(clusteringOp -> {
             try {
               Schema readerSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(writeConfig.getSchema()));
               HoodieFileReader baseFileReader = HoodieFileReaderFactory.getReaderFactory(recordType).getFileReader(hadoopConf.get(), new Path(clusteringOp.getDataFilePath()));
+              // handle bootstrap path
+              if (StringUtils.nonEmpty(clusteringOp.getBootstrapFilePath()) && StringUtils.nonEmpty(bootstrapBasePath)) {

Review Comment:
   Do we need to provide the same fix for MOR table, in `readRecordsForGroupWithLogs(jsc, clusteringOps, instantTime)`?  E.g., clustering is applied to a bootstrap file group with bootstrap data file, skeleton file, and log files.





[GitHub] [hudi] codope commented on a diff in pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on code in PR #8342:
URL: https://github.com/apache/hudi/pull/8342#discussion_r1170145150


##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieBootstrapFileReader.java:
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.MetadataValues;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.collection.ClosableIterator;
+
+import org.apache.avro.Schema;
+
+import java.io.IOException;
+import java.util.Set;
+
+public abstract class HoodieBootstrapFileReader<T> implements HoodieFileReader<T> {
+
+  private final HoodieFileReader<T> skeletonFileReader;
+  private final HoodieFileReader<T> dataFileReader;
+
+  private final Option<String[]> partitionFields;
+  private final Object[] partitionValues;
+
+  public HoodieBootstrapFileReader(HoodieFileReader<T> skeletonFileReader, HoodieFileReader<T> dataFileReader, Option<String[]> partitionFields, Object[] partitionValues) {
+    this.skeletonFileReader = skeletonFileReader;
+    this.dataFileReader = dataFileReader;
+    this.partitionFields = partitionFields;
+    this.partitionValues = partitionValues;
+  }
+
+  @Override
+  public String[] readMinMaxRecordKeys() {
+    return skeletonFileReader.readMinMaxRecordKeys();
+  }
+
+  @Override
+  public BloomFilter readBloomFilter() {
+    return skeletonFileReader.readBloomFilter();
+  }
+
+  @Override
+  public Set<String> filterRowKeys(Set<String> candidateRowKeys) {
+    return skeletonFileReader.filterRowKeys(candidateRowKeys);
+  }
+
+  @Override
+  public ClosableIterator<HoodieRecord<T>> getRecordIterator(Schema readerSchema, Schema requestedSchema) throws IOException {
+    ClosableIterator<HoodieRecord<T>> skeletonIterator = skeletonFileReader.getRecordIterator(readerSchema, requestedSchema);
+    ClosableIterator<HoodieRecord<T>> dataFileIterator = dataFileReader.getRecordIterator(HoodieAvroUtils.removeMetadataFields(readerSchema), requestedSchema);
+    return new ClosableIterator<HoodieRecord<T>>() {
+      @Override
+      public void close() {
+        skeletonIterator.close();
+        dataFileIterator.close();
+      }
+
+      @Override
+      public boolean hasNext() {
+        return skeletonIterator.hasNext() && dataFileIterator.hasNext();
+      }
+
+      @Override
+      public HoodieRecord<T> next() {
+        HoodieRecord<T> dataRecord = dataFileIterator.next();
+        HoodieRecord<T> skeletonRecord = skeletonIterator.next();
+        HoodieRecord<T> ret = dataRecord.prependMetaFields(readerSchema, readerSchema,
+            new MetadataValues().setCommitTime(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.COMMIT_TIME_METADATA_FIELD))
+                .setCommitSeqno(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.COMMIT_SEQNO_METADATA_FIELD))
+                .setRecordKey(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.RECORD_KEY_METADATA_FIELD))
+                .setPartitionPath(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.PARTITION_PATH_METADATA_FIELD))
+                .setFileName(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.FILENAME_METADATA_FIELD)), null);
+        if (partitionFields.isPresent()) {
+          for (int i = 0; i < partitionValues.length; i++) {
+            int position = readerSchema.getField(partitionFields.get()[i]).pos();
+            setPartitionField(position, partitionValues[i], ret.getData());
+          }
+        }
+        return ret;
+      }
+    };
+  }
+
+  protected abstract void setPartitionField(int position, Object fieldValue, T row);
+
+  @Override
+  public Schema getSchema() {
+    return skeletonFileReader.getSchema();

Review Comment:
   Actually this is not used because we enforce the reader schema at the call site. But I have changed it to return the merged schema for completeness.
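
   For readers following the diff above: the iterator returned by `getRecordIterator` is essentially a zip of the skeleton iterator (meta fields) and the data-file iterator (data columns). A minimal, stdlib-only Java sketch of that zip pattern follows; the class and method names here are hypothetical illustrations, not Hudi APIs:

   ```java
   import java.util.ArrayList;
   import java.util.Iterator;
   import java.util.List;
   import java.util.function.BiFunction;

   // Hypothetical sketch: zip two iterators, combining element pairs the way
   // HoodieBootstrapFileReader stitches a skeleton (meta-field) record with a
   // data record. Iteration stops when either side is exhausted, mirroring
   // the `&&` in hasNext() in the diff above.
   public class ZipIteratorSketch {
     public static <A, B, R> Iterator<R> zip(Iterator<A> left, Iterator<B> right,
                                             BiFunction<A, B, R> combiner) {
       return new Iterator<R>() {
         @Override
         public boolean hasNext() {
           return left.hasNext() && right.hasNext();
         }

         @Override
         public R next() {
           // Pull one element from each side and stitch them together.
           return combiner.apply(left.next(), right.next());
         }
       };
     }

     // Convenience wrapper used in the demo below.
     public static List<String> zipKeys(List<String> metaKeys, List<Integer> values) {
       List<String> out = new ArrayList<>();
       Iterator<String> it = zip(metaKeys.iterator(), values.iterator(),
           (k, v) -> k + "=" + v);
       while (it.hasNext()) {
         out.add(it.next());
       }
       return out;
     }

     public static void main(String[] args) {
       System.out.println(zipKeys(List.of("key1", "key2"), List.of(10, 20)));
       // prints [key1=10, key2=20]
     }
   }
   ```

   One consequence of the `&&` design choice, visible in both the sketch and the PR code, is that any trailing records on one side are silently dropped if the two files ever disagree on row count, so the skeleton and bootstrap data files must stay row-aligned.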





[GitHub] [hudi] hudi-bot commented on pull request #8342: [HUDI-5987] Fix clustering on bootstrap table with row writer disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8342:
URL: https://github.com/apache/hudi/pull/8342#issuecomment-1512770628

   ## CI report:
   
   * 76e5ebe79dc1c1bb53c36d95210a861fbd781ffd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16087) 
   * 46f6599da55859ecfc562908b67c279117c937c0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16419) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>

