Posted to commits@hudi.apache.org by "yihua (via GitHub)" <gi...@apache.org> on 2023/02/21 04:11:54 UTC

[GitHub] [hudi] yihua opened a new pull request, #8001: [HUDI-5817] Fix async indexer metadata writer to avoid eager rollback and failed write cleaning

yihua opened a new pull request, #8001:
URL: https://github.com/apache/hudi/pull/8001

   ### Change Logs
   
   Although the metadata table writer used by the async indexer is configured with the `LAZY` failed write cleaning policy, `SparkHoodieBackedTableMetadataWriter` is hard-coded to roll back failed writes regardless of that configuration.  This rollback should not be triggered for the async indexer.  With the current logic, the async indexer can roll back an inflight delta commit that belongs to another regular writer in the metadata table, causing issues.  This also makes the test in the log below flaky.
   
   This PR fixes `SparkHoodieBackedTableMetadataWriter` so that the rollback of failed writes is not triggered by the async indexer.
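
To make the intended behavior concrete, here is a minimal, self-contained sketch of gating rollback on the failed write cleaning policy. It is not the code changed in this PR: `RollbackGateSketch`, `CleaningPolicy`, `shouldRollbackEagerly`, and `prepareCommit` are hypothetical stand-ins, and the timeline is reduced to a list of instant strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: all names here are hypothetical and do not mirror Hudi's classes.
final class RollbackGateSketch {

  // Stand-in for a failed write cleaning policy setting.
  enum CleaningPolicy { EAGER, LAZY, NEVER }

  // Assumption of this sketch: only EAGER permits rolling back failed writes
  // as part of the commit path; LAZY and NEVER leave inflight instants alone.
  static boolean shouldRollbackEagerly(CleaningPolicy policy) {
    return policy == CleaningPolicy.EAGER;
  }

  // Simulates the pre-commit step of a writer and returns the inflight
  // instants that remain on the (simulated) timeline afterwards.
  static List<String> prepareCommit(CleaningPolicy policy, List<String> inflightInstants) {
    List<String> remaining = new ArrayList<>(inflightInstants);
    if (shouldRollbackEagerly(policy)) {
      // An eager writer treats every inflight instant as a failed write.
      remaining.clear();
    }
    return remaining;
  }

  public static void main(String[] args) {
    // One inflight delta commit owned by another, regular writer.
    List<String> inflight = Arrays.asList("20230216134500");
    System.out.println("EAGER leaves: " + prepareCommit(CleaningPolicy.EAGER, inflight));
    System.out.println("LAZY leaves:  " + prepareCommit(CleaningPolicy.LAZY, inflight));
  }
}
```

In this sketch, a writer configured with `LAZY` (as the async indexer's metadata writer is) leaves the other writer's inflight instant untouched, whereas a writer that ignores the policy would behave like the `EAGER` branch.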
   
   ```
   2023-02-16T13:46:06.1573775Z [ERROR] Tests run: 113, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 3,518.191 s <<< FAILURE! - in org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer
   2023-02-16T13:46:06.1576031Z [ERROR] testHoodieIndexer{HoodieRecordType}[2]  Time elapsed: 79.838 s  <<< ERROR!
   ...
   2023-02-16T13:46:06.1705711Z Caused by: java.lang.IllegalArgumentException
   2023-02-16T13:46:06.1706251Z 	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
   2023-02-16T13:46:06.1706995Z 	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:633)
   2023-02-16T13:46:06.1707847Z 	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:698)
   2023-02-16T13:46:06.1708751Z 	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:147)
   2023-02-16T13:46:06.1709792Z 	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:172)
   2023-02-16T13:46:06.1710733Z 	at org.apache.hudi.table.action.deltacommit.SparkUpsertPreppedDeltaCommitActionExecutor.execute(SparkUpsertPreppedDeltaCommitActionExecutor.java:44)
   2023-02-16T13:46:06.1712815Z 	at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:111)
   2023-02-16T13:46:06.1713593Z 	at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:80)
   2023-02-16T13:46:06.1714353Z 	at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:154)
   2023-02-16T13:46:06.1715155Z 	at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:186)
   ...
   ```
   
   ### Impact
   
   Fixes the rollback behavior of the async indexer and the flaky test.  Adds a new test that guards this behavior (before this PR, the test fails).
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan merged pull request #8001: [HUDI-5817] Fix async indexer metadata writer to avoid eager rollback and failed write cleaning

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan merged PR #8001:
URL: https://github.com/apache/hudi/pull/8001




[GitHub] [hudi] hudi-bot commented on pull request #8001: [HUDI-5817] Fix async indexer metadata writer to avoid eager rollback and failed write cleaning

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8001:
URL: https://github.com/apache/hudi/pull/8001#issuecomment-1438907160

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15305",
       "triggerID" : "3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15305) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8001: [HUDI-5817] Fix async indexer metadata writer to avoid eager rollback and failed write cleaning

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8001:
URL: https://github.com/apache/hudi/pull/8001#issuecomment-1437856388

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8001: [HUDI-5817] Fix async indexer metadata writer to avoid eager rollback and failed write cleaning

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8001:
URL: https://github.com/apache/hudi/pull/8001#issuecomment-1438813673

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15305",
       "triggerID" : "3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15305) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8001: [HUDI-5817] Fix async indexer metadata writer to avoid eager rollback and failed write cleaning

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8001:
URL: https://github.com/apache/hudi/pull/8001#issuecomment-1438760993

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #8001: [HUDI-5817] Fix async indexer metadata writer to avoid eager rollback and failed write cleaning

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8001:
URL: https://github.com/apache/hudi/pull/8001#issuecomment-1437958889

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15305",
       "triggerID" : "3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15305) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8001: [HUDI-5817] Fix async indexer metadata writer to avoid eager rollback and failed write cleaning

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8001:
URL: https://github.com/apache/hudi/pull/8001#issuecomment-1437861949

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15305",
       "triggerID" : "3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15305) 
   

