You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "prashantwason (via GitHub)" <gi...@apache.org> on 2023/04/28 23:17:54 UTC

[GitHub] [hudi] prashantwason opened a new pull request, #8606: [MINOR] Check the return value from delete during rollback and finalize to ensure the files actually got deleted.

prashantwason opened a new pull request, #8606:
URL: https://github.com/apache/hudi/pull/8606

   [MINOR] Check the return value from delete during rollback and finalize to ensure the files actually got deleted.
   
   ### Change Logs
   
   Return value from FileSystem.delete call is checked. It should be true in case delete was successful. If the return value is false, we check if the file exists.
   
   ### Impact
   
   Improves the guarantee that files are deleted when required. We have seen cases where the call returns false and the file is left over. This causes data inconsistency issues (duplicate data, incomplete rollback) in HUDI dataset.
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #8606: [MINOR] Check the return value from delete during rollback and finalize to ensure the files actually got deleted.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8606:
URL: https://github.com/apache/hudi/pull/8606#issuecomment-1532407484

   Can you help checking the test failure?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8606: [MINOR] Check the return value from delete during rollback and finalize to ensure the files actually got deleted.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8606:
URL: https://github.com/apache/hudi/pull/8606#issuecomment-1528705558

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e306a06b8c62c4218a0833e271b52364e05c4b50",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16750",
       "triggerID" : "e306a06b8c62c4218a0833e271b52364e05c4b50",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e306a06b8c62c4218a0833e271b52364e05c4b50 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16750) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nbalajee commented on a diff in pull request #8606: [MINOR] Check the return value from delete during rollback and finalize to ensure the files actually got deleted.

Posted by "nbalajee (via GitHub)" <gi...@apache.org>.
nbalajee commented on code in PR #8606:
URL: https://github.com/apache/hudi/pull/8606#discussion_r1183090630


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackHelper.java:
##########
@@ -197,14 +197,21 @@ protected List<HoodieRollbackStat> deleteFiles(HoodieTableMetaClient metaClient,
             // if first rollback attempt failed and retried again, chances that some files are already deleted.
             isDeleted = true;
           }
+
+          if (!isDeleted) {

Review Comment:
   https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/fs/FileSystem.html#delete(org.apache.hadoop.fs.Path,%20boolean)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] prashantwason commented on pull request #8606: [MINOR] Check the return value from delete during rollback and finalize to ensure the files actually got deleted.

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on PR #8606:
URL: https://github.com/apache/hudi/pull/8606#issuecomment-1560478261

   @danny0405 All checks have passed. PTAL.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8606: [MINOR] Check the return value from delete during rollback and finalize to ensure the files actually got deleted.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8606:
URL: https://github.com/apache/hudi/pull/8606#issuecomment-1528210577

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e306a06b8c62c4218a0833e271b52364e05c4b50",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e306a06b8c62c4218a0833e271b52364e05c4b50",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e306a06b8c62c4218a0833e271b52364e05c4b50 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #8606: [MINOR] Check the return value from delete during rollback and finalize to ensure the files actually got deleted.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8606:
URL: https://github.com/apache/hudi/pull/8606#discussion_r1181157467


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackHelper.java:
##########
@@ -197,14 +197,21 @@ protected List<HoodieRollbackStat> deleteFiles(HoodieTableMetaClient metaClient,
             // if first rollback attempt failed and retried again, chances that some files are already deleted.
             isDeleted = true;
           }
+
+          if (!isDeleted) {

Review Comment:
   In which case the `metaClient.getFs().delete()` can return false if the file actually exists there?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8606: [MINOR] Check the return value from delete during rollback and finalize to ensure the files actually got deleted.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8606:
URL: https://github.com/apache/hudi/pull/8606#issuecomment-1528225557

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e306a06b8c62c4218a0833e271b52364e05c4b50",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16750",
       "triggerID" : "e306a06b8c62c4218a0833e271b52364e05c4b50",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e306a06b8c62c4218a0833e271b52364e05c4b50 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16750) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 merged pull request #8606: [MINOR] Check the return value from delete during rollback and finalize to ensure the files actually got deleted.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 merged PR #8606:
URL: https://github.com/apache/hudi/pull/8606


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #8606: [MINOR] Check the return value from delete during rollback and finalize to ensure the files actually got deleted.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8606:
URL: https://github.com/apache/hudi/pull/8606#issuecomment-1562202836

   @prashantwason I have reverted this commit first, because the test `TestMarkerBasedRollbackStrategy.testCopyOnWriteRollbackWithTestTable` is failing, please check again and reopen the PR again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org