You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/11 12:37:26 UTC

[GitHub] [hudi] nsivabalan opened a new pull request, #5288: [HUDI-3848] Fixing restore with cleaned up commits

nsivabalan opened a new pull request, #5288:
URL: https://github.com/apache/hudi/pull/5288

   ## What is the purpose of the pull request
   
   Recently we made a [fix](https://github.com/apache/hudi/pull/4957) to listbased rollback strategy to rely on commit metadata for completed commits. For completed commits, we are relying on commit metadata to deduce the list of files to be deleted and we have fs.listStatus in one of the places. But there are chances, that a commit is cleaned up by the cleaner and hence no data files exists. So, fixing this to do exists check before to ensure we can ignore non existant files to be rolledback. 
   
   
   ## Brief change log
   
   - Fixed list based rollback strategy to ignore non existant data files while deducing list of files to delete for a complete commit. 
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
   - TestHoodieSparkMergeOnReadTableRollback.testRestoreWithCleanedUpCommits
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on pull request #5288: [HUDI-3848] Fixing restore with cleaned up commits

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on PR #5288:
URL: https://github.com/apache/hudi/pull/5288#issuecomment-1095055074

   @XuQianJin-Stars : can you review the patch


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan merged pull request #5288: [HUDI-3848] Fixing restore with cleaned up commits

Posted by GitBox <gi...@apache.org>.
nsivabalan merged PR #5288:
URL: https://github.com/apache/hudi/pull/5288


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] codope commented on a diff in pull request #5288: [HUDI-3848] Fixing restore with cleaned up commits

Posted by GitBox <gi...@apache.org>.
codope commented on code in PR #5288:
URL: https://github.com/apache/hudi/pull/5288#discussion_r929897040


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackStrategy.java:
##########
@@ -239,7 +239,15 @@ private FileStatus[] fetchFilesFromCommitMetadata(HoodieInstant instantToRollbac
     SerializablePathFilter pathFilter = getSerializablePathFilter(baseFileExtension, instantToRollback.getTimestamp());
     Path[] filePaths = getFilesFromCommitMetadata(basePath, commitMetadata, partitionPath);
 
-    return fs.listStatus(filePaths, pathFilter);
+    return fs.listStatus(Arrays.stream(filePaths).filter(entry -> {
+      try {
+        return fs.exists(entry);
+      } catch (IOException e) {
+        LOG.error("Exists check failed for " + entry.toString(), e);
+      }
+      // if IOException is thrown, do not ignore. lets try to add the file of interest to be deleted. we can't miss any files to be rolled back.

Review Comment:
   This is a misleading comment. This PR intends to ignore files that have already been deleted by the cleaner and gracefully handle the FileNotFound/IO exception.
   @XuQianJin-Stars Since you reviewed this PR, can you validate my understanding?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5288: [HUDI-3848] Fixing restore with cleaned up commits

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5288:
URL: https://github.com/apache/hudi/pull/5288#issuecomment-1095000449

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea146c568c21bc880c45e88a03f1cab07145e1ba",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ea146c568c21bc880c45e88a03f1cab07145e1ba",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ea146c568c21bc880c45e88a03f1cab07145e1ba UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5288: [HUDI-3848] Fixing restore with cleaned up commits

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5288:
URL: https://github.com/apache/hudi/pull/5288#issuecomment-1095003597

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea146c568c21bc880c45e88a03f1cab07145e1ba",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7979",
       "triggerID" : "ea146c568c21bc880c45e88a03f1cab07145e1ba",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ea146c568c21bc880c45e88a03f1cab07145e1ba Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7979) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] vinothchandar commented on a diff in pull request #5288: [HUDI-3848] Fixing restore with cleaned up commits

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on code in PR #5288:
URL: https://github.com/apache/hudi/pull/5288#discussion_r927144961


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackStrategy.java:
##########
@@ -239,7 +239,15 @@ private FileStatus[] fetchFilesFromCommitMetadata(HoodieInstant instantToRollbac
     SerializablePathFilter pathFilter = getSerializablePathFilter(baseFileExtension, instantToRollback.getTimestamp());
     Path[] filePaths = getFilesFromCommitMetadata(basePath, commitMetadata, partitionPath);
 
-    return fs.listStatus(filePaths, pathFilter);
+    return fs.listStatus(Arrays.stream(filePaths).filter(entry -> {
+      try {
+        return fs.exists(entry);
+      } catch (IOException e) {
+        LOG.error("Exists check failed for " + entry.toString(), e);
+      }
+      // if IOException is thrown, do not ignore. lets try to add the file of interest to be deleted. we can't miss any files to be rolled back.

Review Comment:
   @nsivabalan does n't the line above just ignore the IOException? Why would fs.listStatus() return stuff that does not exist?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] XuQianJin-Stars commented on pull request #5288: [HUDI-3848] Fixing restore with cleaned up commits

Posted by GitBox <gi...@apache.org>.
XuQianJin-Stars commented on PR #5288:
URL: https://github.com/apache/hudi/pull/5288#issuecomment-1095061845

   LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5288: [HUDI-3848] Fixing restore with cleaned up commits

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5288:
URL: https://github.com/apache/hudi/pull/5288#issuecomment-1095081425

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "ea146c568c21bc880c45e88a03f1cab07145e1ba",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7979",
       "triggerID" : "ea146c568c21bc880c45e88a03f1cab07145e1ba",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ea146c568c21bc880c45e88a03f1cab07145e1ba Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7979) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org