Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/07 02:02:03 UTC

[GitHub] [hudi] yihua opened a new pull request, #5245: [HUDI-3805] Delete existing corrupted requested rollback plan during rollback

yihua opened a new pull request, #5245:
URL: https://github.com/apache/hudi/pull/5245

   ## What is the purpose of the pull request
   
   When the `instant_time.rollback.requested` file in the timeline is empty or corrupted, it cannot be parsed. `getPendingRollbackInfos()` then skips that empty or corrupted requested rollback instant, so the instant stays on the timeline forever and prevents metadata table archival.
   
   This PR fixes the problem by deleting the requested rollback plan if it is empty or corrupted.
   
   ## Brief change log
   
     - Adds logic in `BaseHoodieWriteClient::getPendingRollbackInfos()` to delete a requested rollback plan that cannot be parsed.
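
   A minimal sketch of the control flow described in the change log, in plain Java with no Hudi dependency. `RollbackState`, `PendingRollback`, `RollbackPlan`, `PlanStore` and `PendingRollbackScanner` are hypothetical stand-ins, not Hudi's API. A REQUESTED rollback whose plan cannot be parsed is deleted; skipping an INFLIGHT rollback whose plan fails to parse is an assumption of this sketch.

   ```java
   import java.io.IOException;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical stand-ins for the timeline types; not the real Hudi classes.
   enum RollbackState { REQUESTED, INFLIGHT }

   record PendingRollback(String instantTime, RollbackState state) {}

   record RollbackPlan(String instantToRollback) {}

   interface PlanStore {
     RollbackPlan readPlan(PendingRollback rollback) throws IOException;

     void deletePlan(PendingRollback rollback) throws IOException;
   }

   class PendingRollbackScanner {
     /** Collects parseable plans, keyed by the commit each one rolls back. */
     static Map<String, RollbackPlan> collect(List<PendingRollback> pending, PlanStore store)
         throws IOException {
       Map<String, RollbackPlan> plans = new LinkedHashMap<>();
       for (PendingRollback rollback : pending) {
         RollbackPlan plan;
         try {
           plan = store.readPlan(rollback);
         } catch (IOException e) {
           if (rollback.state() == RollbackState.REQUESTED) {
             // Empty or corrupted requested plan: delete it so it stops lingering
             // on the timeline; a later writer can request the rollback again.
             store.deletePlan(rollback);
           }
           // Either way, there is no usable plan for this instant.
           continue;
         }
         plans.put(plan.instantToRollback(), plan);
       }
       return plans;
     }
   }
   ```

   Like the diff under review, this sketch treats every `IOException` as corruption, which is the concern raised later in this thread.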
   
   ## Verify this pull request
   
   This change adds tests in `TestClientRollback` for requested rollbacks, both valid and corrupted. The fix is also verified by running the deltastreamer on a Hudi table with a corrupted requested rollback in the timeline; the corrupted rollback plan is deleted afterwards.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] codope merged pull request #5245: [HUDI-3805] Delete existing corrupted requested rollback plan during rollback

Posted by GitBox <gi...@apache.org>.
codope merged PR #5245:
URL: https://github.com/apache/hudi/pull/5245


[GitHub] [hudi] hudi-bot commented on pull request #5245: [HUDI-3805] Delete existing corrupted requested rollback plan during rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5245:
URL: https://github.com/apache/hudi/pull/5245#issuecomment-1091031248

   ## CI report:
   
   * 6e1603b42d36bb6a6ae7998a90ba882aee164b72 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7882) 
   * 40fc11655edbcdb1f3aabaa4e4ba39e8f22d5e1e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] nsivabalan commented on a diff in pull request #5245: [HUDI-3805] Delete existing corrupted requested rollback plan during rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5245:
URL: https://github.com/apache/hudi/pull/5245#discussion_r932573326


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -1113,9 +1113,28 @@ protected Map<String, Option<HoodiePendingRollbackInfo>> getPendingRollbackInfos
   protected Map<String, Option<HoodiePendingRollbackInfo>> getPendingRollbackInfos(HoodieTableMetaClient metaClient, boolean ignoreCompactionAndClusteringInstants) {
     List<HoodieInstant> instants = metaClient.getActiveTimeline().filterPendingRollbackTimeline().getInstants().collect(Collectors.toList());
     Map<String, Option<HoodiePendingRollbackInfo>> infoMap = new HashMap<>();
-    for (HoodieInstant instant : instants) {
+    for (HoodieInstant rollbackInstant : instants) {
+      HoodieRollbackPlan rollbackPlan;
+      try {
+        rollbackPlan = RollbackUtils.getRollbackPlan(metaClient, rollbackInstant);
+      } catch (IOException e) {

Review Comment:
   Yeah. Ethan reminded me of the same discussion we had when the patch was put up: we found the existing fix to be the safest option compared to the alternatives. To add to what Ethan mentioned above, we do this only in the case of rollback.requested and not for rollback.inflight. For inflight, it's safe to reuse the plan from the rollback.requested file.



[GitHub] [hudi] hudi-bot commented on pull request #5245: [HUDI-3805] Delete existing corrupted requested rollback plan during rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5245:
URL: https://github.com/apache/hudi/pull/5245#issuecomment-1091029769

   ## CI report:
   
   * 6e1603b42d36bb6a6ae7998a90ba882aee164b72 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7882) 
   * 40fc11655edbcdb1f3aabaa4e4ba39e8f22d5e1e UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #5245: [HUDI-3805] Delete existing corrupted requested rollback plan during rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5245:
URL: https://github.com/apache/hudi/pull/5245#issuecomment-1091002727

   ## CI report:
   
   * 6e1603b42d36bb6a6ae7998a90ba882aee164b72 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #5245: [HUDI-3805] Delete existing corrupted requested rollback plan during rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5245:
URL: https://github.com/apache/hudi/pull/5245#issuecomment-1091007613

   ## CI report:
   
   * 6e1603b42d36bb6a6ae7998a90ba882aee164b72 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7882) 
   


[GitHub] [hudi] nsivabalan commented on a diff in pull request #5245: [HUDI-3805] Delete existing corrupted requested rollback plan during rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5245:
URL: https://github.com/apache/hudi/pull/5245#discussion_r931813254


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -1113,9 +1113,28 @@ protected Map<String, Option<HoodiePendingRollbackInfo>> getPendingRollbackInfos
   protected Map<String, Option<HoodiePendingRollbackInfo>> getPendingRollbackInfos(HoodieTableMetaClient metaClient, boolean ignoreCompactionAndClusteringInstants) {
     List<HoodieInstant> instants = metaClient.getActiveTimeline().filterPendingRollbackTimeline().getInstants().collect(Collectors.toList());
     Map<String, Option<HoodiePendingRollbackInfo>> infoMap = new HashMap<>();
-    for (HoodieInstant instant : instants) {
+    for (HoodieInstant rollbackInstant : instants) {
+      HoodieRollbackPlan rollbackPlan;
+      try {
+        rollbackPlan = RollbackUtils.getRollbackPlan(metaClient, rollbackInstant);
+      } catch (IOException e) {

Review Comment:
   Yes. We should probably catch only the known exception, and check its stack trace as well.
   ```
   Caused by: java.io.IOException: Not an Avro data file
       at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
   ```
   
   This is the stack trace when the Avro file is corrupt. We will follow up with the right fix.
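
   A minimal sketch of catching only the known failure, assuming the message text in the stack trace above stays stable (matching on a message is brittle, so treat this as an illustration). `CorruptPlanDetector` is a hypothetical helper, not part of Hudi or Avro. Any other `IOException`, such as a timeout, is reported as not corrupt, so the caller can rethrow or retry instead of deleting the plan.

   ```java
   import java.io.IOException;
   import java.util.Collections;
   import java.util.IdentityHashMap;
   import java.util.Set;

   /** Hypothetical helper, not part of Hudi or Avro. */
   class CorruptPlanDetector {
     // Message seen in the stack trace above for a file without a valid Avro header.
     static final String NOT_AVRO_MESSAGE = "Not an Avro data file";

     /** True only if some cause in the chain is an IOException carrying the known message. */
     static boolean isCorruptPlan(Throwable error) {
       // Identity set guards against cyclic cause chains.
       Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
       // The Avro error typically arrives wrapped ("Caused by:"), so walk the whole chain.
       for (Throwable t = error; t != null && seen.add(t); t = t.getCause()) {
         if (t instanceof IOException && NOT_AVRO_MESSAGE.equals(t.getMessage())) {
           return true;
         }
       }
       return false;
     }
   }
   ```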
   



[GitHub] [hudi] hudi-bot commented on pull request #5245: [HUDI-3805] Delete existing corrupted requested rollback plan during rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5245:
URL: https://github.com/apache/hudi/pull/5245#issuecomment-1091064807

   ## CI report:
   
   * 40fc11655edbcdb1f3aabaa4e4ba39e8f22d5e1e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7884) 
   


[GitHub] [hudi] hudi-bot commented on pull request #5245: [HUDI-3805] Delete existing corrupted requested rollback plan during rollback

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5245:
URL: https://github.com/apache/hudi/pull/5245#issuecomment-1091032820

   ## CI report:
   
   * 6e1603b42d36bb6a6ae7998a90ba882aee164b72 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7882) 
   * 40fc11655edbcdb1f3aabaa4e4ba39e8f22d5e1e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7884) 
   


[GitHub] [hudi] yihua commented on a diff in pull request #5245: [HUDI-3805] Delete existing corrupted requested rollback plan during rollback

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #5245:
URL: https://github.com/apache/hudi/pull/5245#discussion_r932532922


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -1113,9 +1113,28 @@ protected Map<String, Option<HoodiePendingRollbackInfo>> getPendingRollbackInfos
   protected Map<String, Option<HoodiePendingRollbackInfo>> getPendingRollbackInfos(HoodieTableMetaClient metaClient, boolean ignoreCompactionAndClusteringInstants) {
     List<HoodieInstant> instants = metaClient.getActiveTimeline().filterPendingRollbackTimeline().getInstants().collect(Collectors.toList());
     Map<String, Option<HoodiePendingRollbackInfo>> infoMap = new HashMap<>();
-    for (HoodieInstant instant : instants) {
+    for (HoodieInstant rollbackInstant : instants) {
+      HoodieRollbackPlan rollbackPlan;
+      try {
+        rollbackPlan = RollbackUtils.getRollbackPlan(metaClient, rollbackInstant);
+      } catch (IOException e) {

Review Comment:
   The deletion only happens for the requested rollback plan; any inflight rollback is not affected. The assumption here is that even if the requested rollback plan is inaccessible and deleted, a new writer can request it again, which is still safe. We don't want a hanging rollback plan to block metadata table compaction.
   
   The corruption is mainly due to non-atomic writes of the rollback plan: the job fails while the plan is being written.
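
   To make the non-atomic write failure mode concrete, here is a minimal sketch of the common temp-file-plus-rename pattern on a local filesystem, where a crash leaves either no plan file or a complete one. `AtomicPlanWriter` is hypothetical and this is not how Hudi writes timeline files; cloud object stores in particular may not offer an atomic rename, and durability (fsync) is omitted.

   ```java
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.StandardCopyOption;

   /** Hypothetical helper, not how Hudi writes timeline files. */
   class AtomicPlanWriter {
     static void writeAtomically(Path target, byte[] content) throws IOException {
       Path dir = target.toAbsolutePath().getParent();
       // Temp file in the same directory so the rename stays on one filesystem.
       Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
       try {
         Files.write(tmp, content);
         // If the job dies before this line, readers never see a partial target file.
         Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
       } finally {
         // No-op after a successful move; cleans up the temp file on failure.
         Files.deleteIfExists(tmp);
       }
     }
   }
   ```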



[GitHub] [hudi] vinothchandar commented on a diff in pull request #5245: [HUDI-3805] Delete existing corrupted requested rollback plan during rollback

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on code in PR #5245:
URL: https://github.com/apache/hudi/pull/5245#discussion_r927097078


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -1113,9 +1113,28 @@ protected Map<String, Option<HoodiePendingRollbackInfo>> getPendingRollbackInfos
   protected Map<String, Option<HoodiePendingRollbackInfo>> getPendingRollbackInfos(HoodieTableMetaClient metaClient, boolean ignoreCompactionAndClusteringInstants) {
     List<HoodieInstant> instants = metaClient.getActiveTimeline().filterPendingRollbackTimeline().getInstants().collect(Collectors.toList());
     Map<String, Option<HoodiePendingRollbackInfo>> infoMap = new HashMap<>();
-    for (HoodieInstant instant : instants) {
+    for (HoodieInstant rollbackInstant : instants) {
+      HoodieRollbackPlan rollbackPlan;
+      try {
+        rollbackPlan = RollbackUtils.getRollbackPlan(metaClient, rollbackInstant);
+      } catch (IOException e) {

Review Comment:
   Wouldn't this go ahead and delete the plan even if the error was a legitimate IO exception, say cloud storage being inaccessible or a connection timeout? We should explicitly handle corruptions cleanly, IMO.
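
   One way to handle corruption explicitly is to inspect the bytes before deleting anything. This sketch, a hypothetical `PlanHealth` helper, assumes plans are Avro object container files, whose header starts with the magic bytes `'O', 'b', 'j', 1`. An empty file, a truncated header, or wrong magic is classified as CORRUPT, while any `IOException` raised while reading (for example a timeout or a missing file) propagates to the caller instead of triggering a delete.

   ```java
   import java.io.IOException;
   import java.io.InputStream;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.Arrays;

   /** Hypothetical helper, not part of Hudi. */
   class PlanHealth {
     enum Status { CORRUPT, LOOKS_VALID }

     // Avro object container files begin with 'O', 'b', 'j' followed by the byte 1.
     private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

     static Status inspect(Path planFile) throws IOException {
       // Errors opening or reading (missing file, permissions, unreachable storage)
       // propagate: they say nothing about whether the plan itself is corrupt.
       try (InputStream in = Files.newInputStream(planFile)) {
         byte[] head = in.readNBytes(AVRO_MAGIC.length);
         // Empty, truncated or wrong-magic content cannot be an Avro data file.
         return Arrays.equals(head, AVRO_MAGIC) ? Status.LOOKS_VALID : Status.CORRUPT;
       }
     }
   }
   ```

   A valid header does not prove the rest of the file is intact, so LOOKS_VALID only means the file is not trivially corrupt.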



[GitHub] [hudi] nsivabalan commented on a diff in pull request #5245: [HUDI-3805] Delete existing corrupted requested rollback plan during rollback

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5245:
URL: https://github.com/apache/hudi/pull/5245#discussion_r931814208


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##########
@@ -1113,9 +1113,28 @@ protected Map<String, Option<HoodiePendingRollbackInfo>> getPendingRollbackInfos
   protected Map<String, Option<HoodiePendingRollbackInfo>> getPendingRollbackInfos(HoodieTableMetaClient metaClient, boolean ignoreCompactionAndClusteringInstants) {
     List<HoodieInstant> instants = metaClient.getActiveTimeline().filterPendingRollbackTimeline().getInstants().collect(Collectors.toList());
     Map<String, Option<HoodiePendingRollbackInfo>> infoMap = new HashMap<>();
-    for (HoodieInstant instant : instants) {
+    for (HoodieInstant rollbackInstant : instants) {
+      HoodieRollbackPlan rollbackPlan;
+      try {
+        rollbackPlan = RollbackUtils.getRollbackPlan(metaClient, rollbackInstant);
+      } catch (IOException e) {

Review Comment:
   https://issues.apache.org/jira/browse/HUDI-4493


