Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2022/03/11 02:00:00 UTC

[jira] [Created] (HUDI-3604) Missing to apply rollback commits to Metadata table

sivabalan narayanan created HUDI-3604:
-----------------------------------------

             Summary: Missing to apply rollback commits to Metadata table
                 Key: HUDI-3604
                 URL: https://issues.apache.org/jira/browse/HUDI-3604
             Project: Apache Hudi
          Issue Type: Bug
          Components: metadata
            Reporter: sivabalan narayanan


Consider a data table timeline with commits C1, C2, C3, followed by C4 (RB_C1), a rollback of C1.

When C4 (i.e., the rollback of C1) is triggered, suppose the process crashes after deleting the data files and after deleting C1's commit files from the timeline, but before applying the rollback to the metadata table (MDT).

Even if the user restarts the pipeline, there won't be any pending failed commits to roll back, so new commits will proceed without regard to C4. The metadata table, however, will never see this rollback commit.
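
The failure sequence above can be sketched with a toy model. This is only an illustration under stated assumptions: the timeline is a plain dict, C1 is assumed to be a failed (inflight) commit, and every function name here is hypothetical rather than a Hudi API.

```python
# Toy model of the scenario; every name here is hypothetical, not a Hudi API.

class Crash(Exception):
    """Stands in for the process dying part-way through a rollback."""


def rollback_current_order(timeline, mdt_applied, target, rb_instant, crash_before_mdt):
    timeline[rb_instant] = ("rollback", "INFLIGHT")
    # Step 1: delete the target's data files (not modelled here).
    # Step 2: delete the target's commit files from the data table timeline.
    del timeline[target]
    if crash_before_mdt:
        raise Crash(rb_instant)
    # Step 3: apply the rollback to the metadata table (MDT).
    mdt_applied.add(rb_instant)
    timeline[rb_instant] = ("rollback", "COMPLETED")


def pending_failed_commits(timeline):
    # What a restarted writer looks for: commits that never completed.
    return [i for i, (action, state) in timeline.items()
            if action == "commit" and state == "INFLIGHT"]


timeline = {
    "C1": ("commit", "INFLIGHT"),   # assumed to be the failed commit
    "C2": ("commit", "COMPLETED"),
    "C3": ("commit", "COMPLETED"),
}
mdt_applied = set()  # rollback instants that reached the MDT

try:
    rollback_current_order(timeline, mdt_applied, "C1", "C4", crash_before_mdt=True)
except Crash:
    pass

print(pending_failed_commits(timeline))  # []
print("C4" in mdt_applied)               # False
```

After the simulated crash, the restarted writer finds no failed commit to roll back, and the MDT never received C4.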

 

Proposal: 

We need at least two fixes:

a. We should remove the C1 commit files from the data table timeline only after applying the rollback commit to the MDT. This ensures that no commit files in the data table are cleaned up before the rollback has been applied to the MDT.
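
A minimal sketch of the ordering proposed in (a), using a toy dict-based timeline. All names are hypothetical and this is not the Hudi rollback executor; it only shows that with this ordering a crash before the MDT update leaves C1 visible, so a restart can retry the rollback.

```python
# Toy model of fix (a); every name here is hypothetical, not a Hudi API.

class Crash(Exception):
    """Stands in for the process dying part-way through a rollback."""


def rollback_proposed_order(timeline, mdt_applied, target, rb_instant, crash_before_mdt):
    timeline[rb_instant] = ("rollback", "INFLIGHT")
    # Step 1: delete the target's data files (not modelled here).
    if crash_before_mdt:
        raise Crash(rb_instant)
    # Step 2: apply the rollback to the metadata table (MDT) first.
    mdt_applied.add(rb_instant)
    # Step 3: only now remove the target's commit files from the timeline.
    del timeline[target]
    timeline[rb_instant] = ("rollback", "COMPLETED")


def pending_failed_commits(timeline):
    # What a restarted writer looks for: commits that never completed.
    return [i for i, (action, state) in timeline.items()
            if action == "commit" and state == "INFLIGHT"]


timeline = {
    "C1": ("commit", "INFLIGHT"),   # assumed to be the failed commit
    "C2": ("commit", "COMPLETED"),
    "C3": ("commit", "COMPLETED"),
}
mdt_applied = set()  # rollback instants that reached the MDT

try:
    rollback_proposed_order(timeline, mdt_applied, "C1", "C4", crash_before_mdt=True)
except Crash:
    pass

after_crash = pending_failed_commits(timeline)
print(after_crash)  # ['C1']

# On restart the writer still sees C1 and retries the rollback.
for failed in after_crash:
    rollback_proposed_order(timeline, mdt_applied, failed, "C4", crash_before_mdt=False)

print("C4" in mdt_applied)  # True
print("C1" in timeline)     # False
```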

b. Whenever we check for failed commits to roll back, we should also check for any dangling rollbacks that need to be re-attempted. This also needs some fixes in the rollback executor, since the commit to roll back may no longer exist in the data table timeline at all, yet we still need to re-attempt the rollback and drive it to completion. It's not easy to distinguish a pending rollback from a dangling rollback, and I can't think of a way to detect a dangling rollback just by looking at the data table active timeline.
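
To make the pending versus dangling distinction concrete, here is a toy classification. It assumes one reading of the terms: a rollback is "pending" if its target commit is still in the timeline and "dangling" if the target is already gone. It also assumes each incomplete rollback records which instant it targets (the hypothetical rollback_targets map below); whether such information is reliably available is exactly the open question raised above.

```python
# Toy classification; names and the rollback_targets map are hypothetical.

def classify_rollbacks(timeline, rollback_targets):
    """Label each incomplete rollback as 'pending' or 'dangling'."""
    result = {}
    for instant, (action, state) in timeline.items():
        if action != "rollback" or state == "COMPLETED":
            continue
        target = rollback_targets[instant]
        # Target still on the timeline: an ordinary pending rollback.
        # Target already removed: a dangling rollback that must be re-attempted.
        result[instant] = "pending" if target in timeline else "dangling"
    return result


timeline = {
    "C2": ("commit", "COMPLETED"),
    "C3": ("commit", "INFLIGHT"),
    "C4": ("rollback", "INFLIGHT"),  # its target C1 was already removed
    "C5": ("rollback", "INFLIGHT"),  # its target C3 is still present
}
rollback_targets = {"C4": "C1", "C5": "C3"}

print(classify_rollbacks(timeline, rollback_targets))
# {'C4': 'dangling', 'C5': 'pending'}
```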



--
This message was sent by Atlassian Jira
(v8.20.1#820001)