Posted to commits@hudi.apache.org by "Raymond Xu (Jira)" <ji...@apache.org> on 2022/03/15 03:40:00 UTC

[jira] [Assigned] (HUDI-3604) Missing to apply rollback commits to Metadata table if rollback failed mid-way

     [ https://issues.apache.org/jira/browse/HUDI-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu reassigned HUDI-3604:
--------------------------------

    Assignee: sivabalan narayanan

> Missing to apply rollback commits to Metadata table if rollback failed mid-way
> ------------------------------------------------------------------------------
>
>                 Key: HUDI-3604
>                 URL: https://issues.apache.org/jira/browse/HUDI-3604
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> Consider a timeline with C1, C2, C3, and C4 (RB_C1, the rollback of C1) in progress.
> Suppose that while C4 is running, the process crashes after deleting the data files and after deleting C1's commit files from the timeline, but before applying the rollback to the metadata table (MDT).
> Even if the user restarts the pipeline, there won't be any pending failed commits (i.e. C1) left to roll back, so new commits will proceed without regard to C4. The metadata table, however, will never see this rollback commit.
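
The failure window described above can be reproduced with a small toy model. This is only a sketch under stated assumptions: `buggy_rollback`, `SimulatedCrash`, the state strings, and the file names are hypothetical and are not Hudi classes or APIs. It also assumes the MDT already lists C1's files when the rollback starts.

```python
class SimulatedCrash(Exception):
    """Stands in for the process dying mid-rollback."""


def buggy_rollback(timeline, data_files, mdt_files, target, instant,
                   crash_before_mdt=True):
    """Run the rollback steps in the order the issue describes (toy model)."""
    timeline[instant] = "ROLLBACK_INFLIGHT"
    # 1. Delete the data files written by the target commit.
    for f in [f for f, c in data_files.items() if c == target]:
        del data_files[f]
    # 2. Delete the target's commit file from the data table timeline.
    del timeline[target]
    # 3. Crash before the rollback reaches the metadata table.
    if crash_before_mdt:
        raise SimulatedCrash(instant)
    mdt_files.intersection_update(data_files)
    timeline[instant] = "ROLLBACK_COMPLETED"


# Hypothetical starting state: C1 failed, C2 and C3 completed.
timeline = {"C1": "COMMIT_INFLIGHT", "C2": "COMMIT_COMPLETED", "C3": "COMMIT_COMPLETED"}
data_files = {"f1": "C1", "f2": "C2", "f3": "C3"}   # file -> commit that wrote it
mdt_files = {"f1", "f2", "f3"}                      # files the MDT believes exist

try:
    buggy_rollback(timeline, data_files, mdt_files, "C1", "C4")
except SimulatedCrash:
    pass

# What a restart sees when it scans for failed commits to roll back.
failed = [i for i, s in timeline.items() if s == "COMMIT_INFLIGHT"]
print(failed)                                # []
print(timeline["C4"])                        # ROLLBACK_INFLIGHT
print(sorted(mdt_files - set(data_files)))   # ['f1']
```

In this model the restart finds no failed commit, C4 stays pending, and the MDT still lists a file that no longer exists in the data table.
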
>  
> Proposal: 
> We need at least two fixes:
> a. We should remove the C1 commit files from the data table timeline only after applying the rollback commit to the MDT. This ensures that no commit files in the data table are cleaned up before the rollback has been applied to the MDT.
> b. Whenever we check for failed commits to roll back, we should also check for any dangling rollbacks to re-attempt. This needs some fixes in the rollback executor as well, since the commit to roll back may no longer exist in the data table timeline at all. We still need to re-attempt the rollback and get it to completion (so that the metadata table can make progress with respect to compactions). It is not easy to distinguish a pending rollback from a dangling rollback, and I can't think of a way to detect a dangling rollback just by looking at the data table's active timeline. Hence we have to re-attempt every pending rollback instant and get it to completion.
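
The two fixes can be sketched together in a toy model. Everything here is an assumption made for illustration: `new_table`, `start_rollback`, `run_rollback`, `recover`, and the state strings are hypothetical names, not Hudi's rollback executor. The sketch also assumes the rollback records its target and file list up front, so a retry does not depend on the target commit still being on the timeline.

```python
class SimulatedCrash(Exception):
    """Stands in for the process dying at a chosen step."""


def new_table():
    """Hypothetical state: C1 failed, C2 and C3 completed, MDT lists all files."""
    return {
        "timeline": {"C1": {"state": "COMMIT_INFLIGHT"},
                     "C2": {"state": "COMMIT_COMPLETED"},
                     "C3": {"state": "COMMIT_COMPLETED"}},
        "data_files": {"f1": "C1", "f2": "C2", "f3": "C3"},
        "mdt_files": {"f1", "f2", "f3"},
        "mdt_applied": [],
    }


def start_rollback(table, instant, target):
    """Record the rollback plan (target and files) before touching anything."""
    files = sorted(f for f, c in table["data_files"].items() if c == target)
    table["timeline"][instant] = {"state": "ROLLBACK_INFLIGHT",
                                  "target": target, "files": files}


def run_rollback(table, instant, crash_at=None):
    """Idempotent rollback execution; safe to call again after a crash."""
    plan = table["timeline"][instant]
    for f in plan["files"]:
        table["data_files"].pop(f, None)       # tolerate already-deleted files
    if crash_at == "before_mdt":
        raise SimulatedCrash(instant)
    # Fix (a): apply the rollback to the MDT first ...
    if instant not in table["mdt_applied"]:
        table["mdt_files"] -= set(plan["files"])
        table["mdt_applied"].append(instant)
    if crash_at == "before_timeline_cleanup":
        raise SimulatedCrash(instant)
    # ... and only then remove the target commit from the data timeline.
    # Fix (b): the target may already be gone on a retry, so do not require it.
    table["timeline"].pop(plan["target"], None)
    plan["state"] = "ROLLBACK_COMPLETED"


def recover(table):
    """Fix (b): on restart, re-attempt every pending rollback instant."""
    for instant, meta in list(table["timeline"].items()):
        if meta["state"] == "ROLLBACK_INFLIGHT":
            run_rollback(table, instant)


table = new_table()
start_rollback(table, "C4", "C1")
try:
    run_rollback(table, "C4", crash_at="before_mdt")
except SimulatedCrash:
    pass

recover(table)                                # restart re-attempts C4
print(table["timeline"]["C4"]["state"])       # ROLLBACK_COMPLETED
print(sorted(table["mdt_files"]))             # ['f2', 'f3']
print("C1" in table["timeline"])              # False
```

Because every step tolerates work that was already done, re-attempting all pending rollbacks is safe even when a given rollback is only pending rather than dangling, which matches the reasoning in (b).
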
>  
> Dangling rollbacks:
> Following up on the example above:
> C1, C2, C3, and C4 (RB_C1), where C4 failed mid-way. The crash happens after deleting the data files and deleting the commit files in the data timeline, but before applying the rollback to the MDT. If the user restarts the pipeline, Hudi will check for partially failed commits to trigger a rollback. But since C1 was already deleted from the timeline by C4 (RB_C1), the rollback of C1 will not kick in. So C4, i.e. RB_C1, will just stay in the timeline forever, since there is no other trigger that can take it to completion or delete it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)