Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/09/14 23:45:00 UTC
[jira] [Assigned] (HUDI-2432) Fix restore by adding a requested instant and restore plan

     [ https://issues.apache.org/jira/browse/HUDI-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan reassigned HUDI-2432:
-----------------------------------------

    Assignee: sivabalan narayanan

> Fix restore by adding a requested instant and restore plan
> ----------------------------------------------------------
>
>                 Key: HUDI-2432
>                 URL: https://issues.apache.org/jira/browse/HUDI-2432
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.10.0
>
>
> Fix restore by adding a requested instant and restore plan
>  
> I am trying to determine whether we really need a restore plan. Dumping my thoughts here.
> Internally, restore is converted into N rollbacks: we fetch the active instants from the timeline in reverse order and trigger the rollbacks one by one. We already have a patch that fixes rollback by adding a rollback plan to the rollback.requested meta file. So, let's walk through the failure scenarios.
>  
> Suppose 5 instants need to be rolled back, but the process crashes after 3 rollbacks.
>  * When we retry the restore a second time, only the 2 pending instants will be returned from the timeline as instants that need to be rolled back, so we will roll back the remaining 2 commits/instants. The only missing piece is the list of rollback metadata serialized as part of the restore commit metadata, which might miss the first 3 commits. Since restore is a destructive operation anyway, I am not sure that omitting the already rolled-back commits from the restore commit metadata will cause any issues.
>  ** Metadata table: the first 3 would have been rolled back in the metadata table as well (applied as upserts), so it should be fine when we retrigger the restore; the remaining 2 will get applied.
>  ** If one of the rollbacks happens to get committed to the metadata table but fails before being committed to the data table, the second rollback of the same instant is just another delta commit to the metadata table, so we should be good there too.
>  * Suppose there was a crash while a rollback was inflight.
>  ** Let's say the rollback of c3 failed while in progress. When we reattempt the restore, we will try to roll back c3 again. With the rollback plan fix in place, we should be good, since we will resume the rollback and drive it to completion.
>  ** Metadata table: since the first rollback attempt failed while inflight, there won't be any trace of it in the metadata table, but when we retry, it should get applied to the metadata table. The rollback plan fix should ensure that the rollback commit metadata contains all file info from the original plan and not just the successfully deleted files, because during the second attempt only the pending files will be deleted.
>  
> From the looks of it, I don't see a real need for a restore plan. At least it does not block our synchronous metadata patch as such. But I am open to hearing from others.
>  
>  
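A minimal Python sketch of the restore flow described in the issue above, under stated assumptions: every instant newer than the restore target is rolled back one at a time, newest first, and a retry after a crash only sees the instants still on the timeline. This is a toy simulation, not Hudi code; `Timeline`, `restore` and `SimulatedCrash` are hypothetical names, and instant times are assumed to compare correctly as strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


class SimulatedCrash(Exception):
    """Raised to simulate the writer process dying partway through a restore."""


@dataclass
class Timeline:
    # Hypothetical toy timeline: completed instant times, oldest first.
    instants: list[str]
    # Instants removed by a completed rollback, in the order they were rolled back.
    rolled_back: list[str] = field(default_factory=list)

    def instants_after(self, target: str) -> list[str]:
        """Instants newer than the restore target, newest first."""
        return sorted((i for i in self.instants if i > target), reverse=True)

    def rollback(self, instant: str) -> None:
        self.instants.remove(instant)
        self.rolled_back.append(instant)


def restore(timeline: Timeline, target: str,
            crash_after: Optional[int] = None) -> list[str]:
    """Roll back every instant newer than `target`, one at a time.

    Returns only the instants rolled back by this attempt, i.e. what a
    restore commit written at the end of this attempt would list.
    """
    rolled_back_this_attempt = []
    for instant in timeline.instants_after(target):
        # Optionally stop after `crash_after` rollbacks to mimic a crash.
        if crash_after is not None and len(rolled_back_this_attempt) == crash_after:
            raise SimulatedCrash(f"crashed before rolling back {instant}")
        timeline.rollback(instant)
        rolled_back_this_attempt.append(instant)
    return rolled_back_this_attempt
```

With a timeline of c0..c5, restoring to c0 and crashing after 3 rollbacks removes c5, c4 and c3; the retry returns only c2 and c1. This mirrors the concern in the issue that the restore commit metadata written by the retry may omit the first 3 rollbacks, even though the timeline ends up fully restored.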

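A second hypothetical sketch for the inflight-rollback case from the issue: a rollback plan (standing in for what the patch stores in the rollback.requested meta file) fixes the list of files up front, a retry deletes only the files that still exist, and the commit metadata reports every planned file so that applying it to the metadata table removes all of them. `RollbackPlan`, `execute_rollback` and `apply_to_metadata_table` are illustrative names, not Hudi APIs, and the metadata table is modeled as a plain set of file names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


class SimulatedCrash(Exception):
    """Raised to simulate the process dying while a rollback is inflight."""


@dataclass(frozen=True)
class RollbackPlan:
    # Hypothetical stand-in for the plan persisted in the rollback.requested file.
    instant_to_rollback: str
    files_to_delete: tuple[str, ...]


def execute_rollback(plan: RollbackPlan, storage: set[str],
                     crash_after: Optional[int] = None) -> dict:
    """Delete the planned files that still exist and build commit metadata."""
    deleted_this_attempt = []
    for path in plan.files_to_delete:
        if path not in storage:
            # Already deleted by an earlier, interrupted attempt.
            continue
        if crash_after is not None and len(deleted_this_attempt) == crash_after:
            raise SimulatedCrash(f"crashed before deleting {path}")
        storage.remove(path)
        deleted_this_attempt.append(path)
    return {
        "instant": plan.instant_to_rollback,
        # Report every file from the plan, not only this attempt's deletions.
        "deleted_files": list(plan.files_to_delete),
        "deleted_this_attempt": deleted_this_attempt,
    }


def apply_to_metadata_table(listing: set[str], commit: dict) -> set[str]:
    """Remove the rolled-back files from a toy metadata-table file listing."""
    return listing - set(commit["deleted_files"])
```

For example, with files f1..f4 in storage and a plan for c3 covering f1, f2 and f3, a crash after two deletions leaves f3 behind; the retry deletes only f3, yet the commit still lists all three files, so the metadata table (which never saw the failed attempt) ends up containing only f4. Building the commit from `deleted_this_attempt` instead would leave f1 and f2 as stale entries.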


--
This message was sent by Atlassian Jira
(v8.3.4#803005)