Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2023/01/03 03:06:00 UTC

[jira] [Assigned] (HUDI-5464) Fix instantiation of a new partition in MDT re-using the same instant time as a regular commit

     [ https://issues.apache.org/jira/browse/HUDI-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan reassigned HUDI-5464:
-----------------------------------------

    Assignee: Raymond Xu

> Fix instantiation of a new partition in MDT re-using the same instant time as a regular commit
> ----------------------------------------------------------------------------------------------
>
>                 Key: HUDI-5464
>                 URL: https://issues.apache.org/jira/browse/HUDI-5464
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Assignee: Raymond Xu
>            Priority: Blocker
>             Fix For: 0.13.0
>
>
> When instantiating a new partition in the metadata table (MDT), we re-use the instant time of the commit being applied to the MDT. This needs to be fixed.
>  
> For example, say we have 10 commits with FILES already enabled.
> For C11, we enable col-stats.
> After the data table (DT) work is done, when we enter metadata writer instantiation, we deduce that col-stats has to be instantiated and instantiate it using DC11. In the MDT timeline we see dc11.req, dc11.inflight and dc11.complete. We then go ahead and apply the actual C11 from the DT to the MDT (dc11.inflight and dc11.complete are updated). Here we overwrite the same DC11 with the records pertaining to C11,
> which is a bug. We definitely need to fix this.
> We can add a suffix to C11 (say C11_003 or C11_001), as we do for compaction and clean in the MDT, so that any additional operation in the MDT has a different commit time format. Everything else should match the DT one-to-one.
>  
>  
> Impact:
> We are overriding the same DC for two purposes, which is bad. If there is a crash after initializing col-stats and before applying the actual C11 (in the above context), we might mistakenly roll back the col-stats initialization while the table config still says that col-stats is fully ready to be served. But while reading the MDT, we may not read DC11 since it is a failed commit.
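
The collision and the suggested suffix can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hudi code: ToyTimeline, mdt_init_instant, crash_before_commit_applied and the "001" suffix value are hypothetical names chosen for illustration, and the timeline is reduced to a dict keyed by instant time.

```python
from dataclasses import dataclass, field

# Hypothetical suffix for an MDT-only operation. The issue only suggests
# "C11_003 or C11_001", so the exact value here is an assumption.
INIT_SUFFIX = "001"


def mdt_init_instant(dt_instant: str, suffix: str = INIT_SUFFIX) -> str:
    """Derive a distinct MDT instant time for partition initialization."""
    return f"{dt_instant}_{suffix}"


@dataclass
class ToyTimeline:
    """Toy stand-in for the MDT timeline: instant time -> (purpose, state)."""
    instants: dict = field(default_factory=dict)

    def start(self, instant: str, purpose: str) -> None:
        # Re-using an instant time silently replaces the earlier entry.
        self.instants[instant] = (purpose, "inflight")

    def complete(self, instant: str) -> None:
        purpose, _ = self.instants[instant]
        self.instants[instant] = (purpose, "completed")

    def rollback_inflight(self) -> list:
        """Drop every instant that never completed, as crash recovery would."""
        failed = [t for t, (_, s) in self.instants.items() if s != "completed"]
        for t in failed:
            del self.instants[t]
        return failed


def crash_before_commit_applied(init_instant: str, commit_instant: str = "C11"):
    """Initialize col-stats, start applying C11, then crash before it completes."""
    timeline = ToyTimeline()
    timeline.start(init_instant, "init col-stats")
    timeline.complete(init_instant)
    timeline.start(commit_instant, "apply C11")  # overwrites if the key is shared
    # Simulated crash: the commit never completes, so recovery rolls it back.
    rolled_back = timeline.rollback_inflight()
    return rolled_back, timeline.instants


# Shared instant time: rolling back C11 also discards the col-stats initialization.
print(crash_before_commit_applied("C11"))
# Suffixed instant time: only the failed C11 is rolled back.
print(crash_before_commit_applied(mdt_init_instant("C11")))
```

In this model the shared instant time leaves nothing behind after the rollback, while the suffixed one keeps the completed initialization. That is the property the proposed fix is after.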



--
This message was sent by Atlassian Jira
(v8.20.10#820010)