Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/11/18 15:14:00 UTC

[jira] [Created] (HUDI-2792) Metadata table enters into inconsistent state

sivabalan narayanan created HUDI-2792:
-----------------------------------------

             Summary: Metadata table enters into inconsistent state
                 Key: HUDI-2792
                 URL: https://issues.apache.org/jira/browse/HUDI-2792
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: sivabalan narayanan


I see we have validations to ensure the metadata table is in a valid state. Specifically, if a file that was never added is deleted from the metadata table, we throw an exception.

I was able to reproduce this issue in one of my test scenarios. Even though the actual test case is a bit tangential, here is a convincing case that requires relaxing this constraint.

Due to Spark task failures, there can be more files on storage than are tracked in the commit metadata. So, if a user tries to roll back a completed write (one that had some Spark task failures), the rollback will delete more files than the initial set of files added as part of the commit metadata.
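The failure mode can be sketched in a few lines of Python. This is a hypothetical model, not Hudi's code: `PartitionFileListing`, `FileListingValidationError`, and the file names are invented for illustration. It applies the strict check (deleting a file that was never added raises) to a rollback that also removes a file left behind by a failed task attempt.

```python
from __future__ import annotations

# Hypothetical model of a metadata-table file listing for one partition.
# Not Hudi's API: class names and file names are illustrative only.


class FileListingValidationError(Exception):
    """Raised when a delete targets a file the listing never recorded."""


class PartitionFileListing:
    def __init__(self) -> None:
        self.files = set()  # file names currently tracked for the partition

    def apply_commit(self, added: list[str]) -> None:
        # Only the files recorded in the commit metadata get tracked.
        self.files.update(added)

    def apply_deletes(self, deleted: list[str]) -> None:
        # Strict validation: every deleted file must have been added earlier.
        unknown = [f for f in deleted if f not in self.files]
        if unknown:
            raise FileListingValidationError(
                f"deleting files that were never added: {unknown}"
            )
        self.files.difference_update(deleted)


# The commit metadata records two files, but a failed Spark task attempt
# also left a third file on storage.
listing = PartitionFileListing()
listing.apply_commit(["f1.parquet", "f2.parquet"])

# Rolling back deletes every file the write left on storage, including
# the one the commit metadata never mentioned, so the strict check fires.
try:
    listing.apply_deletes(["f1.parquet", "f2.parquet", "f3-failed-attempt.parquet"])
    rollback_failed = False
except FileListingValidationError as err:
    rollback_failed = True
    print(err)
```

Because the check runs before any mutation, the failed rollback leaves the listing unchanged, which is exactly the stuck state the rollback cannot get past.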

So we need to relax this constraint (throwing an exception when a file that was never added is deleted from the metadata table). Otherwise, I cannot think of a way to get around this.

I am looking for ideas on how to go about this. Can we keep some minimal constraint, but loosen the existing one so that we support the Spark task failure cases?
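One possible shape for such a minimal constraint, offered only as an illustration and not as a decided fix: tolerate deletes of never-added files only during a rollback, and only for files written by the instant being rolled back. The sketch below is hypothetical Python; it assumes data file names end in `_<instantTime>.parquet` so the writing instant can be read from the name, and all class and function names are invented.

```python
from __future__ import annotations

# Hypothetical relaxed validation; not Hudi's code. Assumes data file names
# end in "_<instantTime>.parquet" so the writing instant can be read off the name.


class FileListingValidationError(Exception):
    """Raised when a delete targets a file the listing cannot account for."""


def instant_of(file_name: str) -> str:
    # Assumed naming convention: the instant time is the last "_" token.
    return file_name.rsplit(".", 1)[0].rsplit("_", 1)[-1]


class PartitionFileListing:
    def __init__(self) -> None:
        self.files = set()  # file names currently tracked for the partition

    def apply_commit(self, added: list[str]) -> None:
        self.files.update(added)

    def apply_deletes(
        self, deleted: list[str], rollback_instant: str | None = None
    ) -> list[str]:
        """Apply deletes and return the never-added files that were tolerated."""
        unknown = [f for f in deleted if f not in self.files]
        # Minimal constraint that survives: a never-added file is accepted only
        # during a rollback, and only if the rolled-back instant wrote it.
        rejected = [
            f
            for f in unknown
            if rollback_instant is None or instant_of(f) != rollback_instant
        ]
        if rejected:
            raise FileListingValidationError(
                f"deleting files that were never added: {rejected}"
            )
        self.files.difference_update(deleted)
        return unknown


listing = PartitionFileListing()
listing.apply_commit(["a_w1_001.parquet", "b_w1_001.parquet"])

# Rolling back instant 001 also removes a leftover from a failed task attempt.
tolerated = listing.apply_deletes(
    ["a_w1_001.parquet", "b_w1_001.parquet", "a_w2_001.parquet"],
    rollback_instant="001",
)
print(tolerated)  # ['a_w2_001.parquet']

# Outside a rollback, deleting a never-added file is still rejected.
try:
    listing.apply_deletes(["c_w1_002.parquet"])
    non_rollback_rejected = False
except FileListingValidationError:
    non_rollback_rejected = True
```

Keying the exemption on the instant time keeps the check meaningful for ordinary deletes and for files from other commits; whether real file names carry enough information to support this is an assumption that would need to be verified.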

--
This message was sent by Atlassian Jira
(v8.20.1#820001)