Posted to commits@hudi.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/06/15 07:41:00 UTC

[jira] [Updated] (HUDI-2016) Metadata table bootstrap does not work when there are inflight instances

     [ https://issues.apache.org/jira/browse/HUDI-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-2016:
---------------------------------
    Labels: pull-request-available  (was: )

> Metadata table bootstrap does not work when there are inflight instances
> ------------------------------------------------------------------------
>
>                 Key: HUDI-2016
>                 URL: https://issues.apache.org/jira/browse/HUDI-2016
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Prashant Wason
>            Priority: Major
>              Labels: pull-request-available
>
> There is a race condition in metadata table bootstrap when there are inflight instances.
> Example: Assume a CLEAN is in progress that plans to delete p1/f1.parquet (as per the clean plan). If bootstrap runs at the same time, there are two possible cases:
>  # bootstrap lists files in partition p1 BEFORE clean deletes them
>  ## hence p1/f1.parquet is added to metadata table during bootstrap
>  ## When processing the CLEAN, p1/f1.parquet will be deleted from metadata table
>  # bootstrap lists files in partition p1 AFTER clean deletes them
>  ## p1/f1.parquet is not found
>  ## When processing the CLEAN, p1/f1.parquet will be deleted from metadata table
> We cannot differentiate 2.2 from the case that we missed adding p1/f1.parquet to the metadata table.
> There is an exception in the metadata reader code to ensure that any file being deleted was added to the metadata table. This exception is thrown in case 2.2 above.
>  
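The two orderings described above can be reproduced with a small toy model. This is only a sketch: the metadata table is modelled as a plain set of paths, and every name in it (bootstrap, apply_clean, run, MissingFileError) is hypothetical and does not correspond to Hudi's actual classes or APIs.

```python
class MissingFileError(Exception):
    """Raised when a delete targets a file the metadata table never recorded."""


def bootstrap(storage_files):
    """Build the toy metadata table from a file listing taken at bootstrap time."""
    return set(storage_files)


def apply_clean(metadata, clean_plan):
    """Apply a CLEAN to the metadata table, requiring each deleted file to be present."""
    for path in clean_plan:
        if path not in metadata:
            # Mirrors the check described in the issue: a delete for an unknown file is an error.
            raise MissingFileError(path)
        metadata.remove(path)
    return metadata


def run(list_before_clean):
    """Simulate one interleaving of bootstrap listing and an in-progress CLEAN."""
    storage = {"p1/f1.parquet", "p1/f2.parquet"}
    clean_plan = ["p1/f1.parquet"]
    if list_before_clean:
        metadata = bootstrap(storage)   # case 1: listing still sees f1
        storage -= set(clean_plan)      # clean deletes the file afterwards
    else:
        storage -= set(clean_plan)      # case 2: clean deletes the file first
        metadata = bootstrap(storage)   # listing no longer sees f1
    try:
        return sorted(apply_clean(metadata, clean_plan))
    except MissingFileError as err:
        return f"error: {err}"


print(run(True))   # case 1: the CLEAN is applied without error
print(run(False))  # case 2.2: the check fires although no file was actually missed
```

In this model both orderings end with the same files on storage, but only the second one trips the check, which is why the check alone cannot tell a benign race from a genuinely missed file.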



--
This message was sent by Atlassian Jira
(v8.3.4#803005)