Posted to commits@hudi.apache.org by "Raymond Xu (Jira)" <ji...@apache.org> on 2023/03/26 03:29:00 UTC

[jira] [Updated] (HUDI-5769) Partitions created by Async indexer could be deleted by regular writers

     [ https://issues.apache.org/jira/browse/HUDI-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5769:
-----------------------------
    Sprint: Sprint 2023-01-31, Sprint 2023-02-14, Sprint 2023-02-28  (was: Sprint 2023-01-31, Sprint 2023-02-14)

> Partitions created by Async indexer could be deleted by regular writers
> -----------------------------------------------------------------------
>
>                 Key: HUDI-5769
>                 URL: https://issues.apache.org/jira/browse/HUDI-5769
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.1
>
>
> The regular writer has a flow that checks each metadata table (MDT) partition: if a partition is not enabled in the writer's config but is found in storage and is listed in the table config as fully built out, Hudi deletes that metadata partition on the assumption that the user wishes to disable it.
> This does not work well with the async indexer.
>  
> process1: Deltastreamer runs continuously.
> No metadata configs are set,
> which means the default value for metadata enable = true applies, and hence the "files" partition will be instantiated inline on the first commit.
> No value is set for col stats enable, so no action will be taken for col stats.
>  
> process2: the user starts HoodieIndexer for the col stats partition.
> Once the indexer completes, tableConfig will list "col stats" among the fully built out metadata partitions.
>  
> Meanwhile, in process1, when Deltastreamer goes to its next write, it will detect that col stats is not enabled (the default value as per the code) while tableConfig shows col stats as fully built out, and hence it decides to delete the col stats partition and updates the tableConfig.
>  
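
The sequence described above can be reproduced with a small, self-contained Java sketch. It is only an illustration of the reported behaviour and does not use Hudi's classes: MdtPartitionRaceSketch, TableConfig, reconcileOnWrite and simulate are hypothetical names, and the reconcile rule is a simplified reading of the description, not the actual writer code.

```java
import java.util.EnumSet;
import java.util.Set;

public class MdtPartitionRaceSketch {
    // Hypothetical stand-ins for the "files" and "col stats" metadata partitions.
    enum MdtPartition { FILES, COLUMN_STATS }

    /** Hypothetical stand-in for the shared table config (not Hudi's real class). */
    static final class TableConfig {
        final Set<MdtPartition> fullyBuilt = EnumSet.noneOf(MdtPartition.class);
    }

    /**
     * Simplified version of the writer-side check from the description:
     * any partition that is fully built out in the table config but not
     * enabled in this writer's own config is deleted. Returns what was deleted.
     */
    static Set<MdtPartition> reconcileOnWrite(TableConfig table,
                                              Set<MdtPartition> enabledInWriterConfig) {
        Set<MdtPartition> deleted = EnumSet.noneOf(MdtPartition.class);
        for (MdtPartition p : table.fullyBuilt) {
            if (!enabledInWriterConfig.contains(p)) {
                deleted.add(p);
            }
        }
        table.fullyBuilt.removeAll(deleted);
        // Partitions enabled for this writer are initialized inline.
        table.fullyBuilt.addAll(enabledInWriterConfig);
        return deleted;
    }

    /** Replays process1 and process2 in the order given in the description. */
    static Set<MdtPartition> simulate() {
        TableConfig table = new TableConfig();
        // process1: no metadata configs set, so only FILES is enabled by default.
        Set<MdtPartition> writerEnabled = EnumSet.of(MdtPartition.FILES);
        reconcileOnWrite(table, writerEnabled);          // first commit builds FILES

        // process2: the async indexer finishes and marks COLUMN_STATS as built.
        table.fullyBuilt.add(MdtPartition.COLUMN_STATS);

        // process1: the next write sees COLUMN_STATS built but not enabled.
        return reconcileOnWrite(table, writerEnabled);
    }

    public static void main(String[] args) {
        System.out.println("deleted by writer: " + simulate());
        // prints: deleted by writer: [COLUMN_STATS]
    }
}
```

In this sketch the writer cannot tell a partition that the user disabled from one that another process built after the writer started, which is the ambiguity the issue reports.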



--
This message was sent by Atlassian Jira
(v8.20.10#820010)