Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2022/06/03 05:29:00 UTC

[jira] [Updated] (HUDI-4078) BootstrapOperator cannot load all index data

     [ https://issues.apache.org/jira/browse/HUDI-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-4078:
----------------------------
    Fix Version/s:     (was: 0.12.0)

> BootstrapOperator cannot load all index data
> --------------------------------------------
>
>                 Key: HUDI-4078
>                 URL: https://issues.apache.org/jira/browse/HUDI-4078
>             Project: Apache Hudi
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.11.0
>            Reporter: Bo Cui
>            Assignee: Bo Cui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.1
>
>
> The BootstrapOperator cannot obtain all of the parquet and log files from hoodieTable#getSliceView()#getLatestFileSlicesBeforeOrOn.
> Procedure:
> 1) Write 10k records to the Hudi table in streaming mode.
> create table() with (
>  'table.type' = 'MERGE_ON_READ',
>  'index.bootstrap.enabled' = 'true',
>  'archive.max_commits' = '4200',
>  'archive.min_commits' = '4000',
>  'clean.retain_commits' = '3999',
> ...
> )
> 2) Stop the job and delete the last compaction commit, e.g. `.hoodie/20220505131426.commit`.
> 3) Restart the job without a checkpoint/savepoint and do not write any data.
> 4) Observe how much index data is loaded into the BootstrapOperator.
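
For step 2, a minimal sketch of how the newest completed `.commit` file might be located before removing it by hand. This is not part of the original report: the helper name `latest_commit_file` is hypothetical, and the sketch assumes the table lives on a local filesystem and that all instant timestamps have the same number of digits, so that string order matches time order. It only finds the file; it does not delete anything.

```python
from pathlib import Path
from typing import Optional


def latest_commit_file(hoodie_dir: str) -> Optional[Path]:
    """Return the newest '<instant>.commit' file in a local .hoodie directory.

    Hypothetical helper for illustration. Only names of the form
    '<digits>.commit' are considered; names such as '<instant>.deltacommit'
    or '<instant>.commit.requested' do not match the glob pattern.
    Returns None when no such file exists.
    """
    candidates = [
        p for p in Path(hoodie_dir).glob("*.commit") if p.stem.isdigit()
    ]
    # Assumption: equal-length timestamps, so lexicographic order is time order.
    return max(candidates, key=lambda p: p.stem, default=None)
```

Called as `latest_commit_file("/path/to/table/.hoodie")`, it would return a path like the `20220505131426.commit` file mentioned in step 2, if that is the most recent one.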



--
This message was sent by Atlassian Jira
(v8.20.7#820007)