Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2022/06/03 05:29:00 UTC
[jira] [Updated] (HUDI-4078) BootstrapOperator cannot load all index data
[ https://issues.apache.org/jira/browse/HUDI-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo updated HUDI-4078:
----------------------------
Fix Version/s: (was: 0.12.0)
> BootstrapOperator cannot load all index data
> --------------------------------------------
>
> Key: HUDI-4078
> URL: https://issues.apache.org/jira/browse/HUDI-4078
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.9.0, 0.11.0
> Reporter: Bo Cui
> Assignee: Bo Cui
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.11.1
>
>
> The BootstrapOperator cannot obtain all the parquet and log files from hoodieTable#getSliceView()#getLatestFileSlicesBeforeOrOn.
> Procedure:
> 1) Write 10k records to the Hudi table in streaming mode.
> create table() with (
> 'table.type' = 'MERGE_ON_READ',
> 'index.bootstrap.enabled' = 'true',
> 'archive.max_commits' = '4200',
> 'archive.min_commits' = '4000',
> 'clean.retain_commits' = '3999',
> ...
> )
> 2) Stop the job and delete the last compaction commit, e.g. `.hoodie/20220505131426.commit`.
> 3) Restart the job without a checkpoint or savepoint, and do not write any data.
> 4) Observe how much index data is loaded into the BootstrapOperator.
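
For step 2, a minimal sketch of how the last compaction commit could be located before deleting it by hand. This assumes the table lives on a local filesystem and that completed compaction instants appear as `<instant>.commit` files directly under `.hoodie`, as in the example path above. The helper name `latest_compaction_commit` is hypothetical and is not part of Hudi; the function only reports the file and does not delete anything.

```python
from pathlib import Path
from typing import Optional


def latest_compaction_commit(table_path: str) -> Optional[Path]:
    """Return the newest `<instant>.commit` file under `<table_path>/.hoodie`.

    Hypothetical helper for reproducing the issue; returns None when the
    timeline directory is missing or holds no completed `.commit` file.
    """
    timeline_dir = Path(table_path) / ".hoodie"
    if not timeline_dir.is_dir():
        return None
    # Keep only files named `<digits>.commit`; this skips `.deltacommit`,
    # `.commit.requested`, and other timeline files.
    candidates = [
        p for p in timeline_dir.glob("*.commit")
        if p.is_file() and p.stem.isdigit()
    ]
    # Instant times are timestamps, so the numerically largest is the latest.
    return max(candidates, key=lambda p: int(p.stem), default=None)
```

Calling `latest_compaction_commit("/path/to/table")` with the job stopped returns the candidate file to remove in step 2, or `None` if no compaction has completed yet.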
--
This message was sent by Atlassian Jira
(v8.20.7#820007)