Posted to commits@hudi.apache.org by "wuwenchi (Jira)" <ji...@apache.org> on 2022/08/03 01:47:00 UTC
[jira] [Created] (HUDI-4527) wrong data after compaction in MOR table by flink
wuwenchi created HUDI-4527:
------------------------------
Summary: wrong data after compaction in MOR table by flink
Key: HUDI-4527
URL: https://issues.apache.org/jira/browse/HUDI-4527
Project: Apache Hudi
Issue Type: Bug
Components: flink, flink-sql
Reporter: wuwenchi
In a MOR table with a primary key and a preCombine field, compaction generates a parquet file. Inserting one more record after that generates a log file.
If the preCombine value of the record in the log file is smaller than that of the record with the same key in the parquet file, a read should return the record from the parquet file, but currently the record from the log file is returned instead. A table in COW mode does not have this problem.
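The expected read-side behavior described above can be sketched as follows. This is only an illustrative model of the semantics, not Hudi's actual merge code: the function name merge_on_read, the (uuid, ts) tuple layout, and the rule that a log record wins on an equal preCombine value are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

# (uuid, ts): record key and preCombine value, mirroring the columns of t1.
Record = Tuple[int, int]


def merge_on_read(base: Iterable[Record], log: Iterable[Record]) -> Dict[int, Record]:
    """Merge base-file records with log records, keeping the larger ts per key."""
    merged: Dict[int, Record] = {rec[0]: rec for rec in base}
    for rec in log:
        key, ts = rec
        current = merged.get(key)
        # Assumption for this sketch: on an equal ts the log record wins.
        if current is None or ts >= current[1]:
            merged[key] = rec
    return merged


base_file = [(1, 100)]  # parquet contents after compaction
log_file = [(1, 2)]     # later delta commit with a smaller ts
print(merge_on_read(base_file, log_file))  # prints {1: (1, 100)}
```

Under this model the record (1, 100) from the parquet file is the one returned, which matches the result reported for COPY_ON_WRITE mode.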
For example:
create table t1(
uuid int,
ts int,
PRIMARY KEY(uuid) NOT ENFORCED
) with (
'connector' = 'hudi',
'read.data.skipping.enabled' = 'true',
'write.precombine' = 'true',
'hoodie.datasource.write.recordkey.field' = 'uuid',
'path' = '........',
'table.type' = 'MERGE_ON_READ',
'compaction.delta_commits' = '2',
'hoodie.compact.inline' = 'true'
);
insert into t1 values(1, 1); ---- deltacommit
insert into t1 values(1, 100); ---- deltacommit --> compaction --> parquet
insert into t1 values(1, 2); ---- deltacommit
select * from t1;
The query returns [+I[1, 2]], but in COPY_ON_WRITE mode it returns [+I[1, 100]].
--
This message was sent by Atlassian Jira
(v8.20.10#820010)