Posted to commits@hudi.apache.org by "wuwenchi (Jira)" <ji...@apache.org> on 2022/08/03 01:47:00 UTC

[jira] [Created] (HUDI-4527) wrong data after compaction in MOR table by flink

wuwenchi created HUDI-4527:
------------------------------

             Summary: wrong data after compaction in MOR table by flink
                 Key: HUDI-4527
                 URL: https://issues.apache.org/jira/browse/HUDI-4527
             Project: Apache Hudi
          Issue Type: Bug
          Components: flink, flink-sql
            Reporter: wuwenchi


Take a MOR table with a primary key and a preCombine field. After compaction, the table has a parquet base file. Inserting one more record then generates a log file.

If the preCombine value of the record in the log file is smaller than that of the record with the same key in the parquet file, a read should return the record from the parquet file. Instead, the record from the log file is returned. A table in COW mode does not have this problem.

For example:

create table t1(
  uuid int,
  ts int,
  PRIMARY KEY(uuid) NOT ENFORCED
) with (
  'connector' = 'hudi',
  'read.data.skipping.enabled' = 'true',
  'write.precombine' = 'true',
  'hoodie.datasource.write.recordkey.field' = 'uuid',
  'path' = '........',
  'table.type' = 'MERGE_ON_READ',
  'compaction.delta_commits' = '2',
  'hoodie.compact.inline' = 'true'
);

insert into t1 values(1, 1);    -- deltacommit

insert into t1 values(1, 100);  -- deltacommit --> compaction --> parquet

insert into t1 values(1, 2);    -- deltacommit

select * from t1;

The query returns [+I[1, 2]]. The expected result, which the same statements produce on a COPY_ON_WRITE table, is [+I[1, 100]].
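
The expected behavior can be modeled with a small sketch. This is not Hudi code: the names `merge` and `replay` are hypothetical, records are reduced to (uuid, ts) tuples, and the rule that a tie goes to the newer write is an assumption. It only illustrates that, when ts is the preCombine field, replaying the three inserts above should leave (1, 100) as the surviving record.

```python
from typing import Dict, Iterable, Optional, Tuple

Record = Tuple[int, int]  # (uuid, ts); ts plays the role of the preCombine field


def merge(existing: Optional[Record], incoming: Record) -> Record:
    """Keep the record with the larger ts.

    Assumption: on equal ts the incoming (newer) record wins.
    """
    if existing is None or incoming[1] >= existing[1]:
        return incoming
    return existing


def replay(writes: Iterable[Record]) -> Dict[int, Record]:
    """Apply writes in order, merging by record key (uuid)."""
    state: Dict[int, Record] = {}
    for rec in writes:
        state[rec[0]] = merge(state.get(rec[0]), rec)
    return state


# The three inserts from the reproduction, in order.
result = replay([(1, 1), (1, 100), (1, 2)])
print(result)  # {1: (1, 100)}
```

Under this model the last insert (1, 2) loses to the compacted record (1, 100), which matches the COPY_ON_WRITE result and differs from what the MERGE_ON_READ read returns.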

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)