Posted to commits@hudi.apache.org by "suheng.cloud (Jira)" <ji...@apache.org> on 2021/11/01 05:43:00 UTC

[jira] [Created] (HUDI-2659) concurrent compaction problem on flink sql

suheng.cloud created HUDI-2659:
----------------------------------

             Summary: concurrent compaction problem on flink sql
                 Key: HUDI-2659
                 URL: https://issues.apache.org/jira/browse/HUDI-2659
             Project: Apache Hudi
          Issue Type: Bug
          Components: Flink Integration
            Reporter: suheng.cloud
         Attachments: image-2021-11-01-13-14-40-831.png, image-2021-11-01-13-16-28-695.png

Hi, Community:

We have been continuously watching the Flink compaction task, and found that an issue can appear after the job has run for 2 days.

The taskmanager log shows that two compaction plans were executed in sequence, in which the former commit action deleted a base file (perhaps due to some duplication?) that the latter plan depended on.

I wonder: will this cause data loss in the end?

The core Flink sink table params are:
{code:java}
'table.type' = 'MERGE_ON_READ',
'write.operation' = 'upsert',
'read.streaming.enabled' = 'true',
'hive_sync.enable' = 'false',
'write.precombine.field' = 'ts',
'compaction.trigger.strategy' = 'num_commits',
'compaction.delta_commits' = '5',
'compaction.tasks' = '4',
'compaction.max_memory' = '10',
'hoodie.parquet.max.file.size' = '20971520',
'hoodie.parquet.small.file.limit' = '10485760',
'write.log.max.size' = '52428800',
'compaction.target_io' = '5120',
'changelog.enabled' = 'false',
'clean.retain_commits' = '20',
'archive.max_commits' = '30',
'archive.min_commits' = '20'
{code}
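For context, a full sink DDL carrying these options would look roughly like the sketch below. This is only an illustration: the table name, column list, primary key, and storage path are hypothetical placeholders, not taken from the actual job.

{code:sql}
-- Hypothetical sketch of the sink table DDL; names and path are placeholders.
CREATE TABLE hudi_sink (
  id BIGINT,
  name STRING,
  ts TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///placeholder/path/to/table',
  'table.type' = 'MERGE_ON_READ',
  'write.operation' = 'upsert',
  'write.precombine.field' = 'ts',
  'compaction.trigger.strategy' = 'num_commits',
  'compaction.delta_commits' = '5',
  'compaction.tasks' = '4'
  -- plus the remaining compaction/clean/archive options listed above
);
{code}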

cc [~danny0405], could you also give some suggestions? :)

Thank you all~

!image-2021-11-01-13-14-40-831.png!


!image-2021-11-01-13-16-28-695.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)