Posted to commits@hudi.apache.org by "hd zhou (Jira)" <ji...@apache.org> on 2022/03/16 07:30:00 UTC

[jira] [Created] (HUDI-3644) hoodie log scan bug cause data duplication

hd zhou created HUDI-3644:
-----------------------------

             Summary: hoodie log scan bug cause data duplication
                 Key: HUDI-3644
                 URL: https://issues.apache.org/jira/browse/HUDI-3644
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: hd zhou


The problem is in AbstractHoodieLogRecordReader:

 
{code:java}
// code placeholder
if (!completedInstantsTimeline.containsOrBeforeTimelineStarts(instantTime)
    || inflightInstantsTimeline.containsInstant(instantTime)) {
  // hit an uncommitted block possibly from a failed write, move to the next one and skip processing this one
  continue;
} {code}
 

Whenever completedInstantsTimeline.containsOrBeforeTimelineStarts(instantTime) returns true, the log block is merged. That is not always correct.

 

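Consider a log block that is appended successfully, after which its deltacommit is rolled back. Once that instant time falls before the start of the active timeline, containsOrBeforeTimelineStarts(instantTime) returns true for it even though the commit never completed. The log block is then merged, which causes data duplication.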
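
The scenario can be reproduced with a small, self-contained sketch. It does not use Hudi classes: SketchTimeline and LogBlockFilterSketch are hypothetical stand-ins, the instant times are made up, and the sketch assumes containsOrBeforeTimelineStarts means "the instant is on the timeline, or is older than the timeline's first instant". Only the skip condition itself is copied from the snippet above.

```java
import java.util.TreeSet;

// Hypothetical stand-in for a timeline: a sorted set of instant times.
// This is NOT the real Hudi timeline class; it only mimics the two lookups used above.
class SketchTimeline {
  private final TreeSet<String> instants = new TreeSet<>();

  SketchTimeline(String... instantTimes) {
    for (String t : instantTimes) {
      instants.add(t);
    }
  }

  boolean containsInstant(String instantTime) {
    return instants.contains(instantTime);
  }

  // Assumed semantics: true if the instant is on the timeline,
  // or is older than the first instant on the timeline.
  boolean containsOrBeforeTimelineStarts(String instantTime) {
    return containsInstant(instantTime)
        || (!instants.isEmpty() && instantTime.compareTo(instants.first()) < 0);
  }
}

class LogBlockFilterSketch {
  // Same condition as the snippet quoted in this issue: true means the block is skipped.
  static boolean shouldSkip(SketchTimeline completed, SketchTimeline inflight, String instantTime) {
    return !completed.containsOrBeforeTimelineStarts(instantTime)
        || inflight.containsInstant(instantTime);
  }

  public static void main(String[] args) {
    SketchTimeline inflight = new SketchTimeline();
    String rolledBack = "002"; // block was appended, then its deltacommit was rolled back

    // The completed timeline still starts at 001, so the rolled-back block is skipped.
    SketchTimeline earlier = new SketchTimeline("001", "003");
    System.out.println(shouldSkip(earlier, inflight, rolledBack)); // true

    // The completed timeline now starts at 003, so the same block is no longer skipped.
    SketchTimeline later = new SketchTimeline("003", "004");
    System.out.println(shouldSkip(later, inflight, rolledBack)); // false
  }
}
```

In the second case the block from the rolled-back instant 002 passes the filter only because 002 sorts before the first completed instant, which is the behaviour this issue describes.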

 

 


