Posted to commits@hudi.apache.org by "Danny Chen (Jira)" <ji...@apache.org> on 2021/11/12 08:04:00 UTC

[jira] [Created] (HUDI-2751) To avoid the duplicates for streaming read MOR table

Danny Chen created HUDI-2751:
--------------------------------

             Summary: To avoid the duplicates for streaming read MOR table
                 Key: HUDI-2751
                 URL: https://issues.apache.org/jira/browse/HUDI-2751
             Project: Apache Hudi
          Issue Type: Sub-task
          Components: Common Core
            Reporter: Danny Chen
             Fix For: 0.11.0


Imagine there are commits on the timeline:

                   inflight compaction                   complete compaction
                            |                                     |
----- instant 99 ----- instant 100 ----- 101 ----- 102 ----- instant 100 -----
      first read ->|                                         second read ->|
----- range 1 -----|----------------------- range 2 -----------------------|

Instants 99, 101, and 102 are successful non-compaction delta commits;
instant 100 is a compaction instant.

The first incremental read consumes up to instant 99, and the second read consumes from instant 100 to instant 102. The second read would therefore consume the commit files of instant 100, whose data has already been consumed by the first read.

The duplicate read happens when the following condition holds: a compaction instant is scheduled and then completes within *one* consume range.
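
The condition can be illustrated with a small, self-contained model of the timeline above. This is only a sketch and does not use any Hudi API: the `Instant` class, the `scheduled_at` / `completed_at` wall-clock ticks, and the helper functions `consumed` and `duplicate_sources` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instant:
    time: int          # instant time on the timeline, e.g. 100
    action: str        # "deltacommit" or "compaction"
    scheduled_at: int  # hypothetical wall-clock tick when the instant was scheduled
    completed_at: int  # hypothetical wall-clock tick when the instant completed


def consumed(instants: List[Instant], prev_read_at: int, read_at: int) -> List[Instant]:
    """Instants that completed inside the consume range (prev_read_at, read_at]."""
    return [i for i in instants if prev_read_at < i.completed_at <= read_at]


def duplicate_sources(instants: List[Instant], prev_read_at: int, read_at: int) -> List[Instant]:
    """Compaction instants that were scheduled AND completed inside one consume range."""
    return [
        i for i in consumed(instants, prev_read_at, read_at)
        if i.action == "compaction" and i.scheduled_at > prev_read_at
    ]


# The timeline from the description, listed in instant-time order.
# Tick 2 is the first read, tick 7 is the second read (assumed values).
timeline = [
    Instant(time=99, action="deltacommit", scheduled_at=1, completed_at=1),
    Instant(time=100, action="compaction", scheduled_at=3, completed_at=6),
    Instant(time=101, action="deltacommit", scheduled_at=4, completed_at=4),
    Instant(time=102, action="deltacommit", scheduled_at=5, completed_at=5),
]

first_read = consumed(timeline, prev_read_at=0, read_at=2)
second_read = consumed(timeline, prev_read_at=2, read_at=7)
suspects = duplicate_sources(timeline, prev_read_at=2, read_at=7)

print([i.time for i in first_read])   # [99]
print([i.time for i in second_read])  # [100, 101, 102]
print([i.time for i in suspects])     # [100]
```

In this model the second read picks up instant 100 together with 101 and 102, and `duplicate_sources` flags instant 100 because it was both scheduled and completed inside range 2.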



--
This message was sent by Atlassian Jira
(v8.20.1#820001)