Posted to commits@hudi.apache.org by "nonggia.liang (Jira)" <ji...@apache.org> on 2022/09/02 02:19:00 UTC

[jira] [Updated] (HUDI-4769) Option read.streaming.skip_compaction skips delta commit

     [ https://issues.apache.org/jira/browse/HUDI-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nonggia.liang updated HUDI-4769:
--------------------------------
    Description: 
Option read.streaming.skip_compaction was introduced to avoid consuming duplicate data from delta-commits and compactions in a MOR table.

But the option may cause delta-commits to be skipped. Here is the case:

Suppose we have a timeline (d for delta-commit, C for compaction/commit):

d1 --> d2 --> C3 --> d4 --> d5 -->

t1..........................t2..........

Let's say scans for streaming read happen at times t1 and t2, when d1 and d5 are the latest instants, respectively.

When we scan at t2 with read.streaming.skip_compaction=true, we get the latest merged file slice, whose log files contain only d4+d5. So d2, which was committed after the scan at t1 and is not in those log files, is skipped.
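
The case above can be reproduced with a toy model of the timeline. This is only a minimal sketch under stated assumptions, not Hudi's reader logic: it assumes a compaction starts a new file slice, and that a scan with skip_compaction=true reads only the log files of the latest file slice. The names Instant, latest_slice_log_instants and scan are hypothetical and are not Hudi APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    name: str
    kind: str  # "delta" for a delta-commit, "compaction" for a compaction/commit


# The timeline from the description: d1 --> d2 --> C3 --> d4 --> d5
TIMELINE = [
    Instant("d1", "delta"),
    Instant("d2", "delta"),
    Instant("C3", "compaction"),
    Instant("d4", "delta"),
    Instant("d5", "delta"),
]


def latest_slice_log_instants(timeline):
    """Return the delta-commits written after the most recent compaction."""
    logs = []
    for instant in timeline:
        if instant.kind == "compaction":
            # Assumption: a compaction starts a new file slice, so earlier
            # log files no longer belong to the latest slice.
            logs = []
        else:
            logs.append(instant.name)
    return logs


def scan(timeline, latest):
    """Model a scan with skip_compaction=true when `latest` is the newest instant."""
    names = [instant.name for instant in timeline]
    visible = timeline[: names.index(latest) + 1]
    return latest_slice_log_instants(visible)


# Scans happen at t1 (latest instant d1) and t2 (latest instant d5).
consumed = set(scan(TIMELINE, "d1")) | set(scan(TIMELINE, "d5"))
all_deltas = {instant.name for instant in TIMELINE if instant.kind == "delta"}
missed = sorted(all_deltas - consumed)
print("missed delta-commits:", missed)
```

In this model the two scans together never see d2, which matches the behavior reported here.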

  was:
Option read.streaming.skip_compaction was introduced to avoid consuming duplicate data from delta-commits and compactions in a MOR table.

But the option may cause delta-commits to be skipped. Here is the case:

Suppose we have a timeline (d for delta-commit, C for compaction/commit):

d1 --> d2 --> C3 --> d3 --> d4 -->

t1..........................t2..........

Let's say scans for streaming read happen at times t1 and t2, when d1 and d4 are the latest instants, respectively.

When we scan at t2 with read.streaming.skip_compaction=true, we get the latest merged file slice, whose log files contain only d3+d4. So d2 is skipped.


> Option read.streaming.skip_compaction skips delta commit
> --------------------------------------------------------
>
>                 Key: HUDI-4769
>                 URL: https://issues.apache.org/jira/browse/HUDI-4769
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink, flink-sql
>            Reporter: nonggia.liang
>            Priority: Major
>
> Option read.streaming.skip_compaction was introduced to avoid consuming duplicate data from delta-commits and compactions in a MOR table.
> But the option may cause delta-commits to be skipped. Here is the case:
> Suppose we have a timeline (d for delta-commit, C for compaction/commit):
> d1 --> d2 --> C3 --> d4 --> d5 -->
> t1..........................t2..........
> Let's say scans for streaming read happen at times t1 and t2, when d1 and d5 are the latest instants, respectively.
> When we scan at t2 with read.streaming.skip_compaction=true, we get the latest merged file slice, whose log files contain only d4+d5. So d2 is skipped.


