Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2021/11/17 13:42:00 UTC

[jira] [Commented] (HUDI-2752) The MOR DELETE block breaks the event time sequence of CDC

    [ https://issues.apache.org/jira/browse/HUDI-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445169#comment-17445169 ] 

Vinoth Chandar commented on HUDI-2752:
--------------------------------------

IIUC, the issue is that the merge happens by delta commit time? Could you provide an example?

In general, having first-class definitions for event and commit times (i.e., arrival/processing time) might be good. Lots of users don't readily recognize the preCombine field as the event time. Your approach seems to be heading in the right direction. We need to flesh out the details.

> The MOR DELETE block breaks the event time sequence of CDC
> ----------------------------------------------------------
>
>                 Key: HUDI-2752
>                 URL: https://issues.apache.org/jira/browse/HUDI-2752
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: Flink Integration
>            Reporter: Danny Chen
>            Priority: Major
>             Fix For: 0.11.0
>
>
> Currently, within one batch of writes, the DELETE blocks are always written after the data blocks, so when there are INSERT/UPDATEs after the DELETE, that data is lost.
> What I can think of is that the DELETE block should at least keep the event time sequence for #preCombine with other record payloads.
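
For illustration, the scenario described above can be modeled with a small sketch. This is a simplified toy model, not Hudi's actual merge code: the `Record` class, the two merge functions, and the sample batch are all hypothetical, and `event_time` merely stands in for the preCombine field.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical CDC record; not a Hudi class."""
    key: str
    event_time: int        # stands in for the preCombine field
    value: Optional[str]   # None marks a DELETE


def merge_block_order(batch: List[Record]) -> Dict[str, str]:
    """Toy model of the reported behavior: data block first, DELETE block last."""
    data_block = [r for r in batch if r.value is not None]
    delete_block = [r for r in batch if r.value is None]
    table: Dict[str, Record] = {}
    for r in data_block:
        current = table.get(r.key)
        if current is None or r.event_time >= current.event_time:
            table[r.key] = r
    for r in delete_block:
        # The delete is applied last and ignores event time.
        table.pop(r.key, None)
    return {k: r.value for k, r in table.items()}


def merge_event_time_order(batch: List[Record]) -> Dict[str, str]:
    """Toy model of the proposal: deletes take part in the event-time comparison."""
    latest: Dict[str, Record] = {}
    for r in batch:
        current = latest.get(r.key)
        if current is None or r.event_time >= current.event_time:
            latest[r.key] = r
    # A key survives only if its latest record is not a DELETE.
    return {k: r.value for k, r in latest.items() if r.value is not None}


# One batch: insert, delete, then re-insert of the same key.
batch = [
    Record("id1", 1, "v1"),
    Record("id1", 2, None),
    Record("id1", 3, "v2"),
]

print(merge_block_order(batch))
print(merge_event_time_order(batch))
```

In this model, the block-ordered merge drops `id1` even though its latest event is an insert, while the event-time merge keeps the re-inserted value.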



--
This message was sent by Atlassian Jira
(v8.20.1#820001)