Posted to issues@flink.apache.org by "Shengkai Fang (Jira)" <ji...@apache.org> on 2022/10/14 02:42:00 UTC

[jira] [Comment Edited] (FLINK-29625) Optimize changelog normalize for upsert source

    [ https://issues.apache.org/jira/browse/FLINK-29625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617416#comment-17617416 ] 

Shengkai Fang edited comment on FLINK-29625 at 10/14/22 2:41 AM:
-----------------------------------------------------------------

[~jiabao.sun] I think you are right: we should let the downstream operator determine whether the ChangeLogNormalize is needed.


was (Author: fsk119):
[~jiabao.sun] I think you are right: we should let the downstream operator determine whether the ChangeLogNormalize is needed. But could you share some input on how you plan to implement this feature?

> Optimize changelog normalize for upsert source
> ----------------------------------------------
>
>                 Key: FLINK-29625
>                 URL: https://issues.apache.org/jira/browse/FLINK-29625
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Planner
>    Affects Versions: 1.15.2
>            Reporter: Jiabao Sun
>            Priority: Major
>
> Currently, Flink adds an expensive _changelog normalize_ operator after a source in upsert changelog mode to fill in the _update_before_ value.
> Even a direct insert from an upsert-kafka source into an upsert-kafka sink still adds this operator, plus an extra operator that drops the _update_before_ messages again, which is obviously redundant.
> In CDC scenarios, some databases do not provide update-before images, such as Cassandra, MongoDB, TiDB (_Old Value_ is not turned on) and Postgres (_REPLICA IDENTITY_ is not set to _FULL_). Using Flink SQL to process these changelogs incurs a lot of state overhead.
> I don't know much about why this operator is needed, so I take the liberty of asking whether we can get rid of changelog normalize completely, or optimize it by adding it only when a downstream operator requires a normalized changelog.
> If this optimization is worthwhile, I'm happy to help with it.
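
To make the state overhead in the description concrete, here is a toy Python model of what normalizing an upsert stream involves. It is only a sketch, not Flink's actual operator: the function name `normalize`, the tuple layout, and the use of `None` as a delete marker are assumptions made for illustration. The row kinds (+I, -U, +U, -D) mirror Flink's changelog row kinds.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (kind, key, value), with kind one of "+I", "-U", "+U", "-D".
Change = Tuple[str, str, Optional[str]]


def normalize(
    upserts: Iterable[Tuple[str, Optional[str]]]
) -> Tuple[List[Change], Dict[str, str]]:
    """Toy model: expand an upsert stream into a full changelog.

    Each input record is (key, value); a value of None is treated as a
    delete. To emit the previous value as -U or -D, the last value of
    every live key must be kept in `state`.
    """
    state: Dict[str, str] = {}
    out: List[Change] = []
    for key, value in upserts:
        previous = state.get(key)
        if value is None:
            # Delete: retract the stored row if the key is known.
            if previous is not None:
                out.append(("-D", key, previous))
                del state[key]
        elif previous is None:
            # First record for this key: plain insert.
            out.append(("+I", key, value))
            state[key] = value
        else:
            # Update: the old value comes from state, not from the source.
            out.append(("-U", key, previous))
            out.append(("+U", key, value))
            state[key] = value
    return out, state


changes, state = normalize([("a", "1"), ("a", "2"), ("b", "9"), ("a", None)])
for change in changes:
    print(change)
```

In this model the state grows with the number of live keys, and a sink that only needs the latest value per key would discard every -U record that the state was kept to produce.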



--
This message was sent by Atlassian Jira
(v8.20.10#820010)