Posted to dev@flink.apache.org by "Kenyore (Jira)" <ji...@apache.org> on 2022/02/24 09:54:00 UTC
[jira] [Created] (FLINK-26348) Maybe ChangelogNormalize should ignore unused columns when deduplicating
Kenyore created FLINK-26348:
-------------------------------
Summary: Maybe ChangelogNormalize should ignore unused columns when deduplicating
Key: FLINK-26348
URL: https://issues.apache.org/jira/browse/FLINK-26348
Project: Flink
Issue Type: Improvement
Affects Versions: 1.13.2
Reporter: Kenyore
In my case I have the following tables:
* sku(size:1K+)
* custom_product(size:10B+)
* order(size:100M+)
My SQL looks like this:
{code:sql}
SELECT o.code, o.created, s.sku_name, p.product_name FROM order o
INNER JOIN custom_product p ON o.p_id=p.id
INNER JOIN sku s ON s.id=p.s_id
{code}
Table sku has some other columns besides sku_name.
The problem is that when one of those other columns (e.g. description) changes in any row of table sku, Flink may produce millions of update rows that are useless downstream, because the downstream query only reads sku_name while the change was to description.
These useless update rows put pressure on downstream operators.
I think it would be a significant improvement for Flink to address this. Thanks.
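To illustrate the idea, here is a minimal sketch (in Python, not Flink's actual ChangelogNormalize implementation) of a normalize step that keeps the latest row per key but only emits an update when a column actually consumed downstream has changed. The function names, the `used_columns` parameter, and the `+I`/`+U` labels mimicking Flink's RowKind are assumptions for illustration only:

```python
# Hypothetical sketch of the proposed behavior: suppress updates that
# only touch columns the downstream query never reads.

def make_normalizer(used_columns):
    """Return a callback that maps upsert rows to emitted changes.

    used_columns: the set of column names the downstream query projects.
    """
    latest = {}  # key -> last seen full row

    def on_upsert(key, row):
        prev = latest.get(key)
        latest[key] = row
        if prev is None:
            return ("+I", row)  # first time we see this key: insert
        # Compare only the columns the downstream actually uses.
        if any(prev.get(c) != row.get(c) for c in used_columns):
            return ("+U", row)  # a meaningful update: forward it
        return None  # change only touched unused columns: drop it

    return on_upsert

# Downstream only reads sku_name, as in the query above.
norm = make_normalizer(used_columns={"sku_name"})
print(norm("sku-1", {"sku_name": "Widget", "description": "old"}))  # insert
print(norm("sku-1", {"sku_name": "Widget", "description": "new"}))  # dropped
print(norm("sku-1", {"sku_name": "Gadget", "description": "new"}))  # update
```

In this sketch the second upsert, which changes only the unused description column, produces no output row at all, which is the behavior this issue asks ChangelogNormalize to adopt.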
--
This message was sent by Atlassian Jira
(v8.20.1#820001)