Posted to commits@hudi.apache.org by "chenfengLiu (Jira)" <ji...@apache.org> on 2022/07/01 08:53:00 UTC
[jira] [Created] (HUDI-4350) reduce the shuffle work when we just insert but not update and delete
chenfengLiu created HUDI-4350:
---------------------------------
Summary: reduce the shuffle work when we just insert but not update and delete
Key: HUDI-4350
URL: https://issues.apache.org/jira/browse/HUDI-4350
Project: Apache Hudi
Issue Type: Improvement
Components: flink
Reporter: chenfengLiu
As discussed in https://issues.apache.org/jira/browse/HUDI-4338, extra shuffle work adds network overhead and increases the risk of data skew.
So the original plan we build for the Flink data stream that writes to Hudi can be improved on this point.
Currently, if we want to update or delete records, we need to load the index first, then send the index records and the Hoodie records to the Bucket Assign operator.
The Bucket Assign operator builds the index state used to assign a bucket to each incoming record.
If we only insert new records and never update or delete, we don't need this work, such as building the index and repartitioning the existing records.
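
To make the proposal concrete, here is a minimal sketch of the planning decision described above. It is not Hudi code: the class, enum, and stage names are hypothetical and only model which stages a job would include. It also assumes that an insert-only writer can pick a target file without the index, which is one reading of the proposal rather than something the issue states.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner sketch; none of these names come from Hudi. */
public class WritePipelineSketch {

    /** The write operations discussed in this issue. */
    public enum WriteOperation { INSERT, UPSERT, DELETE }

    /** Returns the ordered stages a write job would need for the given operation. */
    public static List<String> planStages(WriteOperation op) {
        List<String> stages = new ArrayList<>();
        stages.add("source");
        if (op != WriteOperation.INSERT) {
            // Updates and deletes must locate existing records, so the index is
            // loaded and both the index records and the incoming records are
            // shuffled to the bucket assign stage.
            stages.add("load_index");
            stages.add("shuffle_to_bucket_assign");
            stages.add("bucket_assign");
        }
        // Assumption: for insert-only jobs the writer chooses its own target
        // file, so no index state and no extra shuffle are required.
        stages.add("write");
        return stages;
    }

    public static void main(String[] args) {
        // Print the planned stages for every operation type.
        for (WriteOperation op : WriteOperation.values()) {
            System.out.println(op + " -> " + planStages(op));
        }
    }
}
```

In this sketch the insert-only plan has no shuffle stage at all, which is the saving the issue is after. The upsert and delete plans keep the current shape.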
--
This message was sent by Atlassian Jira
(v8.20.10#820010)