Posted to commits@hudi.apache.org by "chenfengLiu (Jira)" <ji...@apache.org> on 2022/07/01 08:53:00 UTC

[jira] [Created] (HUDI-4350) reduce the shuffle work when we just insert but not update and delete

chenfengLiu created HUDI-4350:
---------------------------------

             Summary: reduce the shuffle work when we just insert but not update and delete
                 Key: HUDI-4350
                 URL: https://issues.apache.org/jira/browse/HUDI-4350
             Project: Apache Hudi
          Issue Type: Improvement
          Components: flink
            Reporter: chenfengLiu


As discussed in https://issues.apache.org/jira/browse/HUDI-4338, extra shuffle work adds network overhead and increases the risk of data skew.

So the plan we build for the Flink data stream that writes to Hudi can be improved in this respect.

Currently, if we want to update or delete records, we need to load the index first, then send the index records and the Hoodie records to the Bucket Assign operator.

The Bucket Assign operator builds the index state used to assign a bucket to each incoming record.
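
As a rough illustration of the update/delete path described above, here is a toy model in plain Python. It is not Hudi's actual operator: the class, its methods, and the "new-N" file group naming are all hypothetical and only show the idea of seeding a per-key index state and using it to route incoming records.

```python
import itertools


class BucketAssignerModel:
    """Toy stand-in for the bucket assign step; every name here is hypothetical."""

    def __init__(self):
        # The "index state": record key -> file group (bucket) id.
        self.index_state = {}
        self._new_ids = itertools.count()

    def load_index_record(self, record_key, file_group_id):
        # Index records loaded from the existing table seed the state.
        self.index_state[record_key] = file_group_id

    def assign(self, record_key):
        # Known key: route the record to its existing file group (an update).
        if record_key in self.index_state:
            return ("U", self.index_state[record_key])
        # Unknown key: pick a new file group and remember it (an insert).
        file_group_id = f"new-{next(self._new_ids)}"
        self.index_state[record_key] = file_group_id
        return ("I", file_group_id)
```

In this model the state has to see every record for a given key, which is why the records are repartitioned by key before this step; that repartitioning is the shuffle cost this issue is about.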

If we only insert new records, with no updates or deletes, we don't need this work, such as building the index and repartitioning the existing records.
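
The proposal can be sketched as a choice made when the write plan is built. The following is a minimal model in plain Python, not Hudi or Flink code: the function names and stage names are illustrative assumptions, and the exact set of stages in the real pipeline may differ.

```python
def build_write_plan(insert_only):
    """Return the ordered stage names of a hypothetical write pipeline."""
    plan = ["source"]
    if not insert_only:
        # Updates and deletes need the index, so records are shuffled by key
        # to reach the state that knows where each key already lives.
        plan += [
            "index_bootstrap",
            "shuffle_by_record_key",
            "bucket_assign",
            "shuffle_by_bucket",
        ]
    # Insert-only writes skip all of the above and go straight to the writer.
    plan.append("write")
    return plan


def count_shuffles(plan):
    """Count the stages that repartition data across the network."""
    return sum(1 for stage in plan if stage.startswith("shuffle_"))
```

Under these assumptions, `count_shuffles(build_write_plan(False))` counts the two repartitioning stages of the index-based path, while the insert-only plan contains none.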


