Posted to commits@hudi.apache.org by "Frank Wong (Jira)" <ji...@apache.org> on 2022/06/30 03:15:00 UTC
[jira] [Created] (HUDI-4347) Make a simple method "merge" in HoodieMerge instead of "preCombine" and "combineAndGetUpdateValue"
Frank Wong created HUDI-4347:
--------------------------------
Summary: Make a simple method "merge" in HoodieMerge instead of "preCombine" and "combineAndGetUpdateValue"
Key: HUDI-4347
URL: https://issues.apache.org/jira/browse/HUDI-4347
Project: Apache Hudi
Issue Type: Improvement
Reporter: Frank Wong
Historically, these have been two different methods with (potentially) different semantics:
* {{preCombine}} de-duplicates the incoming batch (before it is inserted into the table).
* {{combineAndGetUpdateValue}} merges the persisted version of a record with the incoming one (which may already have been de-duplicated).
I don't see a reason for us to get hung up on this historical context. Instead, we should unify these two (potentially) divergent methods into one, providing a single avenue for merging records, whether inside a batch (when de-duplicating) or when combining the persisted record with the incoming one.
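The unified approach described above can be sketched as a single binary merge function that serves both call sites. This is a minimal illustration, not Hudi's API: the {{Rec}} type, its {{ordering}} field, and the "larger ordering value wins, ties go to the newer record" rule are all hypothetical assumptions chosen for the example.

```java
import java.util.List;
import java.util.Optional;

public class SingleMergeSketch {
    // Hypothetical record: key, ordering value (e.g. an event timestamp), payload.
    record Rec(String key, long ordering, String value) {}

    // Hypothetical unified merge: keep the record with the larger ordering value.
    // Ties go to the right-hand (newer) argument.
    static Rec merge(Rec older, Rec newer) {
        return newer.ordering() >= older.ordering() ? newer : older;
    }

    // Use 1: de-duplicate records sharing one key inside an incoming batch.
    static Optional<Rec> dedup(List<Rec> batch) {
        return batch.stream().reduce(SingleMergeSketch::merge);
    }

    // Use 2: combine the persisted record with the (already de-duplicated) incoming one.
    static Rec combine(Rec persisted, Rec incoming) {
        return merge(persisted, incoming);
    }

    public static void main(String[] args) {
        List<Rec> batch = List.of(
            new Rec("k1", 5, "b"),
            new Rec("k1", 7, "c"),
            new Rec("k1", 6, "d"));
        Rec deduped = dedup(batch).orElseThrow();
        Rec persisted = new Rec("k1", 3, "a");
        System.out.println(combine(persisted, deduped).value()); // prints "c"
    }
}
```

Both paths call the same {{merge}}, so there is only one set of semantics to reason about; the de-dup path is simply a fold of {{merge}} over the batch.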
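The merge API should be an [associative operation|https://en.wikipedia.org/wiki/Associative_property]: {{f(a, f(b, c)) = f(f(a, b), c)}}. With a = the persisted record and b, c = incoming records, this guarantees that merging the persisted record with a batch that was de-duplicated first gives the same result as merging the incoming records into it one at a time.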
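The associativity requirement can be spot-checked by brute force over small samples. The sketch below is a hypothetical illustration, not part of Hudi: it assumes a "larger ordering value wins, ties go to the right" merge (which satisfies the property) and contrasts it with averaging, which does not.

```java
import java.util.List;
import java.util.function.BinaryOperator;

public class AssociativityCheck {
    // Hypothetical record: ordering value plus a label.
    record Rec(long ordering, String value) {}

    // Hypothetical merge: keep the record with the larger ordering value; ties go to the right.
    static final BinaryOperator<Rec> LATEST_WINS =
        (a, b) -> b.ordering() >= a.ordering() ? b : a;

    // Checks f(a, f(b, c)) == f(f(a, b), c) for every triple drawn from the samples.
    static <T> boolean isAssociativeOn(List<T> samples, BinaryOperator<T> f) {
        for (T a : samples)
            for (T b : samples)
                for (T c : samples)
                    if (!f.apply(a, f.apply(b, c)).equals(f.apply(f.apply(a, b), c)))
                        return false;
        return true;
    }

    public static void main(String[] args) {
        List<Rec> recs = List.of(new Rec(1, "a"), new Rec(2, "b"), new Rec(2, "c"), new Rec(3, "d"));
        System.out.println(isAssociativeOn(recs, LATEST_WINS)); // prints "true"

        // Averaging is not associative: avg(avg(0, 2), 4) = 2.5 but avg(0, avg(2, 4)) = 1.5.
        BinaryOperator<Double> avg = (x, y) -> (x + y) / 2;
        System.out.println(isAssociativeOn(List.of(0.0, 2.0, 4.0), avg)); // prints "false"
    }
}
```

A sample-based check like this can only find counterexamples; it does not prove associativity for all inputs.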
--
This message was sent by Atlassian Jira
(v8.20.10#820010)