Posted to commits@hudi.apache.org by "Frank Wong (Jira)" <ji...@apache.org> on 2022/06/30 03:15:00 UTC

[jira] [Created] (HUDI-4347) Make a simple method "merge" in HoodieMerge instead of "preCombine" and "combineAndGetUpdateValue"

Frank Wong created HUDI-4347:
--------------------------------

             Summary: Make a simple method "merge" in HoodieMerge instead of "preCombine" and "combineAndGetUpdateValue"
                 Key: HUDI-4347
                 URL: https://issues.apache.org/jira/browse/HUDI-4347
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Frank Wong


Historically, merging has been split across two methods with (potentially) different semantics:
 * {{preCombine}} de-duplicates the input batch (before inserting it into the table)
 * {{combineAndGetUpdateValue}} merges the persisted version with the incoming one (which could have been previously de-duplicated)

I don't see a reason for us to stay bound to this historical context. We should unify these two (potentially) divergent methods into one, providing a single avenue for merging records, whether inside a batch (when de-duplicating) or when combining a persisted record with an incoming one.

 

The merge API should be an [associative operation|https://en.wikipedia.org/wiki/Associative_property]: {{f(a, f(b, c)) = f(f(a, b), c)}}
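
To illustrate the associativity requirement, here is a minimal sketch, not the actual HoodieMerge interface. The {{MergeSketch}} class, the {{Rec}} type, its {{orderingVal}} field and the "latest ordering value wins" rule are all assumptions made up for this example. It shows one {{merge}} function serving both cases: de-duplicating a batch (a fold over the batch) and combining a persisted record with an incoming one.

```java
import java.util.Arrays;
import java.util.List;

public class MergeSketch {
    // Hypothetical record type: a key, an ordering value (e.g. an event
    // timestamp) and a payload. Not a Hudi class.
    public static final class Rec {
        public final String key;
        public final long orderingVal;
        public final String payload;

        public Rec(String key, long orderingVal, String payload) {
            this.key = key;
            this.orderingVal = orderingVal;
            this.payload = payload;
        }
    }

    // Hypothetical unified merge: keep the record with the larger ordering
    // value; on a tie, keep the right-hand (newer) argument.
    public static Rec merge(Rec older, Rec newer) {
        return newer.orderingVal >= older.orderingVal ? newer : older;
    }

    // De-duplicating a non-empty batch of records that share a key is a
    // left fold using the same merge function.
    public static Rec dedupe(List<Rec> batch) {
        Rec acc = batch.get(0);
        for (Rec r : batch.subList(1, batch.size())) {
            acc = merge(acc, r);
        }
        return acc;
    }

    public static void main(String[] args) {
        Rec a = new Rec("k1", 1L, "a");
        Rec b = new Rec("k1", 3L, "b");
        Rec c = new Rec("k1", 2L, "c");

        // Associativity: the grouping of the merges does not change the result.
        Rec left = merge(merge(a, b), c);
        Rec right = merge(a, merge(b, c));
        System.out.println(left == right);

        // Same function for both use cases: de-dupe the incoming batch,
        // then merge the result with the persisted record.
        Rec persisted = a;
        Rec incoming = dedupe(Arrays.asList(b, c));
        System.out.println(merge(persisted, incoming).payload);
    }
}
```

Because this particular rule is associative, de-duplicating the batch first and then merging with the persisted record gives the same result as merging the records one at a time in the same order.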



--
This message was sent by Atlassian Jira
(v8.20.10#820010)