Posted to commits@hudi.apache.org by "Pratyaksh Sharma (Jira)" <ji...@apache.org> on 2021/10/26 18:29:00 UTC
[jira] [Closed] (HUDI-2496) Inserts are precombined even with dedup disabled
[ https://issues.apache.org/jira/browse/HUDI-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pratyaksh Sharma closed HUDI-2496.
----------------------------------
> Inserts are precombined even with dedup disabled
> ------------------------------------------------
>
> Key: HUDI-2496
> URL: https://issues.apache.org/jira/browse/HUDI-2496
> Project: Apache Hudi
> Issue Type: Bug
> Components: Writer Core
> Reporter: Sagar Sumit
> Assignee: Helias Antoniou
> Priority: Critical
> Labels: pull-request-available, sev:critical
> Fix For: 0.10.0
>
>
> Original GH issue https://github.com/apache/hudi/issues/3709
> Test case by [~xushiyan] : [https://github.com/apache/hudi/pull/3723/files]
> RCA by [~shivnarayan] :
> Within HoodieMergeHandle, incoming records are stored in a hash map keyed by record key.
> As a result, duplicates in the 1st batch remain intact, but for the 2nd batch only unique records are considered, and these are later concatenated with the 1st batch.
> [https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-[]…]-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
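
The behaviour described in the RCA can be illustrated with a minimal, self-contained sketch. This is not Hudi's code: the class MergeDedupSketch, the Rec type, and both methods are hypothetical names invented for illustration. It only assumes what the RCA states, namely that a later batch is staged in a map keyed by record key before being combined with what is already stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeDedupSketch {

    /** Hypothetical stand-in for a record: a record key plus a payload. */
    static final class Rec {
        final String key;
        final String value;

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** 1st batch: records are written as-is, so duplicate keys survive. */
    static List<Rec> writeFirstBatch(List<Rec> incoming) {
        return new ArrayList<>(incoming);
    }

    /**
     * Later batch: incoming records are staged in a map keyed by record key,
     * so a second record with the same key overwrites the first one.
     * Only the surviving entries are appended to the stored records.
     */
    static List<Rec> mergeLaterBatch(List<Rec> existing, List<Rec> incoming) {
        Map<String, Rec> staged = new LinkedHashMap<>();
        for (Rec r : incoming) {
            staged.put(r.key, r); // same key: previous value is replaced
        }
        List<Rec> result = new ArrayList<>(existing);
        result.addAll(staged.values());
        return result;
    }

    public static void main(String[] args) {
        List<Rec> batch1 = Arrays.asList(new Rec("k1", "a"), new Rec("k1", "b"));
        List<Rec> batch2 = Arrays.asList(new Rec("k2", "c"), new Rec("k2", "d"));

        List<Rec> stored = writeFirstBatch(batch1);
        System.out.println(stored.size()); // 2: both k1 records kept

        stored = mergeLaterBatch(stored, batch2);
        System.out.println(stored.size()); // 3: only one k2 record added
    }
}
```

In this sketch the two inserts with key k2 collapse to one even though no dedup step was requested, which mirrors the reported symptom.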