Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2022/09/30 21:27:00 UTC

[jira] [Commented] (HUDI-4958) Provide accurate numDeletes in commit metadata

    [ https://issues.apache.org/jira/browse/HUDI-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611772#comment-17611772 ] 

Ethan Guo commented on HUDI-4958:
---------------------------------

I checked the commit metadata after insert, upsert (including deletes via "_hoodie_is_deleted"), and delete operations, using the Spark datasource examples from the Spark Guide.  The numInserts and numDeletes values look accurate.  The logic for deriving numDeletes in HoodieMergeHandle also looks OK.  We need to check whether the inaccuracy comes from the custom payload implementation.

> Provide accurate numDeletes in commit metadata
> ----------------------------------------------
>
>                 Key: HUDI-4958
>                 URL: https://issues.apache.org/jira/browse/HUDI-4958
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Major
>
> Summing {{numInserts - numDeletes}} across all commits yields a negative total record count.  We need to check whether the numbers of inserts and deletes are accurate when both inserts and deletes exist in the same input batch for an upsert.
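
The computation described in the issue can be sketched roughly as follows. This is a minimal illustration, not Hudi code: it assumes each commit's metadata has already been parsed from JSON into a dict with a "partitionToWriteStats" map whose write stats carry "numInserts" and "numDeletes". The helper names and the sample values are hypothetical.

```python
def net_record_change(commit_metadata):
    """Return numInserts - numDeletes summed over one commit's write stats.

    Assumes the layout {"partitionToWriteStats": {partition: [stat, ...]}}.
    """
    net = 0
    for write_stats in commit_metadata.get("partitionToWriteStats", {}).values():
        for stat in write_stats:
            net += stat.get("numInserts", 0) - stat.get("numDeletes", 0)
    return net


def running_totals(commits):
    """Return the cumulative record count after each commit, in order."""
    total = 0
    totals = []
    for commit in commits:
        total += net_record_change(commit)
        totals.append(total)
    return totals


# Made-up sample: one insert commit, then an upsert batch that mixes
# inserts and deletes.
sample_commits = [
    {"partitionToWriteStats": {"p1": [{"numInserts": 10, "numDeletes": 0}]}},
    {"partitionToWriteStats": {"p1": [{"numInserts": 2, "numDeletes": 3}]}},
]

totals = running_totals(sample_commits)
negative_points = [i for i, t in enumerate(totals) if t < 0]
```

A negative entry in the running totals would point at the first commit whose stats are suspect, which narrows down which write path or payload to inspect.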



--
This message was sent by Atlassian Jira
(v8.20.10#820010)