Posted to commits@hudi.apache.org by "Udit Mehrotra (Jira)" <ji...@apache.org> on 2021/08/25 09:00:00 UTC

[jira] [Updated] (HUDI-1127) Handling late arriving Deletes

     [ https://issues.apache.org/jira/browse/HUDI-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udit Mehrotra updated HUDI-1127:
--------------------------------
    Fix Version/s:     (was: 0.9.0)
                   0.10.0

> Handling late arriving Deletes
> ------------------------------
>
>                 Key: HUDI-1127
>                 URL: https://issues.apache.org/jira/browse/HUDI-1127
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: DeltaStreamer, Writer Core
>    Affects Versions: 0.9.0
>            Reporter: Bhavani Sudha
>            Assignee: Bhavani Sudha
>            Priority: Major
>             Fix For: 0.10.0
>
>
> Recently I was working on a [PR|https://github.com/apache/hudi/pull/1704] to enhance the OverwriteWithLatestAvroPayload class to consider records in storage when merging. Briefly, this class will ignore older updates if the record in storage is the latest one (based on the precombine field).
> Given this, the expectation is that every write operation is handled the same way: if it is older than the record in storage, it should be ignored. While working on this, I found that we cannot handle all deletes the same way, because we mainly process deletes in two ways:
>  * by adding and enabling the metadata field `_hoodie_is_deleted` in the original record and sending it as an UPSERT operation.
>  * by using an empty payload (EmptyHoodieRecordPayload) and sending the write as a DELETE operation.
> While the former carries an ordering field and can be processed as expected (older deletes will be ignored), the latter has no ordering field to tell whether it is an older delete, and hence lets the older delete go through.
> I am opening this issue to track this gap. We need to identify the right choice here and fix it as needed.
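
The gap described above can be illustrated with a small stand-alone simulation. This is only a sketch under stated assumptions, not Hudi code: the Record class, the upsert and delete_with_empty_payload helpers, and the dict used as "storage" are all hypothetical stand-ins, the integer "ordering" attribute plays the role of the precombine field, and treating an equal ordering value as "not older" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Record:
    key: str
    ordering: int               # hypothetical stand-in for the precombine field
    value: Optional[str]
    is_deleted: bool = False    # hypothetical stand-in for `_hoodie_is_deleted`


def upsert(storage: Dict[str, Record], incoming: Record) -> None:
    """UPSERT path: the incoming record carries an ordering value,
    so a write older than the stored record can be ignored."""
    current = storage.get(incoming.key)
    if current is not None and incoming.ordering < current.ordering:
        return  # older than what is in storage: ignore (applies to deletes too)
    if incoming.is_deleted:
        storage.pop(incoming.key, None)
    else:
        storage[incoming.key] = incoming


def delete_with_empty_payload(storage: Dict[str, Record], key: str) -> None:
    """DELETE path with an empty payload: only the key is known,
    so there is nothing to compare against the stored ordering value."""
    storage.pop(key, None)


# A record with ordering value 10 is already in storage.
storage = {"k1": Record("k1", 10, "v10")}

# Late delete sent as an UPSERT with the delete flag and ordering value 5.
upsert(storage, Record("k1", 5, None, is_deleted=True))
print("after flagged late delete, k1 present:", "k1" in storage)

# The same late delete sent as a DELETE with an empty payload.
delete_with_empty_payload(storage, "k1")
print("after empty-payload late delete, k1 present:", "k1" in storage)
```

In this sketch the flagged delete is dropped because its ordering value is lower than the stored one, while the empty-payload delete removes the newer record regardless, which is the behavior this issue tracks.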



--
This message was sent by Atlassian Jira
(v8.3.4#803005)