Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2019/11/03 15:08:00 UTC

[jira] [Commented] (HUDI-15) Add a delete() API to HoodieWriteClient as well as Spark datasource #531

    [ https://issues.apache.org/jira/browse/HUDI-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965678#comment-16965678 ] 

sivabalan narayanan commented on HUDI-15:
-----------------------------------------

Thanks for the heads-up. I think I understand the requirement now.

I went through the code path for deletes. A few follow-up questions:
 * Are we looking to introduce a class for deletes, similar to HoodieRecord (used for inserts and updates) and HoodieMergeHandle? Or is the intention just to add a new delete API for external-facing clients and touch the internal pieces as little as possible? I am a bit wary of touching HoodieRecord, since it is used across the board.
 * For the schema fix, here is what I am thinking:
 ** Fix LogReaderUtils.readSchemaFromLogFileInReverse() to iterate over the log blocks in reverse, find the first non-delete block, and return its schema.
 ** There is a corner case here too: if all blocks are delete blocks, we will have to fetch the schema from the base file and return that instead.
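A minimal sketch of the proposed schema lookup, for discussion only. It assumes a hypothetical, simplified model (the LogBlock class, BlockType enum and resolveSchema method are all made up here and are NOT Hudi's actual log block classes or the real LogReaderUtils code). It scans blocks newest-first, returns the schema of the first non-delete block, and falls back to the base file schema when every block is a delete block.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified model of log blocks; these are NOT Hudi's real classes.
public class SchemaResolver {

    enum BlockType { DATA, DELETE }

    static final class LogBlock {
        final BlockType type;
        final String schema; // delete blocks carry no schema in this model, so null

        LogBlock(BlockType type, String schema) {
            this.type = type;
            this.schema = schema;
        }
    }

    // Walk the blocks from newest (end of the list) to oldest and return the schema of the
    // first non-delete block. If every block is a delete block (or there are none),
    // fall back to the schema read from the base file.
    static String resolveSchema(List<LogBlock> blocksOldestFirst, Supplier<String> baseFileSchema) {
        for (int i = blocksOldestFirst.size() - 1; i >= 0; i--) {
            LogBlock block = blocksOldestFirst.get(i);
            if (block.type != BlockType.DELETE) {
                return block.schema;
            }
        }
        return baseFileSchema.get();
    }

    public static void main(String[] args) {
        List<LogBlock> mixed = Arrays.asList(
                new LogBlock(BlockType.DATA, "schema-v1"),
                new LogBlock(BlockType.DATA, "schema-v2"),
                new LogBlock(BlockType.DELETE, null));
        List<LogBlock> deletesOnly = Arrays.asList(
                new LogBlock(BlockType.DELETE, null),
                new LogBlock(BlockType.DELETE, null));

        System.out.println(resolveSchema(mixed, () -> "base-file-schema"));       // schema-v2
        System.out.println(resolveSchema(deletesOnly, () -> "base-file-schema")); // base-file-schema
    }
}
```

The base file schema is passed as a Supplier so that, in this sketch, the base file is only read in the all-deletes corner case.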

> Add a delete() API to HoodieWriteClient as well as Spark datasource #531
> ------------------------------------------------------------------------
>
>                 Key: HUDI-15
>                 URL: https://issues.apache.org/jira/browse/HUDI-15
>             Project: Apache Hudi (incubating)
>          Issue Type: New Feature
>          Components: Spark datasource, Write Client
>            Reporter: Vinoth Chandar
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.5.1
>
>
> The delete API needs to be supported as a first-class citizen via DeltaStreamer, WriteClient and datasources. Currently there are two ways to delete: soft deletes and hard deletes - https://hudi.apache.org/writing_data.html#deletes. For hard deletes, we need to ensure that we can leverage EmptyHoodieRecordPayload, with just the HoodieKey and an empty record value, to perform the delete.
> [https://github.com/uber/hudi/issues/531]
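A rough sketch of the "key plus empty record value" idea from the issue description, under stated assumptions: Key, Record, toDeleteRecords and apply are hypothetical stand-ins invented for illustration, NOT Hudi's HoodieKey, HoodieRecord or EmptyHoodieRecordPayload APIs. A delete entry point only needs keys; each key is wrapped in a record whose value is empty, and the merge step treats an empty value as "remove this row".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-ins for a key, a record and an "empty payload";
// these are NOT Hudi's HoodieKey / HoodieRecord / EmptyHoodieRecordPayload APIs.
public class DeleteSketch {

    static final class Key {
        final String recordKey;
        final String partitionPath;

        Key(String recordKey, String partitionPath) {
            this.recordKey = recordKey;
            this.partitionPath = partitionPath;
        }

        // Composite identifier used to address a row in the in-memory table below.
        String id() {
            return partitionPath + "/" + recordKey;
        }
    }

    static final class Record {
        final Key key;
        final Optional<String> value; // an empty value marks a hard delete

        Record(Key key, Optional<String> value) {
            this.key = key;
            this.value = value;
        }
    }

    // A delete() entry point only needs keys: wrap each key in a record with an empty value.
    static List<Record> toDeleteRecords(List<Key> keys) {
        List<Record> records = new ArrayList<>(keys.size());
        for (Key key : keys) {
            records.add(new Record(key, Optional.empty()));
        }
        return records;
    }

    // Merge records into an in-memory table: an empty value removes the row, otherwise it is upserted.
    static void apply(Map<String, String> table, List<Record> records) {
        for (Record r : records) {
            if (r.value.isPresent()) {
                table.put(r.key.id(), r.value.get());
            } else {
                table.remove(r.key.id());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("2019/11/03/id1", "row-1");
        table.put("2019/11/03/id2", "row-2");

        apply(table, toDeleteRecords(Arrays.asList(new Key("id1", "2019/11/03"))));
        System.out.println(table); // {2019/11/03/id2=row-2}
    }
}
```

One possible upside of this shape, in the sketch at least, is that deletes flow through the same record and merge path as upserts, so no separate delete record class is required.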



--
This message was sent by Atlassian Jira
(v8.3.4#803005)