Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2019/11/03 15:08:00 UTC
[jira] [Commented] (HUDI-15) Add a delete() API to HoodieWriteClient as well as Spark datasource #531
[ https://issues.apache.org/jira/browse/HUDI-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965678#comment-16965678 ]
sivabalan narayanan commented on HUDI-15:
-----------------------------------------
Thanks for the heads up. I think I understand the requirement now.
I went through the code path for deletes. A few follow-up questions:
* Are we looking to introduce a class for deletes similar to HoodieRecord (used for inserts and updates) and HoodieMergeHandle? Or do we intend just to add a new delete API for external-facing clients while touching internal pieces as little as possible? I am a bit wary of touching HoodieRecord, since it is used across the board.
* For the schema fix, here is what I have in mind:
** Fix LogReaderUtils.readSchemaFromLogFileInReverse() to iterate over the log blocks in reverse, find the first non-delete block, and return its schema.
** There is a corner case here too: if all blocks are delete blocks, we will have to fetch the schema from the base file and return that.
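A minimal sketch of the proposed reverse scan with the base-file fallback follows. The LogBlock and BlockType types and the resolveSchema method are hypothetical simplifications, not Hudi's actual log-block classes or the real LogReaderUtils signature. The sketch models only data and delete blocks, assumes the blocks are held oldest first in a list, and takes the base-file schema as a lazily evaluated Supplier so the base file is consulted only in the all-deletes corner case.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified model of log blocks; NOT Hudi's real HoodieLogBlock API.
final class LogSchemaResolverSketch {

  enum BlockType { DATA, DELETE }

  static final class LogBlock {
    final BlockType type;
    final String schema; // null for delete blocks, which carry no schema in this sketch

    LogBlock(BlockType type, String schema) {
      this.type = type;
      this.schema = schema;
    }
  }

  /**
   * Walks the blocks from newest (end of list) to oldest and returns the schema of the
   * first non-delete block. If there are only delete blocks (or no blocks at all),
   * falls back to the schema read from the base file.
   */
  static String resolveSchema(List<LogBlock> blocksOldestFirst, Supplier<String> baseFileSchema) {
    for (int i = blocksOldestFirst.size() - 1; i >= 0; i--) {
      LogBlock block = blocksOldestFirst.get(i);
      if (block.type != BlockType.DELETE) {
        return block.schema;
      }
    }
    // Corner case: the log carries no schema, so read it from the base file.
    return baseFileSchema.get();
  }
}
```

Passing the base-file lookup as a Supplier keeps the common case (a log with at least one data block) from touching the base file at all.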
> Add a delete() API to HoodieWriteClient as well as Spark datasource #531
> ------------------------------------------------------------------------
>
> Key: HUDI-15
> URL: https://issues.apache.org/jira/browse/HUDI-15
> Project: Apache Hudi (incubating)
> Issue Type: New Feature
> Components: Spark datasource, Write Client
> Reporter: Vinoth Chandar
> Assignee: sivabalan narayanan
> Priority: Major
> Fix For: 0.5.1
>
>
> The delete API needs to be supported as a first-class citizen via DeltaStreamer, WriteClient and datasources. Currently there are two ways to delete, soft deletes and hard deletes (see https://hudi.apache.org/writing_data.html#deletes). For hard deletes, we need to ensure that we can leverage EmptyHoodieRecordPayload with just the HoodieKey and an empty record value.
> [https://github.com/uber/hudi/issues/531]
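The hard-delete idea in the description (pair each key with an empty payload and let the write path drop it) can be sketched in memory as below. Key, Record, toDeleteRecords and merge are hypothetical stand-ins for illustration only, not Hudi's HoodieKey, HoodieRecord or EmptyHoodieRecordPayload classes; an Optional.empty() payload plays the role of the empty record value, and a plain Map stands in for the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical in-memory model of "delete = key + empty payload"; NOT Hudi's real classes.
final class EmptyPayloadDeleteSketch {

  /** Stand-in for a record key plus partition path. */
  static final class Key {
    final String recordKey;
    final String partitionPath;

    Key(String recordKey, String partitionPath) {
      this.recordKey = recordKey;
      this.partitionPath = partitionPath;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof Key)) return false;
      Key other = (Key) o;
      return recordKey.equals(other.recordKey) && partitionPath.equals(other.partitionPath);
    }

    @Override
    public int hashCode() {
      return Objects.hash(recordKey, partitionPath);
    }
  }

  /** A record whose payload is empty when it represents a hard delete. */
  static final class Record {
    final Key key;
    final Optional<String> payload;

    Record(Key key, Optional<String> payload) {
      this.key = key;
      this.payload = payload;
    }
  }

  /** Hypothetical delete(): wraps each key in a record carrying an empty payload. */
  static List<Record> toDeleteRecords(List<Key> keys) {
    List<Record> records = new ArrayList<>();
    for (Key key : keys) {
      records.add(new Record(key, Optional.empty()));
    }
    return records;
  }

  /** Upsert-style merge: a present payload inserts or updates, an empty payload removes the key. */
  static void merge(Map<Key, String> table, List<Record> incoming) {
    for (Record r : incoming) {
      if (r.payload.isPresent()) {
        table.put(r.key, r.payload.get());
      } else {
        table.remove(r.key);
      }
    }
  }
}
```

Keeping the delete entry point as a thin wrapper that only builds empty-payload records is one way to reuse the existing upsert path without changing HoodieRecord itself, which is the concern raised in the first question.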