Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2019/10/29 15:42:00 UTC

[jira] [Comment Edited] (HUDI-15) Add a delete() API to HoodieWriteClient as well as Spark datasource #531

    [ https://issues.apache.org/jira/browse/HUDI-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962041#comment-16962041 ] 

Vinoth Chandar edited comment on HUDI-15 at 10/29/19 3:41 PM:
--------------------------------------------------------------

A few questions and clarifications as I work through this. I am still getting to know the code base, so some questions are about the code and some are about the design. So far I have looked at the HoodieWriteClient code flow.
 # I see that COW tables already support deletes by way of updating with empty records. Are rollbacks taken care of automatically, without any special handling?
 # Can we assume compaction will retain the empty records for deleted entries, or will it remove them?
 # If compaction removes the empty records, could rolling back a compaction get tricky, or is it a no-op as far as deleted entries are concerned?
 # If compaction retains the empty records, won't that become a space problem at some point, once most of the records are deleted?
 # What is the expected behavior if someone tries to delete an already deleted entry? Should we just drop it from the input records to be deleted?
 # If someone tries to update a deleted entry, do we throw an exception, or silently ignore it and return those records to the caller? What is the current behavior, and what is the ideal behavior we want to get to?

By the way, I have taken note of the schema issue from the referenced link.


was (Author: shivnarayan):
A few questions and clarifications as I work through this. I am still getting to know the code base, so some questions are about the code and some are about the design. So far I have looked at the HoodieWriteClient code flow.
 * I see that COW tables already support deletes by way of updating with empty records. Are rollbacks taken care of automatically, without any special handling?
 * Can we assume compaction will retain the empty records for deleted entries, or will it remove them?
 * If compaction removes the empty records, could rolling back a compaction get tricky, or is it a no-op as far as deleted entries are concerned?
 * If compaction retains the empty records, won't that become a space problem at some point, once most of the records are deleted?
 * What is the expected behavior if someone tries to delete an already deleted entry? Should we just drop it from the input records to be deleted?
 * If someone tries to update a deleted entry, do we throw an exception, or silently ignore it and return those records to the caller? What is the current behavior, and what is the ideal behavior we want to get to?

By the way, I have taken note of the schema issue from the referenced link.

> Add a delete() API to HoodieWriteClient as well as Spark datasource #531
> ------------------------------------------------------------------------
>
>                 Key: HUDI-15
>                 URL: https://issues.apache.org/jira/browse/HUDI-15
>             Project: Apache Hudi (incubating)
>          Issue Type: New Feature
>          Components: Spark datasource, Write Client
>            Reporter: Vinoth Chandar
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.5.1
>
>
> The delete API needs to be supported as a first-class citizen via DeltaStreamer, WriteClient and datasources. Currently there are two ways to delete, soft deletes and hard deletes - https://hudi.apache.org/writing_data.html#deletes. We need to ensure that, for hard deletes, we are able to leverage EmptyHoodieRecordPayload with just the HoodieKey and an empty record value for deleting.
> [https://github.com/uber/hudi/issues/531]
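
The hard-delete idea in the issue description (a key plus an empty record value) can be sketched roughly as follows. This is a toy in-memory model, not Hudi code: the class names Key, Incoming and Table are hypothetical stand-ins and do not correspond to the real HoodieKey, EmptyHoodieRecordPayload or HoodieWriteClient APIs. The sketch also assumes, purely for illustration, that deleting an already deleted key is a no-op and that writing to a deleted key behaves like a fresh insert; those are the open questions in the comment above, not confirmed Hudi behavior.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

final class HardDeleteSketch {

    /** Hypothetical stand-in for a key made of a record key and a partition path. */
    static final class Key {
        final String recordKey;
        final String partitionPath;

        Key(String recordKey, String partitionPath) {
            this.recordKey = recordKey;
            this.partitionPath = partitionPath;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return recordKey.equals(k.recordKey) && partitionPath.equals(k.partitionPath);
        }

        @Override public int hashCode() { return Objects.hash(recordKey, partitionPath); }

        @Override public String toString() { return partitionPath + "/" + recordKey; }
    }

    /** Hypothetical incoming record: an empty value marks a hard delete. */
    static final class Incoming {
        final Key key;
        final Optional<String> value;

        private Incoming(Key key, Optional<String> value) {
            this.key = key;
            this.value = value;
        }

        static Incoming upsert(Key key, String value) { return new Incoming(key, Optional.of(value)); }

        static Incoming delete(Key key) { return new Incoming(key, Optional.empty()); }
    }

    /** Toy table: merging a batch rewrites the stored rows, dropping keys with empty values. */
    static final class Table {
        private final Map<Key, String> rows = new LinkedHashMap<>();

        void merge(List<Incoming> batch) {
            for (Incoming in : batch) {
                if (in.value.isPresent()) {
                    rows.put(in.key, in.value.get()); // insert or update
                } else {
                    rows.remove(in.key); // hard delete; absent key is a no-op in this sketch
                }
            }
        }

        /** Read-only copy of the current rows. */
        Map<Key, String> snapshot() {
            return Collections.unmodifiableMap(new LinkedHashMap<>(rows));
        }
    }

    public static void main(String[] args) {
        Key k1 = new Key("id-1", "2019/10/29");
        Key k2 = new Key("id-2", "2019/10/29");
        Table table = new Table();

        table.merge(Arrays.asList(Incoming.upsert(k1, "a"), Incoming.upsert(k2, "b")));
        table.merge(Collections.singletonList(Incoming.delete(k1)));
        table.merge(Collections.singletonList(Incoming.delete(k1))); // repeated delete
        System.out.println(table.snapshot());
    }
}
```

In this model nothing is retained for a deleted key, so there is no tombstone to compact away; a design that keeps empty records around until compaction would need a separate marker per key.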


