Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/16 15:25:37 UTC

[GitHub] [hudi] rahil-c commented on issue #3321: [SUPPORT] Setting _hoodie_is_deleted column is not deleting records when using Spark DataSource.

rahil-c commented on issue #3321:
URL: https://github.com/apache/hudi/issues/3321#issuecomment-899600636


   When running the reproduction, I cast the `_hoodie_is_deleted` field to Boolean and made it `nullable=True`, but I still saw a couple of issues. I would be glad to get some thoughts on this.
   
   The first issue: on the `master` branch, running the notebook example above gave this end result:
   ```
   
   +-------------------+------------------+-----------------+
   |_hoodie_commit_time|_hoodie_is_deleted|committed_records|
   +-------------------+------------------+-----------------+
   |20210812183614     |false             |20               |
   ```
   That is, all 20 records are still present, so the record is not being deleted.
   
   In older Hudi releases such as 0.7.0, the end result after setting `_hoodie_is_deleted` to false on all the rows is this:
   
   ```
   +-------------------+------------------+-----------------+
   |_hoodie_commit_time|_hoodie_is_deleted|committed_records|
   +-------------------+------------------+-----------------+
   |20210812183614     |false             |19               |
   ```
   So the delete works in 0.7.0 but not on `master`, which is odd given that the example is the same.
   The second issue is that in `0.7.0`, if we don't specify `_hoodie_is_deleted` for existing records, the end result is this (unless this is expected behavior?):
   ```
   +-------------------+------------------+-----------------+
   |_hoodie_commit_time|_hoodie_is_deleted|committed_records|
   +-------------------+------------------+-----------------+
   |20210812212528     |null              |19               |
   ```
   
   I've attached the Hudi docs here for reference.
   https://hudi.apache.org/docs/writing_data/#deletes
   https://hudi.apache.org/blog/2020/01/15/delete-support-in-hudi/#deletion-with-hoodiedeltastreamer
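
For reference, the delete semantics I would expect from the docs linked above can be sketched in plain Python. This is only a toy model, not Hudi's implementation: `apply_upsert`, the `id` key, and the 20 sample records are hypothetical names and data made up for illustration. It assumes that an incoming record with `_hoodie_is_deleted` set to true removes the matching key, while false or null upserts it.

```python
from typing import Dict, List

Record = Dict[str, object]


def apply_upsert(table: Dict[int, Record], batch: List[Record],
                 key: str = "id") -> Dict[int, Record]:
    """Toy model of an upsert that honors a `_hoodie_is_deleted` flag.

    Assumption: True deletes the matching key; False or None upserts it.
    """
    result = dict(table)
    for record in batch:
        if record.get("_hoodie_is_deleted") is True:
            # Flagged records are dropped from the table.
            result.pop(record[key], None)
        else:
            # False or None: insert or overwrite the record.
            result[record[key]] = record
    return result


# 20 existing records written without the flag (hypothetical data).
table = {i: {"id": i, "_hoodie_is_deleted": None} for i in range(20)}

# Upsert the same 20 keys, flagging only key 0 for deletion.
batch = [{"id": i, "_hoodie_is_deleted": i == 0} for i in range(20)]

after = apply_upsert(table, batch)
print(len(after))  # 19
```

Under this model the count drops from 20 to 19, which matches what 0.7.0 shows, and a batch where every flag is null or false leaves all 20 records in place.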
   

