Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/02 23:37:12 UTC

[GitHub] [hudi] guanziyue commented on pull request #4880: [HUDI-2752] The MOR DELETE block breaks the event time sequence of CDC

guanziyue commented on pull request #4880:
URL: https://github.com/apache/hudi/pull/4880#issuecomment-1057501670


   Happy to see this problem being discussed. I would like to share our solution to it: we chose to abandon the delete block mechanism and treat every record as a mutation of a possibly existing record.
   1. Remove the logic that handles '_hoodie_is_deleted' from the payload's getInsertValue method, so that this method always returns a non-empty generic record.
   2. When we encounter a delete record, return a record whose '_hoodie_is_deleted' field is true and which carries the ordering value field. I think this may be similar to the tombstone mentioned by @alexeykudinkin above.
   3. Store all records in data blocks, some of which are actually delete-marker records.
   4. Handle physical deletion in HoodieWriteHandle. combineAndGetUpdateValue already takes deletes into account, but since we removed that logic from getInsertValue in step 1, we add it back in HoodieMergeHandle/HoodieCreateHandle, because that is where the physical write happens.
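   The steps above can be sketched in plain Java with no Hudi dependencies. This is a minimal, hypothetical model, not Hudi's code: `Rec`, `getInsertValue`, `combine`, `mergeLog` and `writeBaseFile` are illustrative stand-ins for the payload and write-handle logic, and the merge rule (the larger ordering value wins, ties go to the later record) is an assumption modeled on a precombine/event-time field. It shows why a tombstone that carries an ordering value keeps the event-time sequence: a late-arriving upsert with an older ordering value cannot resurrect a newer delete.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free model of the "delete as tombstone record" idea.
// None of these names are Hudi APIs.
class TombstoneMergeSketch {

    // One record; a delete is an ordinary record whose isDeleted flag is set.
    static final class Rec {
        final String key;
        final long orderingVal;  // assumed ordering (event-time / precombine) value
        final boolean isDeleted; // stands in for _hoodie_is_deleted
        final String value;      // payload; null for tombstones

        Rec(String key, long orderingVal, boolean isDeleted, String value) {
            this.key = key;
            this.orderingVal = orderingVal;
            this.isDeleted = isDeleted;
            this.value = value;
        }
    }

    // Insert path: deletes are not dropped here, the tombstone itself is returned.
    static Rec getInsertValue(Rec incoming) {
        return incoming;
    }

    // Assumed merge rule: larger ordering value wins, ties go to the later record.
    // Tombstones compete like any other record, which preserves event-time order.
    static Rec combine(Rec current, Rec incoming) {
        return incoming.orderingVal >= current.orderingVal ? incoming : current;
    }

    // Log merging: tombstones live in data blocks and are merged, not applied eagerly.
    static Map<String, Rec> mergeLog(List<Rec> logRecords) {
        Map<String, Rec> merged = new LinkedHashMap<>();
        for (Rec r : logRecords) {
            merged.merge(r.key, getInsertValue(r), TombstoneMergeSketch::combine);
        }
        return merged;
    }

    // Write path (merge/create handle): the only place tombstones are physically dropped.
    static List<Rec> writeBaseFile(List<Rec> baseRecords, List<Rec> logRecords) {
        Map<String, Rec> merged = new LinkedHashMap<>();
        for (Rec r : baseRecords) {
            merged.put(r.key, r);
        }
        for (Rec r : mergeLog(logRecords).values()) {
            merged.merge(r.key, r, TombstoneMergeSketch::combine);
        }
        List<Rec> out = new ArrayList<>();
        for (Rec r : merged.values()) {
            if (!r.isDeleted) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> base = List.of(new Rec("k1", 1, false, "v1"));
        List<Rec> log = List.of(
                new Rec("k1", 5, true, null),  // delete at ordering value 5
                new Rec("k1", 3, false, "v3"), // late-arriving upsert at 3
                new Rec("k2", 2, false, "a"));
        for (Rec r : writeBaseFile(base, log)) {
            System.out.println(r.key + " -> " + r.value);
        }
    }
}
```

   Running `main` prints `k2 -> a`: the delete at ordering value 5 beats the late upsert at 3, and the tombstone for k1 is only dropped when the base file is written.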
   
   Pros: we reuse some existing logic, and it should be backward compatible.
   Cons: _hoodie_is_deleted is no longer an optional field in Hudi; we have to treat it the same as the other meta fields in HoodieRecord. In addition, this approach wastes a little storage, because _hoodie_is_deleted is stored in parquet and is always false there. I am not sure whether we can remove it.
   
   Besides, I would like to share some more information that helped me find this problem. It has bothered another MVCC storage engine, HBase, for a long time ([Hbase limitation](https://hbase.apache.org/book.html#_current_limitations)), and they finally settled on a trade-off solution, [HBASE-15968](https://issues.apache.org/jira/browse/HBASE-15968), similar to the conclusion mentioned by @nsivabalan. Hope this helps.

