Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/27 02:25:48 UTC

[GitHub] [hudi] danny0405 commented on issue #4030: [SUPPORT] Flink uses updated fields to update data

danny0405 commented on issue #4030:
URL: https://github.com/apache/hudi/issues/4030#issuecomment-1022785250


   > Hi, any new updates on this one?
   > 
   > After reviewing the PR and source code and trying to contribute, it seems that partial update cannot support the case where the partition path changes. When a new record arrives with a changed partition path, `BucketAssignFunction` outputs two records: one DELETE record for the old partition and one new record for the new partition. These two records are assigned to two different StreamWriteFunction (HoodieWriteHandle) instances. Each write handle can only deal with a single fileId and its incoming/delete records, which means we can merge/partial-update an incoming record with the record in the old base file only when both belong to the same file group; to cover the partition-change case, we would probably need to implement a write handle that can work across different file IDs (see the sketch after this quote).
   > 
   > Please correct me if my understanding is wrong.
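
   To make the two-record behavior concrete, here is a minimal, self-contained Java sketch of the idea described above. It is not Hudi's actual `BucketAssignFunction` code; `Record`, `Op`, and `lastKnownPartition` are hypothetical stand-ins for the per-key index state it keeps.

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   class PartitionChangeSketch {
       enum Op { UPSERT, DELETE }

       // key, partition path, operation, payload (null for key-only deletes)
       record Record(String key, String partition, Op op, String payload) {}

       // Stands in for the per-key index state the bucket assigner keeps.
       private final Map<String, String> lastKnownPartition = new HashMap<>();

       List<Record> assign(Record incoming) {
           List<Record> out = new ArrayList<>();
           String old = lastKnownPartition.get(incoming.key());
           if (old != null && !old.equals(incoming.partition())) {
               // Partition path changed: emit a key-only DELETE aimed at the
               // old partition ...
               out.add(new Record(incoming.key(), old, Op.DELETE, null));
           }
           // ... plus the incoming record for the (possibly new) partition.
           // Downstream, the two records hash to different buckets and reach
           // two different write tasks, so they are never merged together.
           out.add(incoming);
           lastKnownPartition.put(incoming.key(), incoming.partition());
           return out;
       }
   }
   ```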
   
   Yes, your understanding is correct. Currently it is a little hard for the project to do efficient point lookups: we cannot query the old partition's record before writing the new one, which is why the current code just sends a DELETE record carrying only the primary key (illustrated in the sketch below).
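
   To illustrate why a key-only DELETE rules out a partial update across partitions, here is a hedged Java sketch. The `mergeFields` helper is hypothetical, not Hudi's `HoodieWriteHandle` API; it only shows that a field-level merge needs the stored row, which the new partition's write task never sees.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   class PartialUpdateSketch {
       // Merge only the non-null fields of the incoming change into the stored row.
       static Map<String, Object> mergeFields(Map<String, Object> stored,
                                              Map<String, Object> change) {
           Map<String, Object> merged = new HashMap<>(stored);
           change.forEach((k, v) -> { if (v != null) merged.put(k, v); });
           return merged;
       }

       public static void main(String[] args) {
           Map<String, Object> storedInOldPartition =
                   Map.of("id", 1, "name", "a", "region", "us");

           // Same file group: the stored row is available, so a partial merge works.
           Map<String, Object> change = new HashMap<>();
           change.put("name", "b");     // changed field
           change.put("region", null);  // unchanged field, left as null
           System.out.println(mergeFields(storedInOldPartition, change));
           // prints the merged row: id=1, name=b, region=us (map order may vary)

           // Across partitions: the new partition's write task has no point
           // lookup into storedInOldPartition, and the old partition's task
           // only receives a key-only DELETE, so the merge above cannot happen.
       }
   }
   ```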


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org