Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/05/02 15:28:14 UTC

[GitHub] [incubator-hudi] nandini57 commented on issue #1582: [SUPPORT] PreCombineAndUpdate in Payload

nandini57 commented on issue #1582:
URL: https://github.com/apache/incubator-hudi/issues/1582#issuecomment-622970820


   My apologies. Let me try to explain. If I don't upsert the data with each batch where applicable, then when I query the table back it will have duplicates, because batch "n" needs to carry data from batches "n-1", "n-2", and so on. I have to do a group by on upsertKey with max(commit_time) to get the latest view of the data, and doing that group by on every read won't scale.
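   For context, a minimal sketch of the read-side dedup described above, using the Spark Java API. The table name "hudi_table" and key column "upsertKey" are hypothetical; _hoodie_commit_time is Hudi's commit-time metadata column.
   
       import org.apache.spark.sql.Dataset;
       import org.apache.spark.sql.Row;
       import org.apache.spark.sql.SparkSession;
   
       public class LatestViewQuery {
           public static void main(String[] args) {
               SparkSession spark = SparkSession.builder()
                       .appName("latest-view")
                       .getOrCreate();
   
               // Keep only the row with the latest commit time per upsert key.
               Dataset<Row> latest = spark.sql(
                   "SELECT t.* FROM hudi_table t "
                   + "JOIN (SELECT upsertKey, MAX(_hoodie_commit_time) AS max_ct "
                   + "      FROM hudi_table GROUP BY upsertKey) m "
                   + "ON t.upsertKey = m.upsertKey "
                   + "AND t._hoodie_commit_time = m.max_ct");
   
               latest.show();
           }
       }
   
   This assumes the Hudi table has been registered as a view or Hive table named hudi_table; it is the per-read group by that doesn't scale.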
   
   Instead of that, if I could preserve the current value with a deleted identifier in the CustomPayload, and also return both the incoming and the current payload from combineAndGetUpdateValue, I could keep the data required for audit while the read side filters out records carrying the deleted identifier.
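   To make the idea concrete, here is a hedged sketch of such a payload, extending Hudi's OverwriteWithLatestAvroPayload. The "_deleted" field is an assumed column in the table schema, and the delete-detection logic is illustrative rather than Hudi's actual behavior:
   
       import java.io.IOException;
       import org.apache.avro.Schema;
       import org.apache.avro.generic.GenericRecord;
       import org.apache.avro.generic.IndexedRecord;
       import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload;
       import org.apache.hudi.common.util.Option;
   
       public class SoftDeleteAuditPayload extends OverwriteWithLatestAvroPayload {
   
           public SoftDeleteAuditPayload(GenericRecord record, Comparable orderingVal) {
               super(record, orderingVal);
           }
   
           @Override
           public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue,
                                                                 Schema schema) throws IOException {
               Option<IndexedRecord> incoming = getInsertValue(schema);
               if (!incoming.isPresent()) {
                   // Incoming payload is empty (a delete): instead of dropping the
                   // current row, re-emit it with the assumed "_deleted" flag set,
                   // so reads can filter it out while audit still sees it.
                   GenericRecord kept = (GenericRecord) currentValue;
                   kept.put("_deleted", true);
                   return Option.of(kept);
               }
               // The API returns a single record, so incoming and current cannot
               // both be emitted here -- which is exactly the limitation this
               // comment is asking about.
               return incoming;
           }
       }
   
   Note this only achieves a soft delete; actually returning both records from one combine call would need a change in Hudi itself, as suggested above.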
   
   Does this make sense? Any other ideas? Possibly make copyOldRecord a configurable property, defaulting to false, if that doesn't impact anything else.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org