Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/15 07:22:14 UTC

[GitHub] [iceberg] openinx commented on issue #2308: Handle the case that RewriteFiles and RowDelta commit the transaction at the same time

openinx commented on issue #2308:
URL: https://github.com/apache/iceberg/issues/2308#issuecomment-799180514


   > Is this a typo: t1 -> t4?
   
   Yeah, thanks @stevenzwu. It's a typo, sorry about that.
   
   > It have to re-apply all the expensive rewrite computation based on the latest snapshot.
   
   Redoing the expensive rewrite is not the solution people want: if the streaming job checkpoints every 1 min and the rewrite action takes 4 min to complete, there is a real chance that the expensive rewrite action will keep retrying forever.
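
To make the retry concern concrete, here is a minimal sketch in Python. It is a toy timing model, not Iceberg code: the function name, its parameters, and the rule that a rewrite attempt fails whenever any streaming commit lands while it is running are all assumptions made for illustration.

```python
def rewrite_attempts(checkpoint_interval_min, rewrite_duration_min,
                     start_min=0.5, max_attempts=10):
    """Count the attempts a full-redo rewrite needs before it can commit.

    Assumption (toy model): the streaming job commits at every multiple of
    checkpoint_interval_min, and a rewrite attempt conflicts if any such
    commit lands between its start and its end. On conflict the whole
    rewrite is redone from the latest snapshot.
    Returns the attempt number that succeeds, or None if none does.
    """
    now = start_min
    for attempt in range(1, max_attempts + 1):
        end = now + rewrite_duration_min
        # First streaming commit strictly after this attempt started.
        next_commit = (int(now // checkpoint_interval_min) + 1) * checkpoint_interval_min
        if next_commit > end:
            return attempt  # no commit landed during the rewrite
        now = end  # conflict: start over from the latest snapshot
    return None


# 1-minute checkpoints against a 4-minute rewrite: every attempt conflicts.
print(rewrite_attempts(1, 4))
# 10-minute checkpoints against a 4-minute rewrite: the first attempt fits.
print(rewrite_attempts(10, 4))
```

Under these assumptions, a rewrite that runs longer than the checkpoint interval can never find a quiet window, no matter how many times it retries.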
   
   Besides the resource cost of retrying, there is a data semantics issue that will confuse people. A different commit order between the RewriteFiles transaction and the CDC/Upsert transaction leads to different results for the same dataset and the same operations. Then people won't trust the Iceberg table as the source of truth.
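
The ordering problem can be sketched with a toy model in Python. This is not Iceberg's API: `ToyTable` and its methods are hypothetical, and the model assumes only that an equality delete applies to data files whose sequence number is strictly lower than the delete's, and that a rewrite commits its compacted file under a new sequence number.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical, simplified table: data and delete files tagged with sequence numbers."""
    seq: int = 0
    data_files: list = field(default_factory=list)    # (seq, [keys])
    delete_files: list = field(default_factory=list)  # (seq, {keys})

    def append(self, keys):
        self.seq += 1
        self.data_files.append((self.seq, list(keys)))

    def row_delta_delete(self, keys):
        # Stand-in for a CDC/Upsert commit that adds an equality delete.
        self.seq += 1
        self.delete_files.append((self.seq, set(keys)))

    def rewrite(self, planned_files):
        # Replace the planned files with one compacted file under a new sequence number.
        self.seq += 1
        merged = [k for _, keys in planned_files for k in keys]
        self.data_files = [f for f in self.data_files if f not in planned_files]
        self.data_files.append((self.seq, merged))

    def scan(self):
        # A delete hides a row only in data files with a strictly lower sequence number.
        visible = []
        for file_seq, keys in self.data_files:
            for k in keys:
                hidden = any(d_seq > file_seq and k in d_keys
                             for d_seq, d_keys in self.delete_files)
                if not hidden:
                    visible.append(k)
        return sorted(visible)


def run(rewrite_first):
    table = ToyTable()
    table.append([1, 2])
    table.append([3])
    planned = list(table.data_files)  # the rewrite is planned against this snapshot
    if rewrite_first:
        table.rewrite(planned)
        table.row_delta_delete([1])
    else:
        table.row_delta_delete([1])
        table.rewrite(planned)
    return table.scan()


print(run(rewrite_first=True))   # [2, 3]
print(run(rewrite_first=False))  # [1, 2, 3]
```

In this model, when the rewrite commits after the delete, the compacted file carries a newer sequence number than the delete, so key 1 reappears; when the rewrite commits first, key 1 stays deleted. The same inputs and operations give two different tables.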
   

