Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/30 06:40:09 UTC

[GitHub] [iceberg] skandasa23 removed a comment on issue #2764: Deduplication support in RewriteDataFilesAction

skandasa23 removed a comment on issue #2764:
URL: https://github.com/apache/iceberg/issues/2764#issuecomment-877751509


   Thank you, @rdblue, for getting back to me on this. I agree with you on the semantics; they make sense.
   RewriteDataFiles was chosen mainly so that streaming consumers would not have to process a day's worth of data again because of an overwrite. [With a COW implementation, I'm assuming that an overwrite would return all the data files added between snapshots S and S-1.]
   I guess there could be use cases that treat compaction+dedupe as an overwrite and others that treat it as a replace. Would it be a good idea to introduce an option to specify whether the rewrite is an overwrite or a replace?
   Please share your thoughts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


