Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2019/07/29 19:30:13 UTC

[GitHub] [incubator-iceberg] aokolnychyi commented on issue #324: Optimistically query and replace files atomically

aokolnychyi commented on issue #324: Optimistically query and replace files atomically
URL: https://github.com/apache/incubator-iceberg/pull/324#issuecomment-516129948
 
 
   Thanks for working on this, @johnclara!
   
   We already have an internal solution that we were about to share as well. We decided to add a new API instead of extending `RewriteFiles`, because `RewriteFiles` uses `DataOperations.REPLACE`, which assumes that files are removed and replaced without changing the data. Therefore, we created `ModifyFiles`, which uses `DataOperations.MODIFY`.
   
   We accept `baseSnapshotId` in the constructor rather than as a method argument. That way, it is transparent to the user, and we can still track new files during retries. We also accept a row filter for conflict resolution. Setting that row filter to `alwaysTrue` is equivalent to what you have (i.e., retries will always fail). On top of that, the row filter allows more fine-grained conflict resolution. For now, we limit the scope to partitions: if you are modifying files in one partition and a concurrent operation commits a new file to another partition, we can still commit successfully. Later, we will extend this logic beyond partitions by leveraging column stats and `InclusiveMetricsEvaluator` to detect conflicts.
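   The partition-level check described above can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not Iceberg's API and not the internal `ModifyFiles` implementation: `FileInfo`, `SnapshotInfo`, `ModifyFilesSketch`, and `canCommit` are hypothetical names, and the row filter is modeled as a plain `Predicate` over a partition map rather than an Iceberg expression. The idea it shows is that the base snapshot is pinned once in the constructor, and on each commit attempt any file appended by a later snapshot whose partition matches the filter counts as a conflict.

   ```java
   import java.util.List;
   import java.util.Map;
   import java.util.function.Predicate;

   public class ConflictCheckSketch {
       // Hypothetical stand-in for a data file: a path plus its partition tuple.
       record FileInfo(String path, Map<String, Object> partition) {}

       // Hypothetical stand-in for a snapshot: its id and the files it appended.
       record SnapshotInfo(long id, List<FileInfo> addedFiles) {}

       // Hypothetical operation that pins the snapshot it was planned against.
       static final class ModifyFilesSketch {
           private final long baseSnapshotId;
           private final Predicate<Map<String, Object>> partitionFilter;

           // baseSnapshotId is captured once, at construction, so every retry
           // validates against the same starting point.
           ModifyFilesSketch(long baseSnapshotId, Predicate<Map<String, Object>> partitionFilter) {
               this.baseSnapshotId = baseSnapshotId;
               this.partitionFilter = partitionFilter;
           }

           // history is ordered oldest to newest. Returns true when no snapshot
           // committed after the base added a file whose partition matches the filter.
           boolean canCommit(List<SnapshotInfo> history) {
               int baseIndex = -1;
               for (int i = 0; i < history.size(); i++) {
                   if (history.get(i).id() == baseSnapshotId) {
                       baseIndex = i;
                       break;
                   }
               }
               if (baseIndex < 0) {
                   return false; // base snapshot unknown: cannot validate, so refuse to commit
               }
               for (SnapshotInfo s : history.subList(baseIndex + 1, history.size())) {
                   for (FileInfo f : s.addedFiles()) {
                       if (partitionFilter.test(f.partition())) {
                           return false; // a concurrent append touched a partition being modified
                       }
                   }
               }
               return true;
           }
       }

       public static void main(String[] args) {
           SnapshotInfo base = new SnapshotInfo(1L, List.of());
           SnapshotInfo concurrent = new SnapshotInfo(2L,
               List.of(new FileInfo("data/day=2019-07-30/f1.parquet", Map.of("day", "2019-07-30"))));
           List<SnapshotInfo> history = List.of(base, concurrent);

           // Modifying only day=2019-07-29: the concurrent append to another day does not conflict.
           ModifyFilesSketch scoped = new ModifyFilesSketch(1L, p -> "2019-07-29".equals(p.get("day")));
           // An always-true filter: any concurrent append is a conflict.
           ModifyFilesSketch alwaysTrue = new ModifyFilesSketch(1L, p -> true);

           System.out.println(scoped.canCommit(history));     // true
           System.out.println(alwaysTrue.canCommit(history)); // false
       }
   }
   ```

   With an always-true filter the sketch rejects any concurrent append, matching the "retries will always fail" behavior; a narrower filter only rejects appends to the partitions being modified. Extending this to column stats would replace the partition predicate with a per-file metrics check.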
   
   Let me share what I have so that we can discuss it in detail.

