Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/29 17:56:22 UTC

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5666: Bug Fix for Expire Snapshots: Fix ancestor lookup during file cleanup

amogh-jahagirdar commented on code in PR #5666:
URL: https://github.com/apache/iceberg/pull/5666#discussion_r957643199


##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -366,11 +367,19 @@ private void removeExpiredFiles(
     // Reads and deletes are done using Tasks.foreach(...).suppressFailureWhenFinished to complete
     // as much of the delete work as possible and avoid orphaned data or manifest files.
 
-    // this is the set of ancestors of the current table state. when removing snapshots, this must
-    // only remove files that were deleted in an ancestor of the current table state to avoid
+    // ToDo: This will be removed when reachability analysis is done so files across multiple
+    // branches can be removed
+    SnapshotRef branchToCleanup = Iterables.getFirst(base.refs().values(), null);

Review Comment:
   My thinking is the following:
   
   1.) Logically, a tagged snapshot must exist on either a.) a non-main branch or b.) the main branch.
   2.) If the tag exists on main, file cleanup couldn't be done in the first place (main cannot age off, so there would be multiple refs), so this point wouldn't have been reached.
   3.) If the tag exists on a non-main branch and that branch ages off while the tagged snapshot is retained, the tag effectively becomes the "tip" of a lineage, and the expiration logic works as expected. If the non-main branch is still retained, we wouldn't reach this point (the same case as 2.), except that the other ref is the non-main branch).
   
   Combined with the fact that writes cannot be performed on tags, this leads me to believe that, for the purpose of expiration, there's no need to differentiate tags from branches.
   
   I could call this refToCleanup if that makes more sense to folks. But the only case where this is a tag is the one I mentioned in 3.), in which case it's just a "dangling" snapshot referenced by a tag. @namrathamyske @rdblue @jackye1995
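A minimal sketch of the argument above, using a hypothetical simplified model: `RefType`, `Ref`, and the parent-pointer map are illustrative assumptions, not Iceberg's `SnapshotRef` or `TableMetadata` API. It assumes, as point 2 does, that file cleanup is only reached when exactly one ref remains. It shows that the lineage walked for cleanup is the same whether that lone ref is a branch or a tag.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model for illustration only; these types are NOT
// Iceberg's SnapshotRef or TableMetadata API.
public class RefCleanupSketch {

  enum RefType { BRANCH, TAG }

  record Ref(long snapshotId, RefType type) {}

  // Collects snapshot IDs from startId back to the root by following parent
  // pointers (parentIds maps a snapshot ID to its parent ID; roots are absent).
  static Set<Long> ancestorIds(long startId, Map<Long, Long> parentIds) {
    Set<Long> ancestors = new LinkedHashSet<>();
    Long current = startId;
    // add() returns false on a repeated ID, which guards against cycles.
    while (current != null && ancestors.add(current)) {
      current = parentIds.get(current);
    }
    return ancestors;
  }

  // Returns the lineage to use for file cleanup, or empty when more than one
  // ref remains (the case the ToDo defers to reachability analysis).
  static Optional<Set<Long>> lineageForCleanup(Map<String, Ref> refs, Map<Long, Long> parentIds) {
    if (refs.size() != 1) {
      return Optional.empty();
    }
    Ref refToCleanup = refs.values().iterator().next();
    // The ref type is deliberately ignored: a lone tag is treated like a branch tip.
    return Optional.of(ancestorIds(refToCleanup.snapshotId(), parentIds));
  }

  public static void main(String[] args) {
    // Lineage 1 <- 2 <- 3 (snapshot 3's parent is 2, whose parent is 1).
    Map<Long, Long> parents = Map.of(2L, 1L, 3L, 2L);

    // Case 3: the branch aged off and only a tag on snapshot 2 remains.
    Map<String, Ref> onlyTag = Map.of("v1", new Ref(2L, RefType.TAG));
    // A branch pointing at the same snapshot yields the same lineage.
    Map<String, Ref> onlyBranch = Map.of("b1", new Ref(2L, RefType.BRANCH));
    System.out.println(lineageForCleanup(onlyTag, parents));    // Optional[[2, 1]]
    System.out.println(lineageForCleanup(onlyBranch, parents)); // Optional[[2, 1]]

    // Case 2: main plus a tag means multiple refs, so cleanup is not reached.
    Map<String, Ref> mainAndTag = Map.of(
        "main", new Ref(3L, RefType.BRANCH), "v1", new Ref(2L, RefType.TAG));
    System.out.println(lineageForCleanup(mainAndTag, parents)); // Optional.empty
  }
}
```

Ignoring `RefType` in `lineageForCleanup` is the design choice argued for above: since tags cannot receive writes, a lone tag's lineage is fixed, so walking it is no different from walking a branch that happens to point at the same snapshot.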



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
