You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/05 19:57:33 UTC

[GitHub] [iceberg] rdblue commented on a diff in pull request #5669: Core: Expire Snapshots reachability analysis

rdblue commented on code in PR #5669:
URL: https://github.com/apache/iceberg/pull/5669#discussion_r985315437


##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -602,8 +623,9 @@ public void testExpireOlderThanMultipleCalls() {
     }
 
     // Retain last 2 snapshots and expire older than t3
-    table
-        .expireSnapshots()
+    RemoveSnapshots removeSnapshots = (RemoveSnapshots) table.expireSnapshots();
+    removeSnapshots
+        .withIncrementalCleanup(incrementalCleanup)

Review Comment:
   Why doesn't this use `removeSnapshots(Table)`?



##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -555,8 +575,9 @@ public void testRetainLastKeepsExpiringSnapshot() {
     }
 
     // Retain last 2 snapshots and expire older than t3
-    table
-        .expireSnapshots()
+    RemoveSnapshots removeSnapshots = (RemoveSnapshots) table.expireSnapshots();
+    removeSnapshots
+        .withIncrementalCleanup(incrementalCleanup)

Review Comment:
   Why doesn't this use `removeSnapshots(Table)`?



##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -504,7 +519,12 @@ public void testRetainNLargerThanCurrentSnapshots() {
 
     // Retain last 4 snapshots
     Transaction tx = table.newTransaction();
-    tx.expireSnapshots().expireOlderThan(t3).retainLast(4).commit();
+    RemoveSnapshots removeSnapshots = (RemoveSnapshots) tx.expireSnapshots();

Review Comment:
   You should be able to use `removeSnapshots(tx.table())`.



##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -504,7 +519,12 @@ public void testRetainNLargerThanCurrentSnapshots() {
 
     // Retain last 4 snapshots
     Transaction tx = table.newTransaction();
-    tx.expireSnapshots().expireOlderThan(t3).retainLast(4).commit();
+    RemoveSnapshots removeSnapshots = (RemoveSnapshots) tx.expireSnapshots();

Review Comment:
   You should be able to use `removeSnapshots(tx.table())` for transaction tests. You could also add another helper, but I think just passing in the transaction table should work fine.



##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -114,7 +122,9 @@ public void testExpireOlderThanWithDelete() {
 
     Set<String> deletedFiles = Sets.newHashSet();
 
-    table.expireSnapshots().expireOlderThan(tAfterCommits).deleteWith(deletedFiles::add).commit();
+    RemoveSnapshots removeSnapshots = (RemoveSnapshots) table.expireSnapshots();

Review Comment:
   Looks like this line was added by mistake?



##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -39,13 +39,21 @@
 
 @RunWith(Parameterized.class)
 public class TestRemoveSnapshots extends TableTestBase {
-  @Parameterized.Parameters(name = "formatVersion = {0}")
+  private final boolean incrementalCleanup;
+
+  @Parameterized.Parameters(name = "formatVersion = {0}, incrementalCleanup = {1}")

Review Comment:
   Overall, the tests look correct to me. Just a couple minor things but no blockers.



##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -326,319 +315,25 @@ public void commit() {
     }
   }
 
-  private void cleanExpiredSnapshots() {
-    // clean up the expired snapshots:
-    // 1. Get a list of the snapshots that were removed
-    // 2. Delete any data files that were deleted by those snapshots and are not in the table
-    // 3. Delete any manifests that are no longer used by current snapshots
-    // 4. Delete the manifest lists
+  ExpireSnapshots withIncrementalCleanup(boolean useIncrementalCleanup) {
+    this.incrementalCleanup = useIncrementalCleanup;
+    return this;
+  }
 
+  private void cleanExpiredSnapshots() {
     TableMetadata current = ops.refresh();
 
-    Set<Long> validIds = Sets.newHashSet();
-    for (Snapshot snapshot : current.snapshots()) {
-      validIds.add(snapshot.snapshotId());
-    }
-
-    Set<Long> expiredIds = Sets.newHashSet();
-    for (Snapshot snapshot : base.snapshots()) {
-      long snapshotId = snapshot.snapshotId();
-      if (!validIds.contains(snapshotId)) {
-        // the snapshot was expired
-        LOG.info("Expired snapshot: {}", snapshot);
-        expiredIds.add(snapshotId);
-      }
-    }
-
-    if (expiredIds.isEmpty()) {
-      // if no snapshots were expired, skip cleanup
-      return;
-    }
-
-    LOG.info("Committed snapshot changes; cleaning up expired manifests and data files.");
-
-    removeExpiredFiles(current.snapshots(), validIds, expiredIds);
-  }
-
-  @SuppressWarnings({"checkstyle:CyclomaticComplexity", "MethodLength"})
-  private void removeExpiredFiles(
-      List<Snapshot> snapshots, Set<Long> validIds, Set<Long> expiredIds) {
-    // Reads and deletes are done using Tasks.foreach(...).suppressFailureWhenFinished to complete
-    // as much of the delete work as possible and avoid orphaned data or manifest files.
-
-    // this is the set of ancestors of the current table state. when removing snapshots, this must
-    // only remove files that were deleted in an ancestor of the current table state to avoid
-    // physically deleting files that were logically deleted in a commit that was rolled back.
-    Set<Long> ancestorIds =
-        Sets.newHashSet(SnapshotUtil.ancestorIds(base.currentSnapshot(), base::snapshot));
-
-    Set<Long> pickedAncestorSnapshotIds = Sets.newHashSet();
-    for (long snapshotId : ancestorIds) {
-      String sourceSnapshotId =
-          base.snapshot(snapshotId).summary().get(SnapshotSummary.SOURCE_SNAPSHOT_ID_PROP);
-      if (sourceSnapshotId != null) {
-        // protect any snapshot that was cherry-picked into the current table state
-        pickedAncestorSnapshotIds.add(Long.parseLong(sourceSnapshotId));
-      }
+    if (incrementalCleanup == null) {
+      incrementalCleanup = current.refs().size() == 1;

Review Comment:
   Nice to see that this uses incremental if there's only one ref and incremental isn't specifically set. Good idea.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org