You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/23 06:28:35 UTC

[GitHub] [iceberg] amogh-jahagirdar opened a new pull request, #5618: Support performing delete files on branches

amogh-jahagirdar opened a new pull request, #5618:
URL: https://github.com/apache/iceberg/pull/5618

   I realized as I was implementing snapshot producer operations for branches, that one operation that could be handled in a straightforward manner was delete files. It also helps with writing some of the test cases for the implementation of branch operations because for some of those tests we want to delete files on a given branch. So separating out this PR.
   
   cc: @rdblue @jackye1995 @namrathamyske 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r996530283


##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -1455,6 +1455,56 @@ public void testMaxSnapshotAgeMultipleBranches() {
     Assert.assertNull(table.ops().current().snapshot(initialSnapshotId));
   }
 
+  @Test
+  public void testRetainFilesOnRetainedBranches() {
+    // Append a file to main and test branch
+    String testBranch = "test-branch";
+    table.newAppend().appendFile(FILE_A).commit();
+    Snapshot a = table.currentSnapshot();
+    table.manageSnapshots().createBranch(testBranch, a.snapshotId()).commit();
+
+    // Delete A from main
+    table.newDelete().deleteFile(FILE_A).commit();
+    Snapshot deletionA = table.currentSnapshot();
+    // Add B to main
+    table.newAppend().appendFile(FILE_B).commit();
+    long tAfterCommits = waitUntilAfter(table.currentSnapshot().timestampMillis());
+    Set<String> deletedFiles = Sets.newHashSet();
+    Set<String> expectedDeletes = Sets.newHashSet();
+
+    // Only deletionA's manifest list and manifests should be removed
+    expectedDeletes.add(deletionA.manifestListLocation());
+    expectedDeletes.addAll(
+        deletionA.allManifests(table.io()).stream().map(manifest -> manifest.path()).collect(Collectors.toSet()));
+    table.expireSnapshots().expireOlderThan(tAfterCommits).deleteWith(deletedFiles::add).commit();
+
+    Assert.assertEquals(2, Iterables.size(table.snapshots()));
+    Assert.assertEquals(expectedDeletes, deletedFiles);
+
+    // Delete A on test branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot branchDelete = table.snapshot(testBranch);
+    table.newAppend().appendFile(FILE_C).toBranch(testBranch).commit();
+    Snapshot testBranchHead = table.snapshot(testBranch);
+
+    deletedFiles = Sets.newHashSet();
+    expectedDeletes = Sets.newHashSet();
+    table.expireSnapshots().expireOlderThan(testBranchHead.timestampMillis()).deleteWith(deletedFiles::add).commit();

Review Comment:
   Does this need a `waitUntilAfter` like you used before?



##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -1455,6 +1455,56 @@ public void testMaxSnapshotAgeMultipleBranches() {
     Assert.assertNull(table.ops().current().snapshot(initialSnapshotId));
   }
 
+  @Test
+  public void testRetainFilesOnRetainedBranches() {
+    // Append a file to main and test branch
+    String testBranch = "test-branch";
+    table.newAppend().appendFile(FILE_A).commit();
+    Snapshot a = table.currentSnapshot();
+    table.manageSnapshots().createBranch(testBranch, a.snapshotId()).commit();
+
+    // Delete A from main
+    table.newDelete().deleteFile(FILE_A).commit();
+    Snapshot deletionA = table.currentSnapshot();
+    // Add B to main
+    table.newAppend().appendFile(FILE_B).commit();
+    long tAfterCommits = waitUntilAfter(table.currentSnapshot().timestampMillis());
+    Set<String> deletedFiles = Sets.newHashSet();
+    Set<String> expectedDeletes = Sets.newHashSet();
+
+    // Only deletionA's manifest list and manifests should be removed
+    expectedDeletes.add(deletionA.manifestListLocation());
+    expectedDeletes.addAll(
+        deletionA.allManifests(table.io()).stream().map(manifest -> manifest.path()).collect(Collectors.toSet()));
+    table.expireSnapshots().expireOlderThan(tAfterCommits).deleteWith(deletedFiles::add).commit();
+
+    Assert.assertEquals(2, Iterables.size(table.snapshots()));
+    Assert.assertEquals(expectedDeletes, deletedFiles);
+
+    // Delete A on test branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot branchDelete = table.snapshot(testBranch);
+    table.newAppend().appendFile(FILE_C).toBranch(testBranch).commit();

Review Comment:
   Comment about appending C?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r997572312


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -349,6 +349,35 @@ void validateSnapshot(Snapshot old, Snapshot snap, long sequenceNumber, DataFile
     validateSnapshot(old, snap, (Long) sequenceNumber, newFiles);
   }
 
+  Snapshot commit(Table tbl, SnapshotUpdate snapshotUpdate, String branch) {

Review Comment:
   good point, yeah in this case it makes sense just to use the actual name



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] namrathamyske commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
namrathamyske commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r956761421


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
+  private final String branch;
+
   @Parameterized.Parameters(name = "formatVersion = {0}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},

Review Comment:
   I am little confused regarding parameters. Does each {1, "testBranch"} translate to {formatVersion, branchname}?.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r996527511


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -349,6 +349,35 @@ void validateSnapshot(Snapshot old, Snapshot snap, long sequenceNumber, DataFile
     validateSnapshot(old, snap, (Long) sequenceNumber, newFiles);
   }
 
+  Snapshot commit(Table tbl, SnapshotUpdate snapshotUpdate, String branch) {
+    Snapshot snapshot = null;
+
+    if (branch.equals(SnapshotRef.MAIN_BRANCH)) {
+      snapshotUpdate.commit();
+      snapshot = tbl.currentSnapshot();
+    } else {
+      ((SnapshotProducer) snapshotUpdate.toBranch(branch)).commit();
+      snapshot = tbl.snapshot(branch);
+    }
+
+    return snapshot;
+  }
+
+  Snapshot apply(SnapshotUpdate snapshotUpdate, String branch) {
+    if (branch.equals(SnapshotRef.MAIN_BRANCH)) {
+      return ((SnapshotProducer) snapshotUpdate).apply();
+    } else {
+      return ((SnapshotProducer) snapshotUpdate.toBranch(branch)).apply();
+    }
+  }
+
+  Snapshot currentSnapshot(Table tbl, String branch) {

Review Comment:
   How about `latest` rather than `current`? I think that describes it better since `current` has a specific meaning (the latest on the main branch).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r997572791


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
-  @Parameterized.Parameters(name = "formatVersion = {0}")
+  private final String branch;
+
+  @Parameterized.Parameters(name = "formatVersion = {0}, branch = {1}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},
+      new Object[] {1, "testBranch"},
+      new Object[] {2, "main"},
+      new Object[] {2, "testBranch"}
+    };
   }
 
-  public TestDeleteFiles(int formatVersion) {
+  public TestDeleteFiles(int formatVersion, String branch) {
     super(formatVersion);
+    this.branch = branch;
   }
 
   @Test
   public void testMultipleDeletes() {
-    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot append =
+        commit(
+            table,
+            table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C),
+            branch);
 
     Assert.assertEquals("Metadata should be at version 1", 1L, (long) version());
-    Snapshot append = readMetadata().currentSnapshot();

Review Comment:
   I've updated with an additional latestSnapshot method which takes in `TableMetadata` . The cases which were using it before, are now using this method instead of through the table. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r956781597


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
+  private final String branch;
+
   @Parameterized.Parameters(name = "formatVersion = {0}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},

Review Comment:
   That's correct, we test for every combination of format version + branch. I'll need update the "Parameters" annotation so that we can get nicer test names. The branch parameter gets injected as an argument in the test class constructor. 



##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
+  private final String branch;
+
   @Parameterized.Parameters(name = "formatVersion = {0}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},

Review Comment:
   That's correct, we test for every combination of format version + branch. I'll need to update the "Parameters" annotation so that we can get nicer test names. The branch parameter gets injected as an argument in the test class constructor. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r996530478


##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -1455,6 +1455,56 @@ public void testMaxSnapshotAgeMultipleBranches() {
     Assert.assertNull(table.ops().current().snapshot(initialSnapshotId));
   }
 
+  @Test
+  public void testRetainFilesOnRetainedBranches() {
+    // Append a file to main and test branch
+    String testBranch = "test-branch";
+    table.newAppend().appendFile(FILE_A).commit();
+    Snapshot a = table.currentSnapshot();
+    table.manageSnapshots().createBranch(testBranch, a.snapshotId()).commit();
+
+    // Delete A from main
+    table.newDelete().deleteFile(FILE_A).commit();
+    Snapshot deletionA = table.currentSnapshot();
+    // Add B to main
+    table.newAppend().appendFile(FILE_B).commit();
+    long tAfterCommits = waitUntilAfter(table.currentSnapshot().timestampMillis());
+    Set<String> deletedFiles = Sets.newHashSet();
+    Set<String> expectedDeletes = Sets.newHashSet();
+
+    // Only deletionA's manifest list and manifests should be removed
+    expectedDeletes.add(deletionA.manifestListLocation());
+    expectedDeletes.addAll(
+        deletionA.allManifests(table.io()).stream().map(manifest -> manifest.path()).collect(Collectors.toSet()));
+    table.expireSnapshots().expireOlderThan(tAfterCommits).deleteWith(deletedFiles::add).commit();
+
+    Assert.assertEquals(2, Iterables.size(table.snapshots()));
+    Assert.assertEquals(expectedDeletes, deletedFiles);
+
+    // Delete A on test branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot branchDelete = table.snapshot(testBranch);
+    table.newAppend().appendFile(FILE_C).toBranch(testBranch).commit();
+    Snapshot testBranchHead = table.snapshot(testBranch);
+
+    deletedFiles = Sets.newHashSet();
+    expectedDeletes = Sets.newHashSet();
+    table.expireSnapshots().expireOlderThan(testBranchHead.timestampMillis()).deleteWith(deletedFiles::add).commit();
+
+    //
+    expectedDeletes.add(a.manifestListLocation());
+    expectedDeletes.addAll(
+        a.allManifests(table.io()).stream().map(manifest -> manifest.path()).collect(Collectors.toSet()));

Review Comment:
   It looks like this `allManifests` logic may be worth putting in a private helper function, `manifestPaths(Snapshot, FileIO)`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r997573067


##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -1455,6 +1455,56 @@ public void testMaxSnapshotAgeMultipleBranches() {
     Assert.assertNull(table.ops().current().snapshot(initialSnapshotId));
   }
 
+  @Test
+  public void testRetainFilesOnRetainedBranches() {
+    // Append a file to main and test branch
+    String testBranch = "test-branch";
+    table.newAppend().appendFile(FILE_A).commit();
+    Snapshot a = table.currentSnapshot();
+    table.manageSnapshots().createBranch(testBranch, a.snapshotId()).commit();
+
+    // Delete A from main
+    table.newDelete().deleteFile(FILE_A).commit();
+    Snapshot deletionA = table.currentSnapshot();
+    // Add B to main
+    table.newAppend().appendFile(FILE_B).commit();
+    long tAfterCommits = waitUntilAfter(table.currentSnapshot().timestampMillis());
+    Set<String> deletedFiles = Sets.newHashSet();
+    Set<String> expectedDeletes = Sets.newHashSet();
+
+    // Only deletionA's manifest list and manifests should be removed
+    expectedDeletes.add(deletionA.manifestListLocation());
+    expectedDeletes.addAll(
+        deletionA.allManifests(table.io()).stream().map(manifest -> manifest.path()).collect(Collectors.toSet()));
+    table.expireSnapshots().expireOlderThan(tAfterCommits).deleteWith(deletedFiles::add).commit();
+
+    Assert.assertEquals(2, Iterables.size(table.snapshots()));
+    Assert.assertEquals(expectedDeletes, deletedFiles);
+
+    // Delete A on test branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot branchDelete = table.snapshot(testBranch);
+    table.newAppend().appendFile(FILE_C).toBranch(testBranch).commit();
+    Snapshot testBranchHead = table.snapshot(testBranch);
+
+    deletedFiles = Sets.newHashSet();
+    expectedDeletes = Sets.newHashSet();
+    table.expireSnapshots().expireOlderThan(testBranchHead.timestampMillis()).deleteWith(deletedFiles::add).commit();

Review Comment:
   Ah yes, there should be a waitUntilAfter 



##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -1455,6 +1455,56 @@ public void testMaxSnapshotAgeMultipleBranches() {
     Assert.assertNull(table.ops().current().snapshot(initialSnapshotId));
   }
 
+  @Test
+  public void testRetainFilesOnRetainedBranches() {
+    // Append a file to main and test branch
+    String testBranch = "test-branch";
+    table.newAppend().appendFile(FILE_A).commit();
+    Snapshot a = table.currentSnapshot();
+    table.manageSnapshots().createBranch(testBranch, a.snapshotId()).commit();
+
+    // Delete A from main
+    table.newDelete().deleteFile(FILE_A).commit();
+    Snapshot deletionA = table.currentSnapshot();
+    // Add B to main
+    table.newAppend().appendFile(FILE_B).commit();
+    long tAfterCommits = waitUntilAfter(table.currentSnapshot().timestampMillis());
+    Set<String> deletedFiles = Sets.newHashSet();
+    Set<String> expectedDeletes = Sets.newHashSet();
+
+    // Only deletionA's manifest list and manifests should be removed
+    expectedDeletes.add(deletionA.manifestListLocation());
+    expectedDeletes.addAll(
+        deletionA.allManifests(table.io()).stream().map(manifest -> manifest.path()).collect(Collectors.toSet()));
+    table.expireSnapshots().expireOlderThan(tAfterCommits).deleteWith(deletedFiles::add).commit();
+
+    Assert.assertEquals(2, Iterables.size(table.snapshots()));
+    Assert.assertEquals(expectedDeletes, deletedFiles);
+
+    // Delete A on test branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot branchDelete = table.snapshot(testBranch);
+    table.newAppend().appendFile(FILE_C).toBranch(testBranch).commit();
+    Snapshot testBranchHead = table.snapshot(testBranch);
+
+    deletedFiles = Sets.newHashSet();
+    expectedDeletes = Sets.newHashSet();
+    table.expireSnapshots().expireOlderThan(testBranchHead.timestampMillis()).deleteWith(deletedFiles::add).commit();
+
+    //
+    expectedDeletes.add(a.manifestListLocation());
+    expectedDeletes.addAll(
+        a.allManifests(table.io()).stream().map(manifest -> manifest.path()).collect(Collectors.toSet()));

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r996530604


##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -1455,6 +1455,56 @@ public void testMaxSnapshotAgeMultipleBranches() {
     Assert.assertNull(table.ops().current().snapshot(initialSnapshotId));
   }
 
+  @Test
+  public void testRetainFilesOnRetainedBranches() {
+    // Append a file to main and test branch
+    String testBranch = "test-branch";
+    table.newAppend().appendFile(FILE_A).commit();
+    Snapshot a = table.currentSnapshot();
+    table.manageSnapshots().createBranch(testBranch, a.snapshotId()).commit();
+
+    // Delete A from main
+    table.newDelete().deleteFile(FILE_A).commit();
+    Snapshot deletionA = table.currentSnapshot();
+    // Add B to main
+    table.newAppend().appendFile(FILE_B).commit();
+    long tAfterCommits = waitUntilAfter(table.currentSnapshot().timestampMillis());
+    Set<String> deletedFiles = Sets.newHashSet();
+    Set<String> expectedDeletes = Sets.newHashSet();
+
+    // Only deletionA's manifest list and manifests should be removed
+    expectedDeletes.add(deletionA.manifestListLocation());
+    expectedDeletes.addAll(
+        deletionA.allManifests(table.io()).stream().map(manifest -> manifest.path()).collect(Collectors.toSet()));
+    table.expireSnapshots().expireOlderThan(tAfterCommits).deleteWith(deletedFiles::add).commit();
+
+    Assert.assertEquals(2, Iterables.size(table.snapshots()));
+    Assert.assertEquals(expectedDeletes, deletedFiles);
+
+    // Delete A on test branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot branchDelete = table.snapshot(testBranch);
+    table.newAppend().appendFile(FILE_C).toBranch(testBranch).commit();
+    Snapshot testBranchHead = table.snapshot(testBranch);
+
+    deletedFiles = Sets.newHashSet();
+    expectedDeletes = Sets.newHashSet();
+    table.expireSnapshots().expireOlderThan(testBranchHead.timestampMillis()).deleteWith(deletedFiles::add).commit();
+
+    //

Review Comment:
   ???



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] namrathamyske commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
namrathamyske commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r956761421


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
+  private final String branch;
+
   @Parameterized.Parameters(name = "formatVersion = {0}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},

Review Comment:
   I am little confused why formatVersion parameters were removed. Does each {1, "testBranch"} translate to formatVersion, branchname?.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r997564163


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
-  @Parameterized.Parameters(name = "formatVersion = {0}")
+  private final String branch;
+
+  @Parameterized.Parameters(name = "formatVersion = {0}, branch = {1}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},
+      new Object[] {1, "testBranch"},
+      new Object[] {2, "main"},
+      new Object[] {2, "testBranch"}
+    };
   }
 
-  public TestDeleteFiles(int formatVersion) {
+  public TestDeleteFiles(int formatVersion, String branch) {
     super(formatVersion);
+    this.branch = branch;
   }
 
   @Test
   public void testMultipleDeletes() {
-    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot append =

Review Comment:
   Sure, I left it as is for now.  I did add a new method for reading from the metadata for the test cases which were doing that before just so it's consistent (although as you said, it doesn't impact anything from a correctness standpoint). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r996529884


##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -1455,6 +1455,56 @@ public void testMaxSnapshotAgeMultipleBranches() {
     Assert.assertNull(table.ops().current().snapshot(initialSnapshotId));
   }
 
+  @Test
+  public void testRetainFilesOnRetainedBranches() {
+    // Append a file to main and test branch
+    String testBranch = "test-branch";
+    table.newAppend().appendFile(FILE_A).commit();
+    Snapshot a = table.currentSnapshot();
+    table.manageSnapshots().createBranch(testBranch, a.snapshotId()).commit();
+
+    // Delete A from main
+    table.newDelete().deleteFile(FILE_A).commit();
+    Snapshot deletionA = table.currentSnapshot();
+    // Add B to main
+    table.newAppend().appendFile(FILE_B).commit();
+    long tAfterCommits = waitUntilAfter(table.currentSnapshot().timestampMillis());
+    Set<String> deletedFiles = Sets.newHashSet();

Review Comment:
   I don't think these declarations have much to do with the block of code "Add B to main" so they should be separated by an empty line.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r997590059


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -349,6 +349,35 @@ void validateSnapshot(Snapshot old, Snapshot snap, long sequenceNumber, DataFile
     validateSnapshot(old, snap, (Long) sequenceNumber, newFiles);
   }
 
+  Snapshot commit(Table tbl, SnapshotUpdate snapshotUpdate, String branch) {
+    Snapshot snapshot = null;
+
+    if (branch.equals(SnapshotRef.MAIN_BRANCH)) {
+      snapshotUpdate.commit();
+      snapshot = tbl.currentSnapshot();
+    } else {
+      ((SnapshotProducer) snapshotUpdate.toBranch(branch)).commit();

Review Comment:
   AppendFiles doesn't extend SnapshotProducer, so the casting would just have to shift to the caller.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#issuecomment-1281679029

   Since @rdblue also approved I will go ahead and merge this, thanks for the contribution @amogh-jahagirdar! And thanks for the review @rdblue !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r956781921


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -349,6 +349,33 @@ void validateSnapshot(Snapshot old, Snapshot snap, long sequenceNumber, DataFile
     validateSnapshot(old, snap, (Long) sequenceNumber, newFiles);
   }
 
+  Snapshot commit(Table tbl, SnapshotUpdate snapshotUpdate, String branch) {
+    Snapshot snapshot = null;
+    if (branch.equals(SnapshotRef.MAIN_BRANCH)) {
+      snapshotUpdate.commit();
+      snapshot = tbl.currentSnapshot();
+    } else {
+      ((SnapshotProducer) snapshotUpdate.toBranch(branch)).commit();
+      snapshot = tbl.snapshot(tbl.refs().get(branch).snapshotId());
+    }
+    return snapshot;
+  }
+
+  Snapshot apply(SnapshotUpdate snapshotUpdate, String branch) {
+    if (branch.equals(SnapshotRef.MAIN_BRANCH)) {
+      return ((SnapshotProducer) snapshotUpdate).apply();
+    } else {
+      return ((SnapshotProducer) snapshotUpdate.toBranch(branch)).apply();
+    }
+  }

Review Comment:
   I could combine these into a common method apply(Table tl, SnapshotUpdate snapshotUpdate, String branch, boolean commit) and then only commit if commit is true. but for now opting to keep it explicit and separate, unless there's any objections



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r954274514


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -300,6 +300,70 @@ public void testDeleteCaseSensitivity() {
         statuses(Status.DELETED));
   }
 
+  @Test
+  public void testMultipleFileDeletesOnBranch() {
+    String testBranch = "testBranch";
+    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot main = readMetadata().currentSnapshot();
+
+    // Validate main history is as expected
+    Assert.assertEquals("Should have 1 manifest", 1, main.allManifests(FILE_IO).size());
+    validateSnapshot(null, main, FILE_A, FILE_B, FILE_C);
+
+    // Perform first delete on branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot afterDeletingA = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingA.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingA.allManifests(table.io()).get(0),
+        ids(afterDeletingA.snapshotId(), main.snapshotId(), main.snapshotId()),
+        files(FILE_A, FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING, Status.EXISTING));
+    table.newDelete().deleteFile(FILE_B).toBranch(testBranch).commit();
+
+    // Perform second delete
+    Snapshot afterDeletingB = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingB.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingB.allManifests(FILE_IO).get(0),
+        ids(afterDeletingB.snapshotId(), main.snapshotId()),
+        files(FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING));
+  }
+
+  @Test
+  public void testDeleteWithRowFilterWithCombinedPredicatesOnBranch() {

Review Comment:
   Here's the comment on RowDelta: https://github.com/apache/iceberg/pull/5234/files#r953201851



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r954274030


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -300,6 +300,70 @@ public void testDeleteCaseSensitivity() {
         statuses(Status.DELETED));
   }
 
+  @Test
+  public void testMultipleFileDeletesOnBranch() {
+    String testBranch = "testBranch";
+    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot main = readMetadata().currentSnapshot();
+
+    // Validate main history is as expected
+    Assert.assertEquals("Should have 1 manifest", 1, main.allManifests(FILE_IO).size());
+    validateSnapshot(null, main, FILE_A, FILE_B, FILE_C);
+
+    // Perform first delete on branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot afterDeletingA = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingA.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingA.allManifests(table.io()).get(0),
+        ids(afterDeletingA.snapshotId(), main.snapshotId(), main.snapshotId()),
+        files(FILE_A, FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING, Status.EXISTING));
+    table.newDelete().deleteFile(FILE_B).toBranch(testBranch).commit();
+
+    // Perform second delete
+    Snapshot afterDeletingB = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingB.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingB.allManifests(FILE_IO).get(0),
+        ids(afterDeletingB.snapshotId(), main.snapshotId()),
+        files(FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING));
+  }
+
+  @Test
+  public void testDeleteWithRowFilterWithCombinedPredicatesOnBranch() {

Review Comment:
   Similar to the PR for `RowDelta`, can we parameterize this so that we test all of the existing cases on a branch that is independent from `main`? Then we only need tests for modifications to both main and a separate branch (to validate that they evolve independently).
   
   We'll also need to update snapshot expiration tests to ensure that file deletion happens as expected.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] namrathamyske commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
namrathamyske commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r956761421


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
+  private final String branch;
+
   @Parameterized.Parameters(name = "formatVersion = {0}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},

Review Comment:
   I am little confused why formatVersion parameters were removed. Does each {1, "testBranch"} translate to {formatVersion, branchname}?.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#issuecomment-1281710840

   Thanks for the review @rdblue ! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r996528993


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
-  @Parameterized.Parameters(name = "formatVersion = {0}")
+  private final String branch;
+
+  @Parameterized.Parameters(name = "formatVersion = {0}, branch = {1}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},
+      new Object[] {1, "testBranch"},
+      new Object[] {2, "main"},
+      new Object[] {2, "testBranch"}
+    };
   }
 
-  public TestDeleteFiles(int formatVersion) {
+  public TestDeleteFiles(int formatVersion, String branch) {
     super(formatVersion);
+    this.branch = branch;
   }
 
   @Test
   public void testMultipleDeletes() {
-    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot append =

Review Comment:
   Thinking about this a bit more, returning the committed `Snapshot` from `commit` is fine. Up to you what to do here, I'm probably being pedantic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] namrathamyske commented on a diff in pull request #5618: Support performing delete files on branches

Posted by GitBox <gi...@apache.org>.
namrathamyske commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r952210073


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -300,6 +300,70 @@ public void testDeleteCaseSensitivity() {
         statuses(Status.DELETED));
   }
 
+  @Test
+  public void testMultipleFileDeletesOnBranch() {
+    String testBranch = "testBranch";
+    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot main = readMetadata().currentSnapshot();
+
+    // Validate main history is as expected
+    Assert.assertEquals("Should have 1 manifest", 1, main.allManifests(FILE_IO).size());
+    validateSnapshot(null, main, FILE_A, FILE_B, FILE_C);
+
+    // Perform first delete on branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot afterDeletingA = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingA.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingA.allManifests(table.io()).get(0),
+        ids(afterDeletingA.snapshotId(), main.snapshotId(), main.snapshotId()),
+        files(FILE_A, FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING, Status.EXISTING));
+    table.newDelete().deleteFile(FILE_B).toBranch(testBranch).commit();
+
+    // Perform second delete
+    Snapshot afterDeletingB = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingB.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingB.allManifests(FILE_IO).get(0),
+        ids(afterDeletingB.snapshotId(), main.snapshotId()),
+        files(FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING));
+  }
+
+  @Test
+  public void testDeleteWithRowFilterWithCombinedPredicatesOnBranch() {
+    String testBranch = "testBranch";
+    // add both data files
+    table
+        .newFastAppend()
+        .appendFile(DATA_FILE_BUCKET_0_IDS_0_2)
+        .appendFile(DATA_FILE_BUCKET_0_IDS_8_10)
+        .commit();
+
+    Snapshot initialSnapshot = table.currentSnapshot();
+    Assert.assertEquals("Should have 1 manifest", 1, initialSnapshot.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        initialSnapshot.allManifests(FILE_IO).get(0),
+        ids(initialSnapshot.snapshotId(), initialSnapshot.snapshotId()),
+        files(DATA_FILE_BUCKET_0_IDS_0_2, DATA_FILE_BUCKET_0_IDS_8_10),
+        statuses(Status.ADDED, Status.ADDED));
+
+    // delete the second one using a filter that relies on metrics and partition data
+    Expression partPredicate = Expressions.equal(Expressions.bucket("data", 16), 0);
+    Expression rowPredicate = Expressions.greaterThan("id", 5);
+    Expression predicate = Expressions.and(partPredicate, rowPredicate);
+    table.newDelete().deleteFromRowFilter(predicate).toBranch(testBranch).commit();
+
+    Snapshot deleteSnapshot = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, deleteSnapshot.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        deleteSnapshot.allManifests(FILE_IO).get(0),
+        ids(initialSnapshot.snapshotId(), deleteSnapshot.snapshotId()),

Review Comment:
   In test cases initialSnapshot is always behind deleteSnapshot. Should there be a check when it's ahead of deleteSnapshot ? Not sure if it's necessary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r952219398


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -300,6 +300,70 @@ public void testDeleteCaseSensitivity() {
         statuses(Status.DELETED));
   }
 
+  @Test
+  public void testMultipleFileDeletesOnBranch() {
+    String testBranch = "testBranch";
+    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot main = readMetadata().currentSnapshot();
+
+    // Validate main history is as expected
+    Assert.assertEquals("Should have 1 manifest", 1, main.allManifests(FILE_IO).size());
+    validateSnapshot(null, main, FILE_A, FILE_B, FILE_C);
+
+    // Perform first delete on branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot afterDeletingA = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingA.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingA.allManifests(table.io()).get(0),
+        ids(afterDeletingA.snapshotId(), main.snapshotId(), main.snapshotId()),
+        files(FILE_A, FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING, Status.EXISTING));
+    table.newDelete().deleteFile(FILE_B).toBranch(testBranch).commit();
+
+    // Perform second delete
+    Snapshot afterDeletingB = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingB.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingB.allManifests(FILE_IO).get(0),
+        ids(afterDeletingB.snapshotId(), main.snapshotId()),
+        files(FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING));
+  }
+
+  @Test
+  public void testDeleteWithRowFilterWithCombinedPredicatesOnBranch() {
+    String testBranch = "testBranch";
+    // add both data files
+    table
+        .newFastAppend()
+        .appendFile(DATA_FILE_BUCKET_0_IDS_0_2)
+        .appendFile(DATA_FILE_BUCKET_0_IDS_8_10)
+        .commit();
+
+    Snapshot initialSnapshot = table.currentSnapshot();
+    Assert.assertEquals("Should have 1 manifest", 1, initialSnapshot.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        initialSnapshot.allManifests(FILE_IO).get(0),
+        ids(initialSnapshot.snapshotId(), initialSnapshot.snapshotId()),
+        files(DATA_FILE_BUCKET_0_IDS_0_2, DATA_FILE_BUCKET_0_IDS_8_10),
+        statuses(Status.ADDED, Status.ADDED));
+
+    // delete the second one using a filter that relies on metrics and partition data
+    Expression partPredicate = Expressions.equal(Expressions.bucket("data", 16), 0);
+    Expression rowPredicate = Expressions.greaterThan("id", 5);
+    Expression predicate = Expressions.and(partPredicate, rowPredicate);
+    table.newDelete().deleteFromRowFilter(predicate).toBranch(testBranch).commit();
+
+    Snapshot deleteSnapshot = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, deleteSnapshot.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        deleteSnapshot.allManifests(FILE_IO).get(0),
+        ids(initialSnapshot.snapshotId(), deleteSnapshot.snapshotId()),

Review Comment:
   Sorry not sure what you mean by "ahead" and "behind"? Do you mean if we should validate the state of main after the branch writes? 
   
   In that case, it makes sense but I don't think its particularly necessary because the existing tests should make sure we're committing snapshots to the correct branch in the first place meaning the existing tests would guarantee us that the operations we perform are done on the correct lineage, and existing lineages would be untouched. That being said, no harm in adding more validations! @jackye1995 @rdblue any thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r954273088


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -300,6 +300,70 @@ public void testDeleteCaseSensitivity() {
         statuses(Status.DELETED));
   }
 
+  @Test
+  public void testMultipleFileDeletesOnBranch() {
+    String testBranch = "testBranch";
+    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot main = readMetadata().currentSnapshot();
+
+    // Validate main history is as expected
+    Assert.assertEquals("Should have 1 manifest", 1, main.allManifests(FILE_IO).size());
+    validateSnapshot(null, main, FILE_A, FILE_B, FILE_C);
+
+    // Perform first delete on branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot afterDeletingA = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingA.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingA.allManifests(table.io()).get(0),
+        ids(afterDeletingA.snapshotId(), main.snapshotId(), main.snapshotId()),
+        files(FILE_A, FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING, Status.EXISTING));
+    table.newDelete().deleteFile(FILE_B).toBranch(testBranch).commit();
+
+    // Perform second delete
+    Snapshot afterDeletingB = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingB.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingB.allManifests(FILE_IO).get(0),
+        ids(afterDeletingB.snapshotId(), main.snapshotId()),
+        files(FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING));
+  }
+
+  @Test
+  public void testDeleteWithRowFilterWithCombinedPredicatesOnBranch() {
+    String testBranch = "testBranch";
+    // add both data files
+    table
+        .newFastAppend()
+        .appendFile(DATA_FILE_BUCKET_0_IDS_0_2)
+        .appendFile(DATA_FILE_BUCKET_0_IDS_8_10)
+        .commit();
+
+    Snapshot initialSnapshot = table.currentSnapshot();
+    Assert.assertEquals("Should have 1 manifest", 1, initialSnapshot.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        initialSnapshot.allManifests(FILE_IO).get(0),
+        ids(initialSnapshot.snapshotId(), initialSnapshot.snapshotId()),
+        files(DATA_FILE_BUCKET_0_IDS_0_2, DATA_FILE_BUCKET_0_IDS_8_10),
+        statuses(Status.ADDED, Status.ADDED));
+
+    // delete the second one using a filter that relies on metrics and partition data
+    Expression partPredicate = Expressions.equal(Expressions.bucket("data", 16), 0);
+    Expression rowPredicate = Expressions.greaterThan("id", 5);
+    Expression predicate = Expressions.and(partPredicate, rowPredicate);
+    table.newDelete().deleteFromRowFilter(predicate).toBranch(testBranch).commit();
+
+    Snapshot deleteSnapshot = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, deleteSnapshot.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        deleteSnapshot.allManifests(FILE_IO).get(0),
+        ids(initialSnapshot.snapshotId(), deleteSnapshot.snapshotId()),

Review Comment:
   I think these tests are specific enough that we don't need to validate the order of commits. Entries with `Status.DELETED` will only appear in a single manifest and are dropped if the manifest is rewritten. So validating DELETE entries can only succeed if the delete operation produced the manifest.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 merged pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
jackye1995 merged PR #5618:
URL: https://github.com/apache/iceberg/pull/5618


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r996530478


##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -1455,6 +1455,56 @@ public void testMaxSnapshotAgeMultipleBranches() {
     Assert.assertNull(table.ops().current().snapshot(initialSnapshotId));
   }
 
+  @Test
+  public void testRetainFilesOnRetainedBranches() {
+    // Append a file to main and test branch
+    String testBranch = "test-branch";
+    table.newAppend().appendFile(FILE_A).commit();
+    Snapshot a = table.currentSnapshot();
+    table.manageSnapshots().createBranch(testBranch, a.snapshotId()).commit();
+
+    // Delete A from main
+    table.newDelete().deleteFile(FILE_A).commit();
+    Snapshot deletionA = table.currentSnapshot();
+    // Add B to main
+    table.newAppend().appendFile(FILE_B).commit();
+    long tAfterCommits = waitUntilAfter(table.currentSnapshot().timestampMillis());
+    Set<String> deletedFiles = Sets.newHashSet();
+    Set<String> expectedDeletes = Sets.newHashSet();
+
+    // Only deletionA's manifest list and manifests should be removed
+    expectedDeletes.add(deletionA.manifestListLocation());
+    expectedDeletes.addAll(
+        deletionA.allManifests(table.io()).stream().map(manifest -> manifest.path()).collect(Collectors.toSet()));
+    table.expireSnapshots().expireOlderThan(tAfterCommits).deleteWith(deletedFiles::add).commit();
+
+    Assert.assertEquals(2, Iterables.size(table.snapshots()));
+    Assert.assertEquals(expectedDeletes, deletedFiles);
+
+    // Delete A on test branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot branchDelete = table.snapshot(testBranch);
+    table.newAppend().appendFile(FILE_C).toBranch(testBranch).commit();
+    Snapshot testBranchHead = table.snapshot(testBranch);
+
+    deletedFiles = Sets.newHashSet();
+    expectedDeletes = Sets.newHashSet();
+    table.expireSnapshots().expireOlderThan(testBranchHead.timestampMillis()).deleteWith(deletedFiles::add).commit();
+
+    //
+    expectedDeletes.add(a.manifestListLocation());
+    expectedDeletes.addAll(
+        a.allManifests(table.io()).stream().map(manifest -> manifest.path()).collect(Collectors.toSet()));

Review Comment:
   It looks like this `allManifests` logic may be worth putting in a private helper function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r997573301


##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -1455,6 +1455,56 @@ public void testMaxSnapshotAgeMultipleBranches() {
     Assert.assertNull(table.ops().current().snapshot(initialSnapshotId));
   }
 
+  @Test
+  public void testRetainFilesOnRetainedBranches() {
+    // Append a file to main and test branch
+    String testBranch = "test-branch";
+    table.newAppend().appendFile(FILE_A).commit();
+    Snapshot a = table.currentSnapshot();
+    table.manageSnapshots().createBranch(testBranch, a.snapshotId()).commit();
+
+    // Delete A from main
+    table.newDelete().deleteFile(FILE_A).commit();
+    Snapshot deletionA = table.currentSnapshot();
+    // Add B to main
+    table.newAppend().appendFile(FILE_B).commit();
+    long tAfterCommits = waitUntilAfter(table.currentSnapshot().timestampMillis());
+    Set<String> deletedFiles = Sets.newHashSet();
+    Set<String> expectedDeletes = Sets.newHashSet();
+
+    // Only deletionA's manifest list and manifests should be removed
+    expectedDeletes.add(deletionA.manifestListLocation());
+    expectedDeletes.addAll(
+        deletionA.allManifests(table.io()).stream().map(manifest -> manifest.path()).collect(Collectors.toSet()));
+    table.expireSnapshots().expireOlderThan(tAfterCommits).deleteWith(deletedFiles::add).commit();
+
+    Assert.assertEquals(2, Iterables.size(table.snapshots()));
+    Assert.assertEquals(expectedDeletes, deletedFiles);
+
+    // Delete A on test branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot branchDelete = table.snapshot(testBranch);
+    table.newAppend().appendFile(FILE_C).toBranch(testBranch).commit();
+    Snapshot testBranchHead = table.snapshot(testBranch);
+
+    deletedFiles = Sets.newHashSet();
+    expectedDeletes = Sets.newHashSet();
+    table.expireSnapshots().expireOlderThan(testBranchHead.timestampMillis()).deleteWith(deletedFiles::add).commit();
+
+    //

Review Comment:
   Oops, probably I intended to comment something and just forgot before pushing. removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#issuecomment-1229135999

   The tests for delete files depend on merge appends for some test cases, and vice versa. So to have meaningful branch write tests, I've added the implementation for merge append as well and updated all the test cases to be parameterized so existing cases work for main and branch cases. I still need to follow up on updated expiration tests and then the tests which can test evolution of different branches independently.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r956781597


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
+  private final String branch;
+
   @Parameterized.Parameters(name = "formatVersion = {0}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},

Review Comment:
   That's correct, we test for every combination of format version + branch. I'll need update the @Parameters so that we can get nicer test names. The branch parameter gets injected as an argument in the test class constructor. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r996527073


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -349,6 +349,35 @@ void validateSnapshot(Snapshot old, Snapshot snap, long sequenceNumber, DataFile
     validateSnapshot(old, snap, (Long) sequenceNumber, newFiles);
   }
 
+  Snapshot commit(Table tbl, SnapshotUpdate snapshotUpdate, String branch) {

Review Comment:
   Minor: we avoid unnecessary abbreviations, like `tbl` in place of `table`. It's often more awkward to type the shortened version and it makes the code harder to read. I think in this case both are true. `t` and `b` right after one another are hard to type, but I it's easy to type `tab`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r997590059


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -349,6 +349,35 @@ void validateSnapshot(Snapshot old, Snapshot snap, long sequenceNumber, DataFile
     validateSnapshot(old, snap, (Long) sequenceNumber, newFiles);
   }
 
+  Snapshot commit(Table tbl, SnapshotUpdate snapshotUpdate, String branch) {
+    Snapshot snapshot = null;
+
+    if (branch.equals(SnapshotRef.MAIN_BRANCH)) {
+      snapshotUpdate.commit();
+      snapshot = tbl.currentSnapshot();
+    } else {
+      ((SnapshotProducer) snapshotUpdate.toBranch(branch)).commit();

Review Comment:
   MergeAppend doesn't implement SnapshotProducer



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r997564163


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
-  @Parameterized.Parameters(name = "formatVersion = {0}")
+  private final String branch;
+
+  @Parameterized.Parameters(name = "formatVersion = {0}, branch = {1}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},
+      new Object[] {1, "testBranch"},
+      new Object[] {2, "main"},
+      new Object[] {2, "testBranch"}
+    };
   }
 
-  public TestDeleteFiles(int formatVersion) {
+  public TestDeleteFiles(int formatVersion, String branch) {
     super(formatVersion);
+    this.branch = branch;
   }
 
   @Test
   public void testMultipleDeletes() {
-    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot append =

Review Comment:
   Sure, I left it as is for now. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r956781597


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
+  private final String branch;
+
   @Parameterized.Parameters(name = "formatVersion = {0}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},

Review Comment:
   That's correct, we test for every combination of format version + branch. I'll need update the @Parameters so that we can get nicer test names. The second parameter gets injected as an argument in the test file constructor. 



##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
+  private final String branch;
+
   @Parameterized.Parameters(name = "formatVersion = {0}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},

Review Comment:
   That's correct, we test for every combination of format version + branch. I'll need update the @Parameters so that we can get nicer test names. The second parameter gets injected as an argument in the test class constructor. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #5618: Support performing delete files on branches

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r952929574


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -300,6 +300,70 @@ public void testDeleteCaseSensitivity() {
         statuses(Status.DELETED));
   }
 
+  @Test
+  public void testMultipleFileDeletesOnBranch() {
+    String testBranch = "testBranch";
+    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot main = readMetadata().currentSnapshot();
+
+    // Validate main history is as expected
+    Assert.assertEquals("Should have 1 manifest", 1, main.allManifests(FILE_IO).size());
+    validateSnapshot(null, main, FILE_A, FILE_B, FILE_C);
+
+    // Perform first delete on branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot afterDeletingA = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingA.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingA.allManifests(table.io()).get(0),
+        ids(afterDeletingA.snapshotId(), main.snapshotId(), main.snapshotId()),
+        files(FILE_A, FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING, Status.EXISTING));
+    table.newDelete().deleteFile(FILE_B).toBranch(testBranch).commit();
+
+    // Perform second delete
+    Snapshot afterDeletingB = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingB.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingB.allManifests(FILE_IO).get(0),
+        ids(afterDeletingB.snapshotId(), main.snapshotId()),
+        files(FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING));
+  }
+
+  @Test
+  public void testDeleteWithRowFilterWithCombinedPredicatesOnBranch() {
+    String testBranch = "testBranch";
+    // add both data files
+    table
+        .newFastAppend()
+        .appendFile(DATA_FILE_BUCKET_0_IDS_0_2)
+        .appendFile(DATA_FILE_BUCKET_0_IDS_8_10)
+        .commit();
+
+    Snapshot initialSnapshot = table.currentSnapshot();
+    Assert.assertEquals("Should have 1 manifest", 1, initialSnapshot.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        initialSnapshot.allManifests(FILE_IO).get(0),
+        ids(initialSnapshot.snapshotId(), initialSnapshot.snapshotId()),
+        files(DATA_FILE_BUCKET_0_IDS_0_2, DATA_FILE_BUCKET_0_IDS_8_10),
+        statuses(Status.ADDED, Status.ADDED));
+
+    // delete the second one using a filter that relies on metrics and partition data
+    Expression partPredicate = Expressions.equal(Expressions.bucket("data", 16), 0);
+    Expression rowPredicate = Expressions.greaterThan("id", 5);
+    Expression predicate = Expressions.and(partPredicate, rowPredicate);
+    table.newDelete().deleteFromRowFilter(predicate).toBranch(testBranch).commit();
+
+    Snapshot deleteSnapshot = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, deleteSnapshot.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        deleteSnapshot.allManifests(FILE_IO).get(0),
+        ids(initialSnapshot.snapshotId(), deleteSnapshot.snapshotId()),

Review Comment:
   Yes agree with what Amogh says, but we can always add a test to verify the behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r996529350


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -300,6 +322,34 @@ public void testDeleteCaseSensitivity() {
         statuses(Status.DELETED));
   }
 
+  @Test
+  public void testDeleteFilesOnIndependentBranches() {
+    String testBranch = "testBranch";
+    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot initialSnapshot = table.currentSnapshot();
+    // Delete A on test branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot testBranchTip = table.snapshot(testBranch);
+
+    // Delete B and C on main
+    table.newDelete().deleteFile(FILE_B).deleteFile(FILE_C).commit();
+    Snapshot delete2 = table.currentSnapshot();
+
+    // Verify B and C on testBranch
+    validateManifestEntries(
+        testBranchTip.allManifests(FILE_IO).get(0),

Review Comment:
   It's much better to use something like `Iterables.getOnlyElement` rather than `.get(0)` to validate your assumptions about there being only one manifest.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r996529425


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -300,6 +322,34 @@ public void testDeleteCaseSensitivity() {
         statuses(Status.DELETED));
   }
 
+  @Test
+  public void testDeleteFilesOnIndependentBranches() {
+    String testBranch = "testBranch";
+    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot initialSnapshot = table.currentSnapshot();
+    // Delete A on test branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot testBranchTip = table.snapshot(testBranch);

Review Comment:
   `delete1`? It's weird to have `delete2`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r996527381


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -349,6 +349,35 @@ void validateSnapshot(Snapshot old, Snapshot snap, long sequenceNumber, DataFile
     validateSnapshot(old, snap, (Long) sequenceNumber, newFiles);
   }
 
+  Snapshot commit(Table tbl, SnapshotUpdate snapshotUpdate, String branch) {
+    Snapshot snapshot = null;
+
+    if (branch.equals(SnapshotRef.MAIN_BRANCH)) {
+      snapshotUpdate.commit();
+      snapshot = tbl.currentSnapshot();
+    } else {
+      ((SnapshotProducer) snapshotUpdate.toBranch(branch)).commit();

Review Comment:
   Why not just pass a `SnapshotProducer` to this method? That would eliminate casting right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r997590378


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -349,6 +349,35 @@ void validateSnapshot(Snapshot old, Snapshot snap, long sequenceNumber, DataFile
     validateSnapshot(old, snap, (Long) sequenceNumber, newFiles);
   }
 
+  Snapshot commit(Table tbl, SnapshotUpdate snapshotUpdate, String branch) {

Review Comment:
   I forgot why I did tbl, it was because table was already used and was failing the hidden field checkstyle. I updated to ignore the checkstyle warning in favor of more readable variable names.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r997576984


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -349,6 +349,35 @@ void validateSnapshot(Snapshot old, Snapshot snap, long sequenceNumber, DataFile
     validateSnapshot(old, snap, (Long) sequenceNumber, newFiles);
   }
 
+  Snapshot commit(Table tbl, SnapshotUpdate snapshotUpdate, String branch) {
+    Snapshot snapshot = null;
+
+    if (branch.equals(SnapshotRef.MAIN_BRANCH)) {
+      snapshotUpdate.commit();
+      snapshot = tbl.currentSnapshot();
+    } else {
+      ((SnapshotProducer) snapshotUpdate.toBranch(branch)).commit();
+      snapshot = tbl.snapshot(branch);
+    }
+
+    return snapshot;
+  }
+
+  Snapshot apply(SnapshotUpdate snapshotUpdate, String branch) {
+    if (branch.equals(SnapshotRef.MAIN_BRANCH)) {
+      return ((SnapshotProducer) snapshotUpdate).apply();
+    } else {
+      return ((SnapshotProducer) snapshotUpdate.toBranch(branch)).apply();
+    }
+  }
+
+  Snapshot currentSnapshot(Table tbl, String branch) {

Review Comment:
   updated to `latest`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r997590059


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -349,6 +349,35 @@ void validateSnapshot(Snapshot old, Snapshot snap, long sequenceNumber, DataFile
     validateSnapshot(old, snap, (Long) sequenceNumber, newFiles);
   }
 
+  Snapshot commit(Table tbl, SnapshotUpdate snapshotUpdate, String branch) {
+    Snapshot snapshot = null;
+
+    if (branch.equals(SnapshotRef.MAIN_BRANCH)) {
+      snapshotUpdate.commit();
+      snapshot = tbl.currentSnapshot();
+    } else {
+      ((SnapshotProducer) snapshotUpdate.toBranch(branch)).commit();

Review Comment:
   AppendFiles doesn't implement SnapshotProducer



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r996528646


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
-  @Parameterized.Parameters(name = "formatVersion = {0}")
+  private final String branch;
+
+  @Parameterized.Parameters(name = "formatVersion = {0}, branch = {1}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},
+      new Object[] {1, "testBranch"},
+      new Object[] {2, "main"},
+      new Object[] {2, "testBranch"}
+    };
   }
 
-  public TestDeleteFiles(int formatVersion) {
+  public TestDeleteFiles(int formatVersion, String branch) {
     super(formatVersion);
+    this.branch = branch;
   }
 
   @Test
   public void testMultipleDeletes() {
-    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot append =
+        commit(
+            table,
+            table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C),
+            branch);
 
     Assert.assertEquals("Metadata should be at version 1", 1L, (long) version());
-    Snapshot append = readMetadata().currentSnapshot();

Review Comment:
   This is actually different validation behavior because the `readMetadata()` method doesn't go through the table. It goes directly to the backing store to get the metadata. I'm not sure that this is a problem, but I'd probably lean toward keeping the existing behavior. We can just make a version of `latestSnapshot` that accepts `TableMetadata` as returned by `readMetadata()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r996528317


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
-  @Parameterized.Parameters(name = "formatVersion = {0}")
+  private final String branch;
+
+  @Parameterized.Parameters(name = "formatVersion = {0}, branch = {1}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},
+      new Object[] {1, "testBranch"},
+      new Object[] {2, "main"},
+      new Object[] {2, "testBranch"}
+    };
   }
 
-  public TestDeleteFiles(int formatVersion) {
+  public TestDeleteFiles(int formatVersion, String branch) {
     super(formatVersion);
+    this.branch = branch;
   }
 
   @Test
   public void testMultipleDeletes() {
-    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot append =

Review Comment:
   Style (and minor): I'd prefer to prepare the change and then pass it in to avoid this formatting. I think it would be cleaner and more readable like this:
   
   ```java
   SnapshotProducer action = table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C);
   commit(table, action, branch);
   
   Snapshot append = latestSnapshot(readMetadata(), branch);
   ```
   
   I also prefer to call `latestSnapshot` separately to get the `Snapshot`. We don't need to put too much on `commit`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r956546724


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -300,6 +300,70 @@ public void testDeleteCaseSensitivity() {
         statuses(Status.DELETED));
   }
 
+  @Test
+  public void testMultipleFileDeletesOnBranch() {
+    String testBranch = "testBranch";
+    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot main = readMetadata().currentSnapshot();
+
+    // Validate main history is as expected
+    Assert.assertEquals("Should have 1 manifest", 1, main.allManifests(FILE_IO).size());
+    validateSnapshot(null, main, FILE_A, FILE_B, FILE_C);
+
+    // Perform first delete on branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot afterDeletingA = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingA.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingA.allManifests(table.io()).get(0),
+        ids(afterDeletingA.snapshotId(), main.snapshotId(), main.snapshotId()),
+        files(FILE_A, FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING, Status.EXISTING));
+    table.newDelete().deleteFile(FILE_B).toBranch(testBranch).commit();
+
+    // Perform second delete
+    Snapshot afterDeletingB = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingB.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingB.allManifests(FILE_IO).get(0),
+        ids(afterDeletingB.snapshotId(), main.snapshotId()),
+        files(FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING));
+  }
+
+  @Test
+  public void testDeleteWithRowFilterWithCombinedPredicatesOnBranch() {

Review Comment:
   I've parameterized the tests, it does seem like for meaningful tests to be written here we also need to have an implementation for MergeAppend. So I added that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] namrathamyske commented on a diff in pull request #5618: Support performing merge appends and delete files on branches

Posted by GitBox <gi...@apache.org>.
namrathamyske commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r956761421


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -68,38 +68,46 @@ public class TestDeleteFiles extends TableTestBase {
                   ))
           .build();
 
+  private final String branch;
+
   @Parameterized.Parameters(name = "formatVersion = {0}")
   public static Object[] parameters() {
-    return new Object[] {1, 2};
+    return new Object[][] {
+      new Object[] {1, "main"},

Review Comment:
   I am little confused why formatVersion parameters were removed. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5618: Support performing delete files on branches

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5618:
URL: https://github.com/apache/iceberg/pull/5618#discussion_r952219398


##########
core/src/test/java/org/apache/iceberg/TestDeleteFiles.java:
##########
@@ -300,6 +300,70 @@ public void testDeleteCaseSensitivity() {
         statuses(Status.DELETED));
   }
 
+  @Test
+  public void testMultipleFileDeletesOnBranch() {
+    String testBranch = "testBranch";
+    table.newAppend().appendFile(FILE_A).appendFile(FILE_B).appendFile(FILE_C).commit();
+    Snapshot main = readMetadata().currentSnapshot();
+
+    // Validate main history is as expected
+    Assert.assertEquals("Should have 1 manifest", 1, main.allManifests(FILE_IO).size());
+    validateSnapshot(null, main, FILE_A, FILE_B, FILE_C);
+
+    // Perform first delete on branch
+    table.newDelete().deleteFile(FILE_A).toBranch(testBranch).commit();
+    Snapshot afterDeletingA = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingA.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingA.allManifests(table.io()).get(0),
+        ids(afterDeletingA.snapshotId(), main.snapshotId(), main.snapshotId()),
+        files(FILE_A, FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING, Status.EXISTING));
+    table.newDelete().deleteFile(FILE_B).toBranch(testBranch).commit();
+
+    // Perform second delete
+    Snapshot afterDeletingB = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, afterDeletingB.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        afterDeletingB.allManifests(FILE_IO).get(0),
+        ids(afterDeletingB.snapshotId(), main.snapshotId()),
+        files(FILE_B, FILE_C),
+        statuses(Status.DELETED, Status.EXISTING));
+  }
+
+  @Test
+  public void testDeleteWithRowFilterWithCombinedPredicatesOnBranch() {
+    String testBranch = "testBranch";
+    // add both data files
+    table
+        .newFastAppend()
+        .appendFile(DATA_FILE_BUCKET_0_IDS_0_2)
+        .appendFile(DATA_FILE_BUCKET_0_IDS_8_10)
+        .commit();
+
+    Snapshot initialSnapshot = table.currentSnapshot();
+    Assert.assertEquals("Should have 1 manifest", 1, initialSnapshot.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        initialSnapshot.allManifests(FILE_IO).get(0),
+        ids(initialSnapshot.snapshotId(), initialSnapshot.snapshotId()),
+        files(DATA_FILE_BUCKET_0_IDS_0_2, DATA_FILE_BUCKET_0_IDS_8_10),
+        statuses(Status.ADDED, Status.ADDED));
+
+    // delete the second one using a filter that relies on metrics and partition data
+    Expression partPredicate = Expressions.equal(Expressions.bucket("data", 16), 0);
+    Expression rowPredicate = Expressions.greaterThan("id", 5);
+    Expression predicate = Expressions.and(partPredicate, rowPredicate);
+    table.newDelete().deleteFromRowFilter(predicate).toBranch(testBranch).commit();
+
+    Snapshot deleteSnapshot = table.snapshot(table.refs().get(testBranch).snapshotId());
+    Assert.assertEquals("Should have 1 manifest", 1, deleteSnapshot.allManifests(FILE_IO).size());
+    validateManifestEntries(
+        deleteSnapshot.allManifests(FILE_IO).get(0),
+        ids(initialSnapshot.snapshotId(), deleteSnapshot.snapshotId()),

Review Comment:
   Sorry not sure what you mean by "ahead" and "behind"? Do you mean if we should validate the state of main after the branch writes? 
   
   In that case, it makes sense but I don't think its particularly necessary because the existing tests in SnapshotProducer should make sure we're committing snapshots to the correct branch in the first place meaning the existing tests would guarantee us that the operations we perform are done on the correct lineage, and existing lineages would be untouched. That being said, no harm in adding more validations! @jackye1995 @rdblue any thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org