You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "ChristinaTech (via GitHub)" <gi...@apache.org> on 2023/05/27 03:09:41 UTC

[GitHub] [iceberg] ChristinaTech opened a new pull request, #7718: Fixed NPE in SnapshotUtil.ancestorsOf

ChristinaTech opened a new pull request, #7718:
URL: https://github.com/apache/iceberg/pull/7718

   Calling `hasNext()` on the any Iterator returned by the Iterable from `SnapshotUtil.ancestorsOf` throws a NullPointerException if it is called again after the prior call returned false. This only happens if the ancestor chain is broken by an expired snapshot, not if it reaches a snapshot with a null parentId.
   
   This is because when `next.parentId()` returns null the iterator will return false without changing its state, but when the lookup function returns null, as it would when the snapshot's parent is expired, it sets the `next` variable to null as well. The first time this doesn't cause an issue as the next line null-checks it, but if `hasNext()` is called again it will attempt to call `next.parentId()` on the now-null next variable, throwing an NPE.
   
   I considered fixing this by storing the result of `lookup.apply` in an intermediate variable and null-checking it before assigning it to `next`, but if for any reason the lookup function wasn't deterministic this could cause `hasNext()` to return true after it has already returned false, which would be a separate violation of the Iterator API contract from the one this PR aims to fix. Alternatively, if lookup were expensive it would just be wasted effort to call it again.
   
   I added a new test case that replicated the bug in order to validate my change fixed it, and in the process did some refactoring in the unit test to reduce duplicate code and (at least attempted to) give the snapshot id variables more intuitive names. Also added additional assertions to `isParentAncestorOf` and `isAncestorOf` tests to validate they work as intended with expired snapshots as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7718: Core: Fixed NPE in SnapshotUtil.ancestorsOf

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7718:
URL: https://github.com/apache/iceberg/pull/7718#discussion_r1212755554


##########
core/src/test/java/org/apache/iceberg/util/TestSnapshotUtil.java:
##########
@@ -60,12 +62,23 @@ public class TestSnapshotUtil {
           .withRecordCount(1)
           .build();
 
-  private long snapshotAId;
+  private long snapshotBaseTimestamp;
+  private long snapshotBaseId;
+  private long snapshotBranchId;
+  private long snapshotMain1Id;
+  private long snapshotMain2Id;
+  private long snapshotFork0Id;
+  private long snapshotFork1Id;
+  private long snapshotFork2Id;
+
+  private Snapshot putSnapshot(String branch) {

Review Comment:
   `putSnapshot` is a bit misleading, what about `appendFileTo(String branch)`?



##########
core/src/test/java/org/apache/iceberg/util/TestSnapshotUtil.java:
##########
@@ -60,12 +62,23 @@ public class TestSnapshotUtil {
           .withRecordCount(1)
           .build();
 
-  private long snapshotAId;
+  private long snapshotBaseTimestamp;
+  private long snapshotBaseId;
+  private long snapshotBranchId;
+  private long snapshotMain1Id;
+  private long snapshotMain2Id;
+  private long snapshotFork0Id;
+  private long snapshotFork1Id;
+  private long snapshotFork2Id;
+
+  private Snapshot putSnapshot(String branch) {
+    table.newFastAppend().appendFile(FILE_A).toBranch(branch).commit();
+    return table.snapshot(branch);
+  }
 
-  private long snapshotATimestamp;
-  private long snapshotBId;
-  private long snapshotCId;
-  private long snapshotDId;
+  private Snapshot putSnapshot() {

Review Comment:
   `appendFileToMain()` maybe?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra merged pull request #7718: Core: Fixed NPE in SnapshotUtil.ancestorsOf

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra merged PR #7718:
URL: https://github.com/apache/iceberg/pull/7718


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ChristinaTech commented on a diff in pull request #7718: Core: Fixed NPE in SnapshotUtil.ancestorsOf

Posted by "ChristinaTech (via GitHub)" <gi...@apache.org>.
ChristinaTech commented on code in PR #7718:
URL: https://github.com/apache/iceberg/pull/7718#discussion_r1227130093


##########
core/src/test/java/org/apache/iceberg/util/TestSnapshotUtil.java:
##########
@@ -101,54 +116,72 @@ public void cleanupTables() {
 
   @Test
   public void isParentAncestorOf() {
-    Assert.assertTrue(SnapshotUtil.isParentAncestorOf(table, snapshotBId, snapshotAId));
-    Assert.assertFalse(SnapshotUtil.isParentAncestorOf(table, snapshotCId, snapshotBId));
+    Assert.assertTrue(SnapshotUtil.isParentAncestorOf(table, snapshotMain1Id, snapshotBaseId));
+    Assert.assertFalse(SnapshotUtil.isParentAncestorOf(table, snapshotBranchId, snapshotMain1Id));
+    Assert.assertTrue(SnapshotUtil.isParentAncestorOf(table, snapshotFork2Id, snapshotFork0Id));
   }
 
   @Test
   public void isAncestorOf() {
-    Assert.assertTrue(SnapshotUtil.isAncestorOf(table, snapshotBId, snapshotAId));
-    Assert.assertFalse(SnapshotUtil.isAncestorOf(table, snapshotCId, snapshotBId));
+    Assert.assertTrue(SnapshotUtil.isAncestorOf(table, snapshotMain1Id, snapshotBaseId));
+    Assert.assertFalse(SnapshotUtil.isAncestorOf(table, snapshotBranchId, snapshotMain1Id));
+    Assert.assertFalse(SnapshotUtil.isAncestorOf(table, snapshotFork2Id, snapshotFork0Id));
 
-    Assert.assertTrue(SnapshotUtil.isAncestorOf(table, snapshotBId));
-    Assert.assertFalse(SnapshotUtil.isAncestorOf(table, snapshotCId));
+    Assert.assertTrue(SnapshotUtil.isAncestorOf(table, snapshotMain1Id));
+    Assert.assertFalse(SnapshotUtil.isAncestorOf(table, snapshotBranchId));
   }
 
   @Test
   public void currentAncestors() {
     Iterable<Snapshot> snapshots = SnapshotUtil.currentAncestors(table);
-    expectedSnapshots(new long[] {snapshotDId, snapshotBId, snapshotAId}, snapshots);
+    expectedSnapshots(new long[] {snapshotMain2Id, snapshotMain1Id, snapshotBaseId}, snapshots);
 
     List<Long> snapshotList = SnapshotUtil.currentAncestorIds(table);
     Assert.assertArrayEquals(
-        new Long[] {snapshotDId, snapshotBId, snapshotAId}, snapshotList.toArray(new Long[0]));
+        new Long[] {snapshotMain2Id, snapshotMain1Id, snapshotBaseId},
+        snapshotList.toArray(new Long[0]));
   }
 
   @Test
   public void oldestAncestor() {
     Snapshot snapshot = SnapshotUtil.oldestAncestor(table);
-    Assert.assertEquals(snapshotAId, snapshot.snapshotId());
+    Assert.assertEquals(snapshotBaseId, snapshot.snapshotId());
 
-    snapshot = SnapshotUtil.oldestAncestorOf(table, snapshotDId);
-    Assert.assertEquals(snapshotAId, snapshot.snapshotId());
+    snapshot = SnapshotUtil.oldestAncestorOf(table, snapshotMain2Id);
+    Assert.assertEquals(snapshotBaseId, snapshot.snapshotId());
 
-    snapshot = SnapshotUtil.oldestAncestorAfter(table, snapshotATimestamp + 1);
-    Assert.assertEquals(snapshotBId, snapshot.snapshotId());
+    snapshot = SnapshotUtil.oldestAncestorAfter(table, snapshotBaseTimestamp + 1);
+    Assert.assertEquals(snapshotMain1Id, snapshot.snapshotId());
   }
 
   @Test
   public void snapshotsBetween() {
     List<Long> snapshotIdsBetween =
-        SnapshotUtil.snapshotIdsBetween(table, snapshotAId, snapshotDId);
+        SnapshotUtil.snapshotIdsBetween(table, snapshotBaseId, snapshotMain2Id);
     Assert.assertArrayEquals(
-        new Long[] {snapshotDId, snapshotBId}, snapshotIdsBetween.toArray(new Long[0]));
+        new Long[] {snapshotMain2Id, snapshotMain1Id}, snapshotIdsBetween.toArray(new Long[0]));
 
     Iterable<Snapshot> ancestorsBetween =
-        SnapshotUtil.ancestorsBetween(table, snapshotDId, snapshotBId);
-    expectedSnapshots(new long[] {snapshotDId}, ancestorsBetween);
+        SnapshotUtil.ancestorsBetween(table, snapshotMain2Id, snapshotMain1Id);
+    expectedSnapshots(new long[] {snapshotMain2Id}, ancestorsBetween);
+
+    ancestorsBetween = SnapshotUtil.ancestorsBetween(table, snapshotMain2Id, snapshotBranchId);
+    expectedSnapshots(
+        new long[] {snapshotMain2Id, snapshotMain1Id, snapshotBaseId}, ancestorsBetween);
+  }
+
+  @Test
+  public void ancestorsOf() {
+    Iterable<Snapshot> snapshots = SnapshotUtil.ancestorsOf(snapshotFork2Id, table::snapshot);
+    expectedSnapshots(new long[] {snapshotFork2Id, snapshotFork1Id}, snapshots);
+
+    Iterator<Snapshot> snapshotIter = snapshots.iterator();
+    while (snapshotIter.hasNext()) {
+      snapshotIter.next();
+    }
 
-    ancestorsBetween = SnapshotUtil.ancestorsBetween(table, snapshotDId, snapshotCId);
-    expectedSnapshots(new long[] {snapshotDId, snapshotBId, snapshotAId}, ancestorsBetween);
+    // Once snapshot iterator has been exhausted, call hasNext again to make sure it is stable.
+    Assert.assertFalse(snapshotIter.hasNext());

Review Comment:
   Fixed in follow-up commit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ChristinaTech commented on a diff in pull request #7718: Core: Fixed NPE in SnapshotUtil.ancestorsOf

Posted by "ChristinaTech (via GitHub)" <gi...@apache.org>.
ChristinaTech commented on code in PR #7718:
URL: https://github.com/apache/iceberg/pull/7718#discussion_r1224689798


##########
core/src/test/java/org/apache/iceberg/util/TestSnapshotUtil.java:
##########
@@ -60,12 +62,23 @@ public class TestSnapshotUtil {
           .withRecordCount(1)
           .build();
 
-  private long snapshotAId;
+  private long snapshotBaseTimestamp;
+  private long snapshotBaseId;
+  private long snapshotBranchId;
+  private long snapshotMain1Id;
+  private long snapshotMain2Id;
+  private long snapshotFork0Id;
+  private long snapshotFork1Id;
+  private long snapshotFork2Id;
+
+  private Snapshot putSnapshot(String branch) {

Review Comment:
   Will change it to this.



##########
core/src/test/java/org/apache/iceberg/util/TestSnapshotUtil.java:
##########
@@ -60,12 +62,23 @@ public class TestSnapshotUtil {
           .withRecordCount(1)
           .build();
 
-  private long snapshotAId;
+  private long snapshotBaseTimestamp;
+  private long snapshotBaseId;
+  private long snapshotBranchId;
+  private long snapshotMain1Id;
+  private long snapshotMain2Id;
+  private long snapshotFork0Id;
+  private long snapshotFork1Id;
+  private long snapshotFork2Id;
+
+  private Snapshot putSnapshot(String branch) {
+    table.newFastAppend().appendFile(FILE_A).toBranch(branch).commit();
+    return table.snapshot(branch);
+  }
 
-  private long snapshotATimestamp;
-  private long snapshotBId;
-  private long snapshotCId;
-  private long snapshotDId;
+  private Snapshot putSnapshot() {

Review Comment:
   Will change it to this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7718: Core: Fixed NPE in SnapshotUtil.ancestorsOf

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7718:
URL: https://github.com/apache/iceberg/pull/7718#discussion_r1226431900


##########
core/src/test/java/org/apache/iceberg/util/TestSnapshotUtil.java:
##########
@@ -101,54 +116,72 @@ public void cleanupTables() {
 
   @Test
   public void isParentAncestorOf() {
-    Assert.assertTrue(SnapshotUtil.isParentAncestorOf(table, snapshotBId, snapshotAId));
-    Assert.assertFalse(SnapshotUtil.isParentAncestorOf(table, snapshotCId, snapshotBId));
+    Assert.assertTrue(SnapshotUtil.isParentAncestorOf(table, snapshotMain1Id, snapshotBaseId));
+    Assert.assertFalse(SnapshotUtil.isParentAncestorOf(table, snapshotBranchId, snapshotMain1Id));
+    Assert.assertTrue(SnapshotUtil.isParentAncestorOf(table, snapshotFork2Id, snapshotFork0Id));
   }
 
   @Test
   public void isAncestorOf() {
-    Assert.assertTrue(SnapshotUtil.isAncestorOf(table, snapshotBId, snapshotAId));
-    Assert.assertFalse(SnapshotUtil.isAncestorOf(table, snapshotCId, snapshotBId));
+    Assert.assertTrue(SnapshotUtil.isAncestorOf(table, snapshotMain1Id, snapshotBaseId));
+    Assert.assertFalse(SnapshotUtil.isAncestorOf(table, snapshotBranchId, snapshotMain1Id));
+    Assert.assertFalse(SnapshotUtil.isAncestorOf(table, snapshotFork2Id, snapshotFork0Id));
 
-    Assert.assertTrue(SnapshotUtil.isAncestorOf(table, snapshotBId));
-    Assert.assertFalse(SnapshotUtil.isAncestorOf(table, snapshotCId));
+    Assert.assertTrue(SnapshotUtil.isAncestorOf(table, snapshotMain1Id));
+    Assert.assertFalse(SnapshotUtil.isAncestorOf(table, snapshotBranchId));
   }
 
   @Test
   public void currentAncestors() {
     Iterable<Snapshot> snapshots = SnapshotUtil.currentAncestors(table);
-    expectedSnapshots(new long[] {snapshotDId, snapshotBId, snapshotAId}, snapshots);
+    expectedSnapshots(new long[] {snapshotMain2Id, snapshotMain1Id, snapshotBaseId}, snapshots);
 
     List<Long> snapshotList = SnapshotUtil.currentAncestorIds(table);
     Assert.assertArrayEquals(
-        new Long[] {snapshotDId, snapshotBId, snapshotAId}, snapshotList.toArray(new Long[0]));
+        new Long[] {snapshotMain2Id, snapshotMain1Id, snapshotBaseId},
+        snapshotList.toArray(new Long[0]));
   }
 
   @Test
   public void oldestAncestor() {
     Snapshot snapshot = SnapshotUtil.oldestAncestor(table);
-    Assert.assertEquals(snapshotAId, snapshot.snapshotId());
+    Assert.assertEquals(snapshotBaseId, snapshot.snapshotId());
 
-    snapshot = SnapshotUtil.oldestAncestorOf(table, snapshotDId);
-    Assert.assertEquals(snapshotAId, snapshot.snapshotId());
+    snapshot = SnapshotUtil.oldestAncestorOf(table, snapshotMain2Id);
+    Assert.assertEquals(snapshotBaseId, snapshot.snapshotId());
 
-    snapshot = SnapshotUtil.oldestAncestorAfter(table, snapshotATimestamp + 1);
-    Assert.assertEquals(snapshotBId, snapshot.snapshotId());
+    snapshot = SnapshotUtil.oldestAncestorAfter(table, snapshotBaseTimestamp + 1);
+    Assert.assertEquals(snapshotMain1Id, snapshot.snapshotId());
   }
 
   @Test
   public void snapshotsBetween() {
     List<Long> snapshotIdsBetween =
-        SnapshotUtil.snapshotIdsBetween(table, snapshotAId, snapshotDId);
+        SnapshotUtil.snapshotIdsBetween(table, snapshotBaseId, snapshotMain2Id);
     Assert.assertArrayEquals(
-        new Long[] {snapshotDId, snapshotBId}, snapshotIdsBetween.toArray(new Long[0]));
+        new Long[] {snapshotMain2Id, snapshotMain1Id}, snapshotIdsBetween.toArray(new Long[0]));
 
     Iterable<Snapshot> ancestorsBetween =
-        SnapshotUtil.ancestorsBetween(table, snapshotDId, snapshotBId);
-    expectedSnapshots(new long[] {snapshotDId}, ancestorsBetween);
+        SnapshotUtil.ancestorsBetween(table, snapshotMain2Id, snapshotMain1Id);
+    expectedSnapshots(new long[] {snapshotMain2Id}, ancestorsBetween);
+
+    ancestorsBetween = SnapshotUtil.ancestorsBetween(table, snapshotMain2Id, snapshotBranchId);
+    expectedSnapshots(
+        new long[] {snapshotMain2Id, snapshotMain1Id, snapshotBaseId}, ancestorsBetween);
+  }
+
+  @Test
+  public void ancestorsOf() {
+    Iterable<Snapshot> snapshots = SnapshotUtil.ancestorsOf(snapshotFork2Id, table::snapshot);
+    expectedSnapshots(new long[] {snapshotFork2Id, snapshotFork1Id}, snapshots);
+
+    Iterator<Snapshot> snapshotIter = snapshots.iterator();
+    while (snapshotIter.hasNext()) {
+      snapshotIter.next();
+    }
 
-    ancestorsBetween = SnapshotUtil.ancestorsBetween(table, snapshotDId, snapshotCId);
-    expectedSnapshots(new long[] {snapshotDId, snapshotBId, snapshotAId}, ancestorsBetween);
+    // Once snapshot iterator has been exhausted, call hasNext again to make sure it is stable.
+    Assert.assertFalse(snapshotIter.hasNext());

Review Comment:
   nit: `Assertions.assertThat(snapshotIter).isExhausted()`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org