You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/19 23:04:42 UTC

[GitHub] [iceberg] rdblue commented on a change in pull request #3039: Introduce spark3 option to read stream from a timestamp

rdblue commented on a change in pull request #3039:
URL: https://github.com/apache/iceberg/pull/3039#discussion_r711817298



##########
File path: core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java
##########
@@ -65,12 +65,18 @@ public static boolean ancestorOf(Table table, long snapshotId, long ancestorSnap
   }
 
   /**
-   * Traverses the history of the table's current snapshot and finds the oldest Snapshot.
-   * @return null if there is no current snapshot in the table, else the oldest Snapshot.
+   * Traverses the history of the table's current snapshot
+   * and finds the oldest Snapshot after or equal to the timestamp in milliseconds.
+   * @return null if there is no current snapshot in the table,
+   * else the oldest Snapshot after or equal to the timestamp in milliseconds.
    */
-  public static Snapshot oldestSnapshot(Table table) {
+  public static Snapshot oldestSnapshot(Table table, long timestampMillis) {

Review comment:
       I don't think the original purpose of this method and the timestamp lookup should be mixed together. This is no longer the "oldest snapshot" and is now the oldest snapshot newer than some timestamp. I think this should be a new method called `oldestAncestorAfter`.
   
   Also, we need to consider boundary conditions. When resuming from a snapshot, if the previous snapshot is not found in the table's known snapshots, then we can't resume because there may have been unknown changes. This is similar. I think we should only return a snapshot from the `oldestAncestorAfter` method if its parent is still known and older than the timestamp. That's the only way we can ensure that there wasn't a snapshot after the timestamp that was aged off.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org