Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/24 11:12:27 UTC

[GitHub] [iceberg] hililiwei commented on a diff in pull request #5984: Core, API: Support incremental scanning with branch

hililiwei commented on code in PR #5984:
URL: https://github.com/apache/iceberg/pull/5984#discussion_r1056798443


##########
api/src/main/java/org/apache/iceberg/IncrementalScan.java:
##########
@@ -21,6 +21,23 @@
 /** API for configuring an incremental scan. */
 public interface IncrementalScan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
     extends Scan<ThisT, T, G> {
+
+  /**
+   * Instructs this scan to look for changes starting from a particular snapshot (inclusive).
+   *
+   * <p>If the start snapshot is not configured, it is defaulted to the oldest ancestor of the end
+   * snapshot (inclusive).
+   *
+   * @param fromSnapshotId the start snapshot ID (inclusive)
+   * @param referenceName the ref used
+   * @return this for method chaining
+   * @throws IllegalArgumentException if the start snapshot is not an ancestor of the end snapshot
+   */
+  default ThisT fromSnapshotInclusive(long fromSnapshotId, String referenceName) {

Review Comment:
   Agree with @stevenzwu.
   Yes, a tag is a fixed point in time, but when it is used for an incremental read, we can treat it as semantically the same as `fromSnapshot(Long snapshotId)`.
   As in @stevenzwu's example, suppose I have daily tags (`20220101`, `20220102`). If I want to read the incremental data from `20220102` up to the current snapshot, I can use `fromSnapshotExclusive("20220102")`:
   ```
   table.newIncrementalScan()
         .fromSnapshotExclusive("20220102")
         .planTasks()
   ```
   Another way is to use the snapshot time to find the snapshot ID first, but sometimes that doesn't work. For example, we may generate tags based on the event time of the data, or tag the snapshot only after the application has completed. The application may finish at 03:00 on 2022/01/02 and tag the newly generated snapshot as `20220102`. If we then use the snapshot time `2022-01-02 00:00:00` to find the snapshot ID, the lookup resolves to a snapshot older than the tagged one, and incorrect incremental data will be returned.
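
   To make the mismatch concrete, here is a rough, self-contained sketch. It is a toy model and does not use any Iceberg API: the class `TagVsTimestampLookup`, its maps, the snapshot IDs 1 to 3, and the commit times other than 03:00 are all made up for illustration.

   ```java
   import java.time.LocalDateTime;
   import java.time.ZoneOffset;
   import java.util.HashMap;
   import java.util.Map;
   import java.util.TreeMap;

   // Toy model only: none of these names are Iceberg APIs.
   public class TagVsTimestampLookup {

     // Commit time (epoch millis, UTC) -> snapshot ID, ordered by commit time.
     static final TreeMap<Long, Long> SNAPSHOTS_BY_TIME = new TreeMap<>();
     // Tag name -> snapshot ID.
     static final Map<String, Long> TAGS = new HashMap<>();

     static long millis(String isoLocalDateTime) {
       return LocalDateTime.parse(isoLocalDateTime).toInstant(ZoneOffset.UTC).toEpochMilli();
     }

     static {
       // Hypothetical history: the job for business day 20220102 keeps
       // committing after midnight and finishes at 03:00.
       SNAPSHOTS_BY_TIME.put(millis("2022-01-01T22:00:00"), 1L);
       SNAPSHOTS_BY_TIME.put(millis("2022-01-02T01:30:00"), 2L);
       SNAPSHOTS_BY_TIME.put(millis("2022-01-02T03:00:00"), 3L);
       // The tag is created only once the job has completed.
       TAGS.put("20220102", 3L);
     }

     // Latest snapshot committed at or before the given time.
     static long snapshotIdAsOf(long timestampMillis) {
       Map.Entry<Long, Long> entry = SNAPSHOTS_BY_TIME.floorEntry(timestampMillis);
       if (entry == null) {
         throw new IllegalArgumentException("No snapshot at or before " + timestampMillis);
       }
       return entry.getValue();
     }

     // Snapshot that the tag points to.
     static long snapshotIdForTag(String tag) {
       Long id = TAGS.get(tag);
       if (id == null) {
         throw new IllegalArgumentException("Unknown tag: " + tag);
       }
       return id;
     }

     public static void main(String[] args) {
       System.out.println("by tag:  " + snapshotIdForTag("20220102"));
       System.out.println("by time: " + snapshotIdAsOf(millis("2022-01-02T00:00:00")));
     }
   }
   ```

   In this model the tag resolves to snapshot 3, while the timestamp lookup resolves to snapshot 1, so an incremental read that starts from the timestamp-derived ID would re-read snapshots 2 and 3, which the tag already covers.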
   
   cc @rdblue 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
