Posted to issues@iceberg.apache.org by "chenjunjiedada (via GitHub)" <gi...@apache.org> on 2023/04/13 12:11:38 UTC

[GitHub] [iceberg] chenjunjiedada opened a new pull request, #7338: Flink: use correct scan mode when in TABLE_SCAN_THEN_INCREMENTAL mode

chenjunjiedada opened a new pull request, #7338:
URL: https://github.com/apache/iceberg/pull/7338

   When a streaming job consumes a table with the `TABLE_SCAN_THEN_INCREMENTAL` starting strategy and part of the table's snapshot history has expired, data can be lost. The cause is that `checkScanMode` returns the incremental mode whenever the scan context is streaming, so the initial full table scan is also planned as an incremental scan. To address this, this PR adds a case that handles `TABLE_SCAN_THEN_INCREMENTAL` explicitly.
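
To make the failure mode concrete, here is a minimal, self-contained sketch of the decision described above. It is not the Iceberg source: `SketchContext`, `checkBefore`, and `checkWithAddedCase` are hypothetical names, the snapshot-range fields are assumed, and the "added case" is only one possible shape of such a fix.

```java
public class ScanModeBugSketch {
  enum ScanMode { BATCH, INCREMENTAL_APPEND_SCAN }

  // Simplified stand-in for the streaming starting strategy (only two values shown).
  enum StartingStrategy { TABLE_SCAN_THEN_INCREMENTAL, INCREMENTAL_FROM_LATEST_SNAPSHOT }

  /** Hypothetical stand-in for the scan context; field names are assumptions. */
  static final class SketchContext {
    final boolean streaming;
    final StartingStrategy strategy;
    final Long startSnapshotId; // null when no incremental range is set
    final Long endSnapshotId;   // null when no incremental range is set

    SketchContext(
        boolean streaming, StartingStrategy strategy, Long startSnapshotId, Long endSnapshotId) {
      this.streaming = streaming;
      this.strategy = strategy;
      this.startSnapshotId = startSnapshotId;
      this.endSnapshotId = endSnapshotId;
    }
  }

  // Behavior described in the PR text: a streaming context always maps to incremental,
  // even for the initial full table scan.
  static ScanMode checkBefore(SketchContext c) {
    if (c.streaming || c.startSnapshotId != null || c.endSnapshotId != null) {
      return ScanMode.INCREMENTAL_APPEND_SCAN;
    }
    return ScanMode.BATCH;
  }

  // Assumed shape of an added case: with TABLE_SCAN_THEN_INCREMENTAL and no snapshot
  // range set yet, the scan is the initial table scan and should be a batch scan.
  static ScanMode checkWithAddedCase(SketchContext c) {
    boolean initialTableScan =
        c.strategy == StartingStrategy.TABLE_SCAN_THEN_INCREMENTAL
            && c.startSnapshotId == null
            && c.endSnapshotId == null;
    if (initialTableScan) {
      return ScanMode.BATCH;
    }
    return checkBefore(c);
  }

  public static void main(String[] args) {
    SketchContext initial =
        new SketchContext(true, StartingStrategy.TABLE_SCAN_THEN_INCREMENTAL, null, null);
    System.out.println("before fix: " + checkBefore(initial));
    System.out.println("with added case: " + checkWithAddedCase(initial));
  }
}
```

In this sketch the same streaming context is classified differently by the two methods, which is the mismatch the description attributes the data loss to.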


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on pull request #7338: Flink: use correct scan mode when in TABLE_SCAN_THEN_INCREMENTAL mode

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on PR #7338:
URL: https://github.com/apache/iceberg/pull/7338#issuecomment-1513628826

   @chenjunjiedada can you create a backport PR too?




[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #7338: Flink: use correct scan mode when in TABLE_SCAN_THEN_INCREMENTAL mode

Posted by "chenjunjiedada (via GitHub)" <gi...@apache.org>.
chenjunjiedada commented on code in PR #7338:
URL: https://github.com/apache/iceberg/pull/7338#discussion_r1166735951


##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/source/FlinkSplitPlanner.java:
##########
@@ -136,12 +137,21 @@ static CloseableIterable<CombinedScanTask> planTasks(
     }
   }
 
-  private enum ScanMode {
+  @VisibleForTesting
+  enum ScanMode {
     BATCH,
     INCREMENTAL_APPEND_SCAN
   }
 
-  private static ScanMode checkScanMode(ScanContext context) {
+  @VisibleForTesting
+  static ScanMode checkScanMode(ScanContext context) {

Review Comment:
   > For the conditions here, is there any other simpler logic? E.g., is it enough to just remove the context.isStreaming() condition in the original if clause?
   
   Yes, that looks simpler and more direct.
   
   > Also I think it is safer and clearer to construct a new ScanContext object and set the useSnapshotId.
   
   Agreed, we can use `scanContext.copyWithSnapshotId` to achieve that.
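
As a rough illustration of the two points agreed on above (dropping the streaming condition, and scanning through a fresh context pinned to a snapshot), the sketch below uses a hypothetical immutable `SketchContext`. Its `copyWithSnapshotId` and `copyWithAppendsBetween` methods are simplified analogues written for this example, and the field names and the clearing of the snapshot range are assumptions rather than the actual `ScanContext` implementation.

```java
public class PinnedScanSketch {
  enum ScanMode { BATCH, INCREMENTAL_APPEND_SCAN }

  /** Hypothetical, immutable stand-in for a scan context; not the real Iceberg class. */
  static final class SketchContext {
    final boolean streaming;
    final Long useSnapshotId;   // snapshot to read in a batch scan, or null
    final Long startSnapshotId; // exclusive start of an incremental range, or null
    final Long endSnapshotId;   // inclusive end of an incremental range, or null

    SketchContext(
        boolean streaming, Long useSnapshotId, Long startSnapshotId, Long endSnapshotId) {
      this.streaming = streaming;
      this.useSnapshotId = useSnapshotId;
      this.startSnapshotId = startSnapshotId;
      this.endSnapshotId = endSnapshotId;
    }

    // Returns a new context pinned to one snapshot; the original is left unchanged.
    SketchContext copyWithSnapshotId(long snapshotId) {
      return new SketchContext(streaming, snapshotId, null, null);
    }

    // Returns a new context that covers appends in the range (start, end].
    SketchContext copyWithAppendsBetween(Long start, long end) {
      return new SketchContext(streaming, null, start, end);
    }
  }

  // Simplified check: the streaming flag no longer decides the mode;
  // only an explicit snapshot range selects the incremental scan.
  static ScanMode checkScanMode(SketchContext c) {
    if (c.startSnapshotId != null || c.endSnapshotId != null) {
      return ScanMode.INCREMENTAL_APPEND_SCAN;
    }
    return ScanMode.BATCH;
  }

  public static void main(String[] args) {
    SketchContext streamingContext = new SketchContext(true, null, null, null);

    // Initial table scan: pinned to the start snapshot, planned as a batch scan.
    System.out.println(checkScanMode(streamingContext.copyWithSnapshotId(42L)));

    // Later discovery: an explicit range, planned as an incremental append scan.
    System.out.println(checkScanMode(streamingContext.copyWithAppendsBetween(42L, 43L)));
  }
}
```

Copying the context instead of mutating it keeps the enumerator's original settings intact, so the batch scan and the later incremental scans cannot leak state into each other.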





[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7338: Flink: use correct scan mode when in TABLE_SCAN_THEN_INCREMENTAL mode

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7338:
URL: https://github.com/apache/iceberg/pull/7338#discussion_r1165658208


##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/source/FlinkSplitPlanner.java:
##########
@@ -136,12 +137,21 @@ static CloseableIterable<CombinedScanTask> planTasks(
     }
   }
 
-  private enum ScanMode {
+  @VisibleForTesting
+  enum ScanMode {
     BATCH,
     INCREMENTAL_APPEND_SCAN
   }
 
-  private static ScanMode checkScanMode(ScanContext context) {
+  @VisibleForTesting
+  static ScanMode checkScanMode(ScanContext context) {

Review Comment:
   @chenjunjiedada thanks for catching the bug and creating the PR to fix it.
   
   For the conditions here, is there any other simpler logic? E.g., is it enough to just remove the `context.isStreaming()` condition in the original if clause?
   
   Also, I think it is safer and clearer to construct a new `ScanContext` object and set the `useSnapshotId`.
   ```
       if (scanContext.streamingStartingStrategy()
           == StreamingStartingStrategy.TABLE_SCAN_THEN_INCREMENTAL) {
         // do a batch table scan first
         splits = FlinkSplitPlanner.planIcebergSourceSplits(table, scanContext, workerPool);
         LOG.info(
             "Discovered {} splits from initial batch table scan with snapshot Id {}",
             splits.size(),
             startSnapshot.snapshotId());
         // For TABLE_SCAN_THEN_INCREMENTAL, incremental mode starts exclusive from the startSnapshot
         toPosition =
             IcebergEnumeratorPosition.of(startSnapshot.snapshotId(), startSnapshot.timestampMillis());
   ```





[GitHub] [iceberg] stevenzwu commented on pull request #7338: Flink: use correct scan mode when in TABLE_SCAN_THEN_INCREMENTAL mode

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on PR #7338:
URL: https://github.com/apache/iceberg/pull/7338#issuecomment-1510038817

   @chenjunjiedada thanks for finding and fixing this bug.




[GitHub] [iceberg] stevenzwu merged pull request #7338: Flink: use correct scan mode when in TABLE_SCAN_THEN_INCREMENTAL mode

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu merged PR #7338:
URL: https://github.com/apache/iceberg/pull/7338

