Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/16 08:48:13 UTC

[GitHub] [iceberg] liubo1022126 opened a new issue #2702: Flink: Streaming read without startSnapshotId set will get all data first time.

liubo1022126 opened a new issue #2702:
URL: https://github.com/apache/iceberg/issues/2702


   When I start a new streaming read without `startSnapshotId` set, like below:
   
   ```
   DataStream<RowData> stream = FlinkSource.forRowData()
           .env(env)
           .tableLoader(tableLoader)
           .streaming(true)
           // Options left unset in this run:
           // .asOfTimestamp(1623326854881L)
           // .startSnapshotId(2410302255210126103L)
           // .snapshotId(7106594945028010714L)
           // .filters(filters)
           .build();
   ```
   
   When the job is submitted, it first returns all the data, like a batch select on the current snapshot, and then it listens for data from new snapshots.
   
   I wonder whether this is necessary. Streaming reads focus on new data, and with batch and stream unified, the data in the current snapshot may be huge. If there are reasons to keep this behavior, would it be better to add a switch to control it?
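
To make the request concrete, here is a toy model of the two behaviors. It is only a sketch and not Iceberg code: the class `StreamingReadSketch`, the enum `StartMode`, and the method `emittedRows` are hypothetical names invented for illustration, and each inner list stands for the rows appended by one snapshot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only (hypothetical, not Iceberg's implementation). */
final class StreamingReadSketch {

    /** Hypothetical switch of the kind asked for in this issue. */
    enum StartMode { FULL_THEN_INCREMENTAL, INCREMENTAL_ONLY }

    /**
     * Returns the rows a streaming reader would emit in this model.
     *
     * @param snapshots        rows appended by each snapshot, oldest first
     * @param snapshotsAtStart how many snapshots already exist when the job starts
     * @param mode             whether to emit the existing data before the new data
     */
    static List<String> emittedRows(List<List<String>> snapshots,
                                    int snapshotsAtStart,
                                    StartMode mode) {
        List<String> out = new ArrayList<>();
        // FULL_THEN_INCREMENTAL models the behavior reported above: everything
        // in the current snapshot is emitted first. INCREMENTAL_ONLY skips the
        // snapshots that existed at job start and emits only later ones.
        int first = (mode == StartMode.FULL_THEN_INCREMENTAL) ? 0 : snapshotsAtStart;
        for (int i = first; i < snapshots.size(); i++) {
            out.addAll(snapshots.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two snapshots exist at job start; a third is committed afterwards.
        List<List<String>> snapshots = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"),
                Arrays.asList("d"));
        System.out.println(emittedRows(snapshots, 2, StartMode.FULL_THEN_INCREMENTAL)); // [a, b, c, d]
        System.out.println(emittedRows(snapshots, 2, StartMode.INCREMENTAL_ONLY));      // [d]
    }
}
```

Within the existing builder, the commented-out `startSnapshotId` option in the snippet above looks like the closest control. Whether pinning it to the table's current snapshot gives exactly the `INCREMENTAL_ONLY` behavior is an assumption that should be checked against the Iceberg version in use.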

