You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/30 23:30:12 UTC
[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #5967: Flink: Support read options in flink source

chenjunjiedada commented on code in PR #5967:
URL: https://github.com/apache/iceberg/pull/5967#discussion_r1036549618


##########
docs/flink-getting-started.md:
##########
@@ -683,7 +683,47 @@ env.execute("Test Iceberg DataStream");
 OVERWRITE and UPSERT can't be set together. In UPSERT mode, if the table is partitioned, the partition fields should be included in equality fields.
 {{< /hint >}}
 
-## Write options
+## Options
+### Read options
+
+Flink read options are passed when configuring the FlinkSink, like this:
+
+```
+IcebergSource.forRowData()
+    .tableLoader(tableResource.tableLoader())
+    .assignerFactory(new SimpleSplitAssignerFactory())
+    .streaming(scanContext.isStreaming())
+    .streamingStartingStrategy(scanContext.streamingStartingStrategy())
+    .startSnapshotTimestamp(scanContext.startSnapshotTimestamp())
+    .startSnapshotId(scanContext.startSnapshotId())
+    .set("monitor-interval", "10s")
+    .build()
+```
+For Flink SQL, read options can be passed in via SQL hints like this:
+```
+SELECT * FROM tableName /*+ OPTIONS('monitor-interval'='10s') */
+...
+```
+
+| Flink option                | Default                            | Description                                                  |
+| --------------------------- | ---------------------------------- | ------------------------------------------------------------ |
+| snapshot-id                 |                                    | For time travel in batch mode. Read data from the specified snapshot-id. |
+| case-sensitive              | false                              | Whether the sql is case sensitive                            |
+| as-of-timestamp             |                                    | For time travel in batch mode. Read data from the most recent snapshot as of the given time in milliseconds. |
+| starting-strategy           | INCREMENTAL_FROM_LATEST_SNAPSHOT   | Starting strategy for streaming execution. TABLE_SCAN_THEN_INCREMENTAL: Do a regular table scan then switch to the incremental mode. The incremental mode starts from the current snapshot exclusive. INCREMENTAL_FROM_LATEST_SNAPSHOT: Start incremental mode from the latest snapshot inclusive. If it is an empty map, all future append snapshots should be discovered. INCREMENTAL_FROM_EARLIEST_SNAPSHOT: Start incremental mode from the earliest snapshot inclusive. If it is an empty map, all future append snapshots should be discovered. INCREMENTAL_FROM_SNAPSHOT_ID: Start incremental mode from a snapshot with a specific id inclusive. INCREMENTAL_FROM_SNAPSHOT_TIMESTAMP: Start incremental mode from a snapshot with a specific timestamp inclusive. If the timestamp is between two snapshots, it should start from the snapshot after the timestamp. Just for FIP27 Source |
+| start-snapshot-timestamp    |                                    | Start to read data from the most recent snapshot as of the given time in milliseconds. |
+| start-snapshot-id           |                                    | Start to read data from the specified snapshot-id.           |
+| end-snapshot-id             | The latest snapshot id             | Specifies the end snapshot.                                  |
+| split-size                  | Table read.split.target-size       | Overrides this table's read.split.target-size                |
+| split-lookback              | Table read.split.planning-lookback | Overrides this table's read.split.planning-lookback          |
+| split-file-open-cost        | Table read.split.open-file-cost    | Overrides this table's read.split.open-file-cost             |
+| streaming                   | false                              | Sets whether the current task runs in streaming or batch mode. |

Review Comment:
   I think we need to change the streaming to default true.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org