Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/25 13:04:34 UTC

[GitHub] [iceberg] openinx opened a new pull request #2147: Docs: Add new flink features for release 0.11.0

openinx opened a new pull request #2147:
URL: https://github.com/apache/iceberg/pull/2147


   This addresses issue https://github.com/apache/iceberg/issues/2136.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2147: Docs: Add new flink features for release 0.11.0

jackye1995 commented on a change in pull request #2147:
URL: https://github.com/apache/iceberg/pull/2147#discussion_r564941488



##########
File path: site/docs/flink.md
##########
@@ -28,13 +28,14 @@ we only integrate iceberg with apache flink 1.11.x .
 | [SQL create table like](#create-table-like)                            | ✔️                 |                                                        |
 | [SQL alter table](#alter-table)                                        | ✔️                 | Only support altering table properties, Columns/PartitionKey changes are not supported now|
 | [SQL drop_table](#drop-table)                                          | ✔️                 |                                                        |
-| [SQL select](#querying-with-sql)                                       | ✔️                 | Only support batch mode now.                           |
+| [SQL select](#querying-with-sql)                                       | ✔️                 | Support both streaming and batch mode                  |
 | [SQL insert into](#insert-into)                                        | ✔️ ️               | Support both streaming and batch mode                  |
 | [SQL insert overwrite](#insert-overwrite)                              | ✔️ ️               |                                                        |
 | [DataStream read](#reading-with-datastream)                            | ✔️ ️               |                                                        |
 | [DataStream append](#appending-data)                                   | ✔️ ️               |                                                        |
 | [DataStream overwrite](#overwrite-data)                                | ✔️ ️               |                                                        |
-| [Metadata tables](#inspecting-tables)                                  |    ️               |                                                        |
+| [Metadata tables](#inspecting-tables)                                  |    ️               | Support Java API but does not support Flink SQL        |
+| [Rewrite files action](#rewrite-files-action)                          | ✔️ ️               |                                                        |

Review comment:
       Yeah, that's why I am asking if we should add a column to suggest which ones work for 1.12.






[GitHub] [iceberg] openinx commented on a change in pull request #2147: Docs: Add new flink features for release 0.11.0

openinx commented on a change in pull request #2147:
URL: https://github.com/apache/iceberg/pull/2147#discussion_r563727469



##########
File path: site/docs/flink.md
##########
@@ -224,24 +224,49 @@ DROP TABLE hive_catalog.default.sample;
 
 ## Querying with SQL
 
-Iceberg does not support streaming read in flink now, it's still working in-progress. But it support batch read to scan the existing records in iceberg table.
+Iceberg support both streaming and batch read in flink now. we could execute the following sql command to switch the execute type from 'streaming' mode to 'batch' mode, and vice versa:
+
+```sql
+-- Execute the flink job in streaming mode for current session context
+SET execution.type = streaming
+
+-- Execute the flink job in batch mode for current session context
+SET execution.type = batch
+```
+
+### Flink batch read
+
+If want to check all the rows in iceberg table by submitting a flink __batch__ job, you could execute the following sentences:
 
 ```sql
 -- Execute the flink job in batch mode for current session context
 SET execution.type = batch ;
 SELECT * FROM sample       ;
 ```
 
-Notice: we could execute the following sql command to switch the execute type from 'streaming' mode to 'batch' mode, and vice versa:
+### Flink streaming read
+
+Iceberg supports processing incremental data in flink streaming jobs which starts from a historical snapshot-id:
 
 ```sql
--- Execute the flink job in streaming mode for current session context
-SET execution.type = streaming
+-- Submit the flink job in streaming mode for current session.
+SET execution.type = streaming ;
 
--- Execute the flink job in batch mode for current session context
-SET execution.type = batch
+-- Enable this switch because streaming read SQL will provide few job options in flink SQL hint options.
+SET table.dynamic-table-options.enabled=true;
+
+-- Read all the records from the iceberg current snapshot, and then read incremental data starting from that snapshot.
+SELECT * FROM sample /*+ OPTIONS('streaming'='true', 'monitor-interval'='1s')*/ ;

Review comment:
       Although we've set `execution.type=streaming`, we still need the hint option `'streaming'='true'` here, because `execution.type=streaming` is not passed down to the `StreamTableSource`, so the table SQL job cannot tell whether it is a `streaming` job or a `batch` job (in Flink 1.11 and 1.12). That's an imperfection in Flink SQL's implementation; we will try to improve it in a future Flink release.
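
       To make the mechanism above concrete, this is how the session setting and the per-query hint combine in the Flink SQL client (restating the snippet under review; `sample` is the example table from the doc, and it is the hint, not the session setting, that switches the Iceberg source into streaming mode):

       ```sql
       -- Session-level setting; NOT propagated to the Iceberg StreamTableSource in Flink 1.11/1.12.
       SET execution.type = streaming ;

       -- Allow OPTIONS(...) hints to be used in queries.
       SET table.dynamic-table-options.enabled = true ;

       -- The 'streaming'='true' hint is still required: without it the source plans a batch scan.
       SELECT * FROM sample /*+ OPTIONS('streaming'='true', 'monitor-interval'='1s') */ ;
       ```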






[GitHub] [iceberg] rdblue merged pull request #2147: Docs: Add new flink features for release 0.11.0

rdblue merged pull request #2147:
URL: https://github.com/apache/iceberg/pull/2147


   




[GitHub] [iceberg] yyanyy commented on a change in pull request #2147: Docs: Add new flink features for release 0.11.0

yyanyy commented on a change in pull request #2147:
URL: https://github.com/apache/iceberg/pull/2147#discussion_r564784008



##########
File path: site/docs/flink.md
##########
@@ -224,24 +225,49 @@ DROP TABLE hive_catalog.default.sample;
 
 ## Querying with SQL
 
-Iceberg does not support streaming read in flink now, it's still working in-progress. But it support batch read to scan the existing records in iceberg table.
+Iceberg support both streaming and batch read in flink now. we could execute the following sql command to switch the execute type from 'streaming' mode to 'batch' mode, and vice versa:
+
+```sql
+-- Execute the flink job in streaming mode for current session context
+SET execution.type = streaming
+
+-- Execute the flink job in batch mode for current session context
+SET execution.type = batch
+```
+
+### Flink batch read
+
+If want to check all the rows in iceberg table by submitting a flink __batch__ job, you could execute the following sentences:
 
 ```sql
 -- Execute the flink job in batch mode for current session context
 SET execution.type = batch ;
 SELECT * FROM sample       ;
 ```
 
-Notice: we could execute the following sql command to switch the execute type from 'streaming' mode to 'batch' mode, and vice versa:
+### Flink streaming read
+
+Iceberg supports processing incremental data in flink streaming jobs which starts from a historical snapshot-id:
 
 ```sql
--- Execute the flink job in streaming mode for current session context
-SET execution.type = streaming
+-- Submit the flink job in streaming mode for current session.
+SET execution.type = streaming ;
 
--- Execute the flink job in batch mode for current session context
-SET execution.type = batch
+-- Enable this switch because streaming read SQL will provide few job options in flink SQL hint options.
+SET table.dynamic-table-options.enabled=true;

Review comment:
       Is this to ensure the `OPTIONS(...)` in the commands below will not be disregarded by Flink? If so, "requires" might be better than "will provide". Also, once `'streaming'='true'` is no longer needed in the options, what happens if we don't specify options at all (or omit `'monitor-interval'='1s'`)? Does it fall back to a default value, or does it simply not work?
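
       A sketch of the variants being asked about, assuming the behavior described elsewhere in this thread (omitting the `'streaming'` hint yields a plain batch scan) and assuming the connector supplies a built-in default polling interval when `'monitor-interval'` is omitted; these are illustrative, not verified:

       ```sql
       SET table.dynamic-table-options.enabled = true ;

       -- Assumed: omitting 'monitor-interval' falls back to the connector's default polling interval.
       SELECT * FROM sample /*+ OPTIONS('streaming'='true') */ ;

       -- Omitting the hint entirely: the source cannot tell it should stream, so this scans the
       -- current snapshot as a batch job.
       SELECT * FROM sample ;
       ```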






[GitHub] [iceberg] rdblue commented on pull request #2147: Docs: Add new flink features for release 0.11.0

rdblue commented on pull request #2147:
URL: https://github.com/apache/iceberg/pull/2147#issuecomment-767815036


   I'm going to merge this as-is and we can fix minor points in follow-ups. Better to have imperfect documentation than none.




[GitHub] [iceberg] rdblue commented on a change in pull request #2147: Docs: Add new flink features for release 0.11.0

rdblue commented on a change in pull request #2147:
URL: https://github.com/apache/iceberg/pull/2147#discussion_r564809780



##########
File path: site/docs/flink.md
##########
@@ -28,13 +28,14 @@ we only integrate iceberg with apache flink 1.11.x .
 | [SQL create table like](#create-table-like)                            | ✔️                 |                                                        |
 | [SQL alter table](#alter-table)                                        | ✔️                 | Only support altering table properties, Columns/PartitionKey changes are not supported now|
 | [SQL drop_table](#drop-table)                                          | ✔️                 |                                                        |
-| [SQL select](#querying-with-sql)                                       | ✔️                 | Only support batch mode now.                           |
+| [SQL select](#querying-with-sql)                                       | ✔️                 | Support both streaming and batch mode                  |
 | [SQL insert into](#insert-into)                                        | ✔️ ️               | Support both streaming and batch mode                  |
 | [SQL insert overwrite](#insert-overwrite)                              | ✔️ ️               |                                                        |
 | [DataStream read](#reading-with-datastream)                            | ✔️ ️               |                                                        |
 | [DataStream append](#appending-data)                                   | ✔️ ️               |                                                        |
 | [DataStream overwrite](#overwrite-data)                                | ✔️ ️               |                                                        |
-| [Metadata tables](#inspecting-tables)                                  |    ️               |                                                        |
+| [Metadata tables](#inspecting-tables)                                  |    ️               | Support Java API but does not support Flink SQL        |
+| [Rewrite files action](#rewrite-files-action)                          | ✔️ ️               |                                                        |

Review comment:
       I don't think that support currently works in 1.12, according to the docs?






[GitHub] [iceberg] jackye1995 commented on a change in pull request #2147: Docs: Add new flink features for release 0.11.0

jackye1995 commented on a change in pull request #2147:
URL: https://github.com/apache/iceberg/pull/2147#discussion_r564694477



##########
File path: site/docs/flink.md
##########
@@ -28,13 +28,14 @@ we only integrate iceberg with apache flink 1.11.x .
 | [SQL create table like](#create-table-like)                            | ✔️                 |                                                        |
 | [SQL alter table](#alter-table)                                        | ✔️                 | Only support altering table properties, Columns/PartitionKey changes are not supported now|
 | [SQL drop_table](#drop-table)                                          | ✔️                 |                                                        |
-| [SQL select](#querying-with-sql)                                       | ✔️                 | Only support batch mode now.                           |
+| [SQL select](#querying-with-sql)                                       | ✔️                 | Support both streaming and batch mode                  |
 | [SQL insert into](#insert-into)                                        | ✔️ ️               | Support both streaming and batch mode                  |
 | [SQL insert overwrite](#insert-overwrite)                              | ✔️ ️               |                                                        |
 | [DataStream read](#reading-with-datastream)                            | ✔️ ️               |                                                        |
 | [DataStream append](#appending-data)                                   | ✔️ ️               |                                                        |
 | [DataStream overwrite](#overwrite-data)                                | ✔️ ️               |                                                        |
-| [Metadata tables](#inspecting-tables)                                  |    ️               |                                                        |
+| [Metadata tables](#inspecting-tables)                                  |    ️               | Support Java API but does not support Flink SQL        |
+| [Rewrite files action](#rewrite-files-action)                          | ✔️ ️               |                                                        |

Review comment:
       should we add a 1.12 column?





