Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/28 08:36:50 UTC

[GitHub] [iceberg] XuQianJin-Stars opened a new pull request #2389: Doc: fix error and enhanced iceberg catalog description for the flink DataStream API

XuQianJin-Stars opened a new pull request #2389:
URL: https://github.com/apache/iceberg/pull/2389


   fix error and enhanced iceberg catalog description for the flink DataStream API



[GitHub] [iceberg] XuQianJin-Stars commented on a change in pull request #2389: Doc: fix error and enhanced iceberg catalog description for the flink DataStream API

Posted by GitBox <gi...@apache.org>.
XuQianJin-Stars commented on a change in pull request #2389:
URL: https://github.com/apache/iceberg/pull/2389#discussion_r607437111



##########
File path: site/docs/flink.md
##########
@@ -312,17 +312,47 @@ INSERT OVERWRITE hive_catalog.default.sample PARTITION(data='a') SELECT 6;
 For a partitioned iceberg table, when all the partition columns are set a value in `PARTITION` clause, it is inserting into a static partition, otherwise if partial partition columns (prefix part of all partition columns) are set a value in `PARTITION` clause, it is writing the query result into a dynamic partition.
 For an unpartitioned iceberg table, its data will be completely overwritten by `INSERT OVERWRITE`.
 
-## Reading with DataStream
+## Iceberg Operation with DataStream API

Review comment:
       > I agree. And adding a section on accessing the Iceberg Java API can be a separate level-2 section, so there would be no need to change the heading level of all the remaining headings in this doc. I think that would be much better because it is fewer changes.
   
   Well, I will change this later.





[GitHub] [iceberg] openinx commented on a change in pull request #2389: Doc: fix error and enhanced iceberg catalog description for the flink DataStream API

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2389:
URL: https://github.com/apache/iceberg/pull/2389#discussion_r603233452



##########
File path: site/docs/flink.md
##########
@@ -312,17 +312,47 @@ INSERT OVERWRITE hive_catalog.default.sample PARTITION(data='a') SELECT 6;
 For a partitioned iceberg table, when all the partition columns are set a value in `PARTITION` clause, it is inserting into a static partition, otherwise if partial partition columns (prefix part of all partition columns) are set a value in `PARTITION` clause, it is writing the query result into a dynamic partition.
 For an unpartitioned iceberg table, its data will be completely overwritten by `INSERT OVERWRITE`.
 
-## Reading with DataStream
+## Iceberg Operation with DataStream API
+### Load Iceberg Catalog
+#### Load Hadoop Catalog
+
+```java
+    Map<String, String> properties = new HashMap<>();
+    properties.put("type", "iceberg");
+    properties.put("catalog-type", "hadoop");
+    properties.put("property-version", "1");
+    properties.put("warehouse", "hdfs://nn:8020/warehouse/path");
+
+    CatalogLoader catalogLoader = CatalogLoader.hadoop(HADOOP_CATALOG, new Configuration(), properties);
+```
+
+#### Load Hive Catalog
+
+```java
+    Map<String, String> properties = new HashMap<>();
+    properties.put("type", "iceberg");
+    properties.put("catalog-type", "hive");
+    properties.put("property-version", "1");
+    properties.put("warehouse", "hdfs://nn:8020/warehouse/path");
+    properties.put("uri", "thrift://localhost:9083");
+    properties.put("clients", Integer.toString(2));
+
+    CatalogLoader catalogLoader = CatalogLoader.hive(HIVE_CATALOG, new Configuration(), properties);
+```
+
+*Note*: The following are examples of Load Hadoop Catalog.
+
+### Reading with DataStream
 
 Iceberg support streaming or batch read in Java API now.
 
-### Batch Read
+#### Batch Read
 
 This example will read all records from iceberg table and then print to the stdout console in flink batch job:
 
 ```java
 StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
-TableLoader tableLoader = TableLoader.fromHadooptable("hdfs://nn:8020/warehouse/path");
+TableLoader tableLoader = TableLoader.fromHadoopTable("hdfs://nn:8020/warehouse/path");

Review comment:
       Thanks for the fix!
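
   For context, the full batch-read example that this hunk truncates looks roughly like the following sketch; it assumes the `FlinkSource` builder API from the `iceberg-flink` module as of this PR:

   ```java
   import org.apache.flink.streaming.api.datastream.DataStream;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   import org.apache.flink.table.data.RowData;
   import org.apache.iceberg.flink.TableLoader;
   import org.apache.iceberg.flink.source.FlinkSource;

   StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
   TableLoader tableLoader = TableLoader.fromHadoopTable("hdfs://nn:8020/warehouse/path");

   // streaming(false) builds a bounded source that scans the table's current snapshot.
   DataStream<RowData> batch = FlinkSource.forRowData()
       .env(env)
       .tableLoader(tableLoader)
       .streaming(false)
       .build();

   // Print all records to stdout, then submit the batch job.
   batch.print();
   env.execute("Test Iceberg Batch Read");
   ```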





[GitHub] [iceberg] openinx commented on a change in pull request #2389: Doc: fix error and enhanced iceberg catalog description for the flink DataStream API

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2389:
URL: https://github.com/apache/iceberg/pull/2389#discussion_r603231781



##########
File path: site/docs/flink.md
##########
@@ -312,17 +312,47 @@ INSERT OVERWRITE hive_catalog.default.sample PARTITION(data='a') SELECT 6;
 For a partitioned iceberg table, when all the partition columns are set a value in `PARTITION` clause, it is inserting into a static partition, otherwise if partial partition columns (prefix part of all partition columns) are set a value in `PARTITION` clause, it is writing the query result into a dynamic partition.
 For an unpartitioned iceberg table, its data will be completely overwritten by `INSERT OVERWRITE`.
 
-## Reading with DataStream
+## Iceberg Operation with DataStream API
+### Load Iceberg Catalog
+#### Load Hadoop Catalog
+
+```java
+    Map<String, String> properties = new HashMap<>();

Review comment:
       Let's remove the indentation from the Java statements below.





[GitHub] [iceberg] openinx commented on a change in pull request #2389: Doc: fix error and enhanced iceberg catalog description for the flink DataStream API

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2389:
URL: https://github.com/apache/iceberg/pull/2389#discussion_r603232297



##########
File path: site/docs/flink.md
##########
@@ -312,17 +312,47 @@ INSERT OVERWRITE hive_catalog.default.sample PARTITION(data='a') SELECT 6;
 For a partitioned iceberg table, when all the partition columns are set a value in `PARTITION` clause, it is inserting into a static partition, otherwise if partial partition columns (prefix part of all partition columns) are set a value in `PARTITION` clause, it is writing the query result into a dynamic partition.
 For an unpartitioned iceberg table, its data will be completely overwritten by `INSERT OVERWRITE`.
 
-## Reading with DataStream
+## Iceberg Operation with DataStream API
+### Load Iceberg Catalog
+#### Load Hadoop Catalog
+
+```java
+    Map<String, String> properties = new HashMap<>();
+    properties.put("type", "iceberg");
+    properties.put("catalog-type", "hadoop");
+    properties.put("property-version", "1");
+    properties.put("warehouse", "hdfs://nn:8020/warehouse/path");
+
+    CatalogLoader catalogLoader = CatalogLoader.hadoop(HADOOP_CATALOG, new Configuration(), properties);

Review comment:
       We need a sentence explaining how to create a `TableLoader` and use it to load the iceberg table.
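
   A minimal sketch of the kind of snippet such a sentence could introduce, assuming the `TableLoader` factory methods in the `iceberg-flink` module; `"default"` and `"sample"` are placeholder database and table names:

   ```java
   import org.apache.iceberg.Table;
   import org.apache.iceberg.catalog.TableIdentifier;
   import org.apache.iceberg.flink.TableLoader;

   // Wrap the serializable catalog loader and a table identifier into a TableLoader;
   // "default" and "sample" are placeholder names.
   TableLoader tableLoader = TableLoader.fromCatalog(
       catalogLoader, TableIdentifier.of("default", "sample"));

   // open() initializes the underlying catalog; loadTable() returns the iceberg Table.
   tableLoader.open();
   Table table = tableLoader.loadTable();
   ```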





[GitHub] [iceberg] rdblue commented on a change in pull request #2389: Doc: fix error and enhanced iceberg catalog description for the flink DataStream API

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2389:
URL: https://github.com/apache/iceberg/pull/2389#discussion_r605308098



##########
File path: site/docs/flink.md
##########
@@ -312,17 +312,47 @@ INSERT OVERWRITE hive_catalog.default.sample PARTITION(data='a') SELECT 6;
 For a partitioned iceberg table, when all the partition columns are set a value in `PARTITION` clause, it is inserting into a static partition, otherwise if partial partition columns (prefix part of all partition columns) are set a value in `PARTITION` clause, it is writing the query result into a dynamic partition.
 For an unpartitioned iceberg table, its data will be completely overwritten by `INSERT OVERWRITE`.
 
-## Reading with DataStream
+## Iceberg Operation with DataStream API

Review comment:
       I agree. And adding a section on accessing the Iceberg Java API can be a separate level-2 section, so there would be no need to change the heading level of all the remaining headings in this doc. I think that would be much better because it is fewer changes.





[GitHub] [iceberg] openinx commented on a change in pull request #2389: Doc: fix error and enhanced iceberg catalog description for the flink DataStream API

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2389:
URL: https://github.com/apache/iceberg/pull/2389#discussion_r603226642



##########
File path: site/docs/flink.md
##########
@@ -312,17 +312,47 @@ INSERT OVERWRITE hive_catalog.default.sample PARTITION(data='a') SELECT 6;
 For a partitioned iceberg table, when all the partition columns are set a value in `PARTITION` clause, it is inserting into a static partition, otherwise if partial partition columns (prefix part of all partition columns) are set a value in `PARTITION` clause, it is writing the query result into a dynamic partition.
 For an unpartitioned iceberg table, its data will be completely overwritten by `INSERT OVERWRITE`.
 
-## Reading with DataStream
+## Iceberg Operation with DataStream API

Review comment:
       I'd like to rename this heading to `Access iceberg table in Java API`.





[GitHub] [iceberg] openinx commented on a change in pull request #2389: Doc: fix error and enhanced iceberg catalog description for the flink DataStream API

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2389:
URL: https://github.com/apache/iceberg/pull/2389#discussion_r603231454



##########
File path: site/docs/flink.md
##########
@@ -312,17 +312,47 @@ INSERT OVERWRITE hive_catalog.default.sample PARTITION(data='a') SELECT 6;
 For a partitioned iceberg table, when all the partition columns are set a value in `PARTITION` clause, it is inserting into a static partition, otherwise if partial partition columns (prefix part of all partition columns) are set a value in `PARTITION` clause, it is writing the query result into a dynamic partition.
 For an unpartitioned iceberg table, its data will be completely overwritten by `INSERT OVERWRITE`.
 
-## Reading with DataStream
+## Iceberg Operation with DataStream API
+### Load Iceberg Catalog

Review comment:
       Here I think we should explain the background for introducing `CatalogLoader` & `TableLoader`: the flink operators need to access the iceberg table, but `Catalog` and `Table` are not serializable because they depend on resources that cannot be serialized (such as a `Connection`). So we introduce the loaders to carry the configuration required to initialize the `Catalog` and `Table` on the task side.
   
   We should also list the common `CatalogLoader`s and explain what each one means:
   
   a. `HiveCatalogLoader`;
   b. `HadoopCatalogLoader`;
   c. `CustomCatalogLoader`.
   
   Ditto for `TableLoader`.
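
   To make that list concrete, a rough sketch of the three loader factories, assuming the `CatalogLoader` API in the `iceberg-flink` module; the catalog names and the `com.example.MyCatalog` class are placeholders:

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import org.apache.hadoop.conf.Configuration;
   import org.apache.iceberg.flink.CatalogLoader;

   Map<String, String> properties = new HashMap<>();  // filled as in the hunks above

   // Hive catalog: tables tracked by a Hive metastore (set "uri", optionally "clients").
   CatalogLoader hiveLoader = CatalogLoader.hive("hive_catalog", new Configuration(), properties);

   // Hadoop catalog: tables tracked under a filesystem warehouse directory (set "warehouse").
   CatalogLoader hadoopLoader = CatalogLoader.hadoop("hadoop_catalog", new Configuration(), properties);

   // Custom catalog: any Catalog implementation, loaded by fully-qualified class name.
   CatalogLoader customLoader = CatalogLoader.custom(
       "custom_catalog", properties, new Configuration(), "com.example.MyCatalog");
   ```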





[GitHub] [iceberg] openinx commented on a change in pull request #2389: Doc: fix error and enhanced iceberg catalog description for the flink DataStream API

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2389:
URL: https://github.com/apache/iceberg/pull/2389#discussion_r603234967



##########
File path: site/docs/flink.md
##########
@@ -312,17 +312,47 @@ INSERT OVERWRITE hive_catalog.default.sample PARTITION(data='a') SELECT 6;
 For a partitioned iceberg table, when all the partition columns are set a value in `PARTITION` clause, it is inserting into a static partition, otherwise if partial partition columns (prefix part of all partition columns) are set a value in `PARTITION` clause, it is writing the query result into a dynamic partition.
 For an unpartitioned iceberg table, its data will be completely overwritten by `INSERT OVERWRITE`.
 
-## Reading with DataStream
+## Iceberg Operation with DataStream API
+### Load Iceberg Catalog
+#### Load Hadoop Catalog
+
+```java
+    Map<String, String> properties = new HashMap<>();
+    properties.put("type", "iceberg");
+    properties.put("catalog-type", "hadoop");
+    properties.put("property-version", "1");
+    properties.put("warehouse", "hdfs://nn:8020/warehouse/path");
+
+    CatalogLoader catalogLoader = CatalogLoader.hadoop(HADOOP_CATALOG, new Configuration(), properties);
+```
+
+#### Load Hive Catalog
+
+```java
+    Map<String, String> properties = new HashMap<>();
+    properties.put("type", "iceberg");
+    properties.put("catalog-type", "hive");
+    properties.put("property-version", "1");
+    properties.put("warehouse", "hdfs://nn:8020/warehouse/path");
+    properties.put("uri", "thrift://localhost:9083");
+    properties.put("clients", Integer.toString(2));
+
+    CatalogLoader catalogLoader = CatalogLoader.hive(HIVE_CATALOG, new Configuration(), properties);
+```
+
+*Note*: The following are examples of Load Hadoop Catalog.

Review comment:
       Nit: `The following are examples of Load Hadoop Catalog.` -> `The following takes loading a hadoop table as an example to demonstrate how to use the Java API to build flink DataStream jobs.`



