Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/28 03:00:43 UTC

[GitHub] [iceberg] simuhunluo opened a new issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

simuhunluo opened a new issue #2168:
URL: https://github.com/apache/iceberg/issues/2168


   I just found this page, but it only provides a Spark example:
   https://github.com/apache/iceberg/blob/master/site/docs/aws.md
   
   It would be helpful to have a complete Flink SQL + S3 example.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-809549272


   Cool, sorry I was out during the weekend. It seems worth adding a section to the documentation that describes these Flink details, since multiple people have asked the same question. Let me just do that.




[GitHub] [iceberg] simuhunluo closed issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
simuhunluo closed issue #2168:
URL: https://github.com/apache/iceberg/issues/2168


   




[GitHub] [iceberg] simuhunluo commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
simuhunluo commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-768934798


   @bkahloon Where is the metadata stored? Together with the data in S3?




[GitHub] [iceberg] ayush-san commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
ayush-san commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-810768260


   I moved to HadoopFileIO instead of S3FileIO because S3FileIO was asking for many AWS service dependencies such as DynamoDB, Glue, and KMS, and we had to add them individually. My code didn't need these libraries, but I think S3FileIO is only integrated with Glue (not sure).
   
   None of these AWS SDK jars were available on EMR either.




[GitHub] [iceberg] simuhunluo removed a comment on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
simuhunluo removed a comment on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-768792619


   In addition, if you want to use open source MinIO for a proof of concept, how do you configure access_key, secret_access_key, endpoint, and DynamoDB?




[GitHub] [iceberg] bkahloon commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
bkahloon commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-769209110


   Are you asking about Iceberg metadata files?




[GitHub] [iceberg] jackye1995 commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-769656812


   Metadata does not need to be in the same storage as the data, but it is always stored in some storage system.




[GitHub] [iceberg] jackye1995 commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-769670937


   The interface only allows a single FileIO to be defined for both operations. However, if you read the interface, you will see that you can check the file path to determine whether it is a metadata file, and then write metadata and data differently. It is a very flexible interface.
   
   In fact, `HadoopFileIO` does this right now for `HiveCatalog`. If you define an S3 path for your `warehouse`, all your data files go to S3, but by default all your metadata files are stored in the HDFS that the Hive metastore operates on.
   
   By the way, to be clear, I am not advocating that you use MySQL; that's just an example.
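
To make the path-based routing idea concrete, here is a minimal Java sketch. It does not use Iceberg's real `FileIO` interface: `SimpleFileIO`, `InMemoryIO`, and `RoutingFileIO` are hypothetical names, and treating any path that contains `/metadata/` as a metadata file is an assumption made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a file IO abstraction (NOT Iceberg's FileIO).
interface SimpleFileIO {
    void write(String path, String content);
    String read(String path);
}

// Toy backend that keeps files in memory so the sketch runs without S3 or HDFS.
class InMemoryIO implements SimpleFileIO {
    final Map<String, String> files = new HashMap<>();

    @Override
    public void write(String path, String content) {
        files.put(path, content);
    }

    @Override
    public String read(String path) {
        return files.get(path);
    }
}

// A single IO object that picks a backend by inspecting the file path.
class RoutingFileIO implements SimpleFileIO {
    private final SimpleFileIO metadataIO;
    private final SimpleFileIO dataIO;

    RoutingFileIO(SimpleFileIO metadataIO, SimpleFileIO dataIO) {
        this.metadataIO = metadataIO;
        this.dataIO = dataIO;
    }

    // Assumption for illustration: metadata files live under a "/metadata/" directory.
    static boolean isMetadataPath(String path) {
        return path.contains("/metadata/");
    }

    private SimpleFileIO select(String path) {
        return isMetadataPath(path) ? metadataIO : dataIO;
    }

    @Override
    public void write(String path, String content) {
        select(path).write(path, content);
    }

    @Override
    public String read(String path) {
        return select(path).read(path);
    }
}
```

Callers only ever see one IO object, which matches the single-FileIO constraint described above; the choice of backend stays an internal detail of the routing class.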




[GitHub] [iceberg] bkahloon commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
bkahloon commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-768873706


   I was able to use the following changes in the `sql-client-defaults.yaml` file to get an integration between Flink SQL Client, Iceberg and AWS Glue
   
   ``` 
   catalogs: 
     - name: iceberg
       type: iceberg
       catalog-impl: org.apache.iceberg.aws.glue.GlueCatalog
       lock-impl: org.apache.iceberg.aws.glue.DynamoLockManager
       lock.table: icebergGlueLockTable
       warehouse: s3://warehouse-bucket/
   ```
   
   I also added the Iceberg Flink runtime jar, as well as the AWS SDK and HTTP client jars, to Flink's lib directory.






[GitHub] [iceberg] ayush-san commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
ayush-san commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-808718975


   @jackye1995 I am passing `io-impl` to hive catalog in flink to save data in s3. But I am getting the following error 
   
   ```
   Exception in thread "main" java.lang.NoClassDefFoundError: software/amazon/awssdk/services/s3/model/ObjectCannedACL
   	at org.apache.iceberg.aws.AwsProperties.<init>(AwsProperties.java:236)
   	at org.apache.iceberg.aws.s3.S3FileIO.initialize(S3FileIO.java:108)
   	at org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:237)
   	at org.apache.iceberg.hive.HiveCatalog.initialize(HiveCatalog.java:168)
   	at org.apache.iceberg.hive.HiveCatalog.<init>(HiveCatalog.java:147)
   	at org.apache.iceberg.flink.CatalogLoader$HiveCatalogLoader.loadCatalog(CatalogLoader.java:112)
   ```




[GitHub] [iceberg] jackye1995 commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-769641714


   > @bkahloon Where is the metadata stored? Together with the data in S3?
   
   Iceberg table metadata (table version metadata, snapshots, manifests) is always stored in the storage layer, because the metadata itself can become "big metadata". For GlueCatalog, it is stored under the path configured as `warehouse`. For Hive, it uses Hive's metastore config key `hive.metastore.warehouse.dir` ([link](https://github.com/apache/iceberg/blob/master/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java#L490-L519)).
   
   Both catalogs support overriding this metadata path at the namespace or table level.




[GitHub] [iceberg] simuhunluo commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
simuhunluo commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-769221942


   @bkahloon  Yeah.




[GitHub] [iceberg] simuhunluo removed a comment on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
simuhunluo removed a comment on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-768792679


   In addition, if you want to use open source MinIO for a proof of concept, how do you configure access_key, secret_access_key, endpoint, and DynamoDB?
   
   






[GitHub] [iceberg] bkahloon commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
bkahloon commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-769230427


   @simuhunluo I have not been able to write to an Iceberg table yet, so I'm not sure what will happen with regard to the metadata files (I haven't looked through all the source code for the Glue catalog either). I had some issues with writing to the Iceberg table and have opened #2172.




[GitHub] [iceberg] ayush-san edited a comment on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
ayush-san edited a comment on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-808718975


   @jackye1995 I am passing `io-impl` to hive catalog in flink to save data in s3. I am getting the following error when trying to write data in S3 but I was able to write data to hdfs
   
   ```
   Exception in thread "main" java.lang.NoClassDefFoundError: software/amazon/awssdk/services/s3/model/ObjectCannedACL
   	at org.apache.iceberg.aws.AwsProperties.<init>(AwsProperties.java:236)
   	at org.apache.iceberg.aws.s3.S3FileIO.initialize(S3FileIO.java:108)
   	at org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:237)
   	at org.apache.iceberg.hive.HiveCatalog.initialize(HiveCatalog.java:168)
   	at org.apache.iceberg.hive.HiveCatalog.<init>(HiveCatalog.java:147)
   	at org.apache.iceberg.flink.CatalogLoader$HiveCatalogLoader.loadCatalog(CatalogLoader.java:112)
   ```
   
   Here's my code
   
   ```Java
   Configuration hadoopConf = new Configuration();
   hadoopConf.set(ConfigProperties.ENGINE_HIVE_ENABLED, "true");
   Map<String, String> catalogProperties = new HashMap<String, String>() {
       {
           put("uri", "thrift://AWS_EMR_MASTER_IP:9083");
           put("warehouse", "s3://bucket/warehouse/prefix");
           put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
       }
   };
   
   CatalogLoader catalogLoader = CatalogLoader.hive("hive_catalog", hadoopConf, catalogProperties);
   ```




[GitHub] [iceberg] jackye1995 edited a comment on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
jackye1995 edited a comment on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-769641714


   > @bkahloon Where is the metadata stored? Together with the data in S3?
   
   Iceberg table metadata (table version metadata, snapshots, manifests) is always stored in the storage layer, because the metadata itself can become "big metadata". For GlueCatalog, it is stored under the path configured as `warehouse`. For Hive, it is determined by the value of Hive's metastore config key `hive.metastore.warehouse.dir` in Hive's runtime environment ([link](https://github.com/apache/iceberg/blob/master/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java#L490-L519)).
   
   Both catalogs support overriding this metadata path at the namespace or table level.
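
The layering of a catalog-level default with namespace-level and table-level overrides can be pictured with the small Java sketch below. It is not Iceberg code: `MetadataLocationSketch` and `resolve` are hypothetical names, the precedence (table override, then namespace override, then the warehouse default) is an assumption, and the `<namespace>.db/<table>` default layout is assumed for illustration only.

```java
// Hypothetical resolver for a table's storage location (NOT Iceberg's implementation).
class MetadataLocationSketch {

    // Returns the location to use; null arguments mean "no override configured".
    // Assumed precedence: table override > namespace override > warehouse default.
    static String resolve(String warehouseDefault,
                          String namespaceLocation,
                          String tableLocation,
                          String namespace,
                          String table) {
        if (tableLocation != null) {
            return tableLocation;
        }
        String base = namespaceLocation != null
            ? namespaceLocation
            : stripTrailingSlash(warehouseDefault) + "/" + namespace + ".db";
        return stripTrailingSlash(base) + "/" + table;
    }

    // Avoids double slashes when the configured path already ends with "/".
    static String stripTrailingSlash(String path) {
        return path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    }
}
```

For example, calling `resolve("s3://warehouse-bucket/", null, null, "mydb", "events")` falls through to the warehouse default, while passing a non-null table location returns that location unchanged.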




[GitHub] [iceberg] jackye1995 commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-812233747


   > S3FileIO was asking for many aws services like dynamodb/glue/kms and we have to add them individually
   
   This is a part of the effort to move to AWS SDK v2.
   
   Some users only need individual AWS dependencies and want to minimize dependency size, which is why we chose this approach. If you don't care about the jar size, you can just use the AWS bundle. More details are described in the PR I created.




[GitHub] [iceberg] simuhunluo commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
simuhunluo commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-769627622


   @bkahloon OK, glad to share.
   
   In fact, I am trying to use the Hive metastore service rather than AWS Glue.
   + I created the catalog with the Flink SQL client (using MinIO as the S3 service for a proof of concept):
   ```
   CREATE CATALOG hive_catalog with(
     'type'='iceberg',
     'catalog-type'='hive',
     'uri'='thrift://localhost:9083',
     'warehouse'='s3://mybucket/'
   );
   ```
   + When I tried to create the database, an error occurred:
   ```
   create database hive_catalog.mydb;
   ```
   + Error output:
   ```
   Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Got exception: org.apache.hadoop.fs.s3.S3Exception org.jets3t.service.S3ServiceException: S3 Error Message. 
   -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message:
    <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>minioadmin</AWSAccessKeyId><RequestId>707081D8D8FAE888</RequestId><HostId>h0zIBYj2MJDZ7H9uWtFXwkLyp8HWUk3F7mAr8DfrTym4HyKBFuJAqpMb3hOcEg3F3iOZkb0HBug=</HostId></Error>
           at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result$create_database_resultStandardScheme.read(ThriftHiveMetastore.java:39343) ~[?:?]
           at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result$create_database_resultStandardScheme.read(ThriftHiveMetastore.java:39311) ~[?:?]
           at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result.read(ThriftHiveMetastore.java:39245) ~[?:?]
           at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[?:?]
           at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_database(ThriftHiveMetastore.java:1106) ~[?:?]
           at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_database(ThriftHiveMetastore.java:1093) ~[?:?]
           at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:809) ~[?:?]
           at org.apache.iceberg.hive.HiveCatalog.lambda$createNamespace$7(HiveCatalog.java:302) ~[?:?]
           at org.apache.iceberg.hive.ClientPool.run(ClientPool.java:54) ~[?:?]
           at org.apache.iceberg.hive.HiveCatalog.createNamespace(HiveCatalog.java:301) ~[?:?]
           at org.apache.iceberg.flink.FlinkCatalog.createDatabase(FlinkCatalog.java:200) ~[?:?]
           at org.apache.iceberg.flink.FlinkCatalog.createDatabase(FlinkCatalog.java:193) ~[?:?]
           at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeOperation(TableEnvironmentImpl.java:968) ~[flink-table_2.12-1.11.2.jar:1.11.2]
           at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:690) ~[flink-table_2.12-1.11.2.jar:1.11.2]
           at org.apache.flink.table.client.gateway.local.LocalExecutor.lambda$executeSql$7(LocalExecutor.java:360) ~[flink-sql-client_2.11-1.11.2.jar:1.11.2]
           at org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:255) ~[flink-sql-client_2.11-1.11.2.jar:1.11.2]
           at org.apache.flink.table.client.gateway.local.LocalExecutor.executeSql(LocalExecutor.java:360) ~[flink-sql-client_2.11-1.11.2.jar:1.11.2]
           ... 8 more
   ```
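
The `CREATE CATALOG` statement shown above can also be assembled from a property map, which helps when the same properties are reused in a program rather than typed into the SQL client. The Java sketch below is only an illustration: `CreateCatalogSql`, `render`, and `escape` are hypothetical helper names, and the properties are the ones from this comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that renders a Flink SQL CREATE CATALOG statement as a string.
class CreateCatalogSql {

    // Builds the statement from ordered properties; no trailing semicolon is added.
    static String render(String catalogName, Map<String, String> props) {
        String body = props.entrySet().stream()
            .map(e -> "  '" + escape(e.getKey()) + "'='" + escape(e.getValue()) + "'")
            .collect(Collectors.joining(",\n"));
        return "CREATE CATALOG " + catalogName + " WITH (\n" + body + "\n)";
    }

    // Doubles single quotes (standard SQL quoting) so a value cannot end the string literal.
    static String escape(String s) {
        return s.replace("'", "''");
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the properties in insertion order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("type", "iceberg");
        props.put("catalog-type", "hive");
        props.put("uri", "thrift://localhost:9083");
        props.put("warehouse", "s3://mybucket/");
        System.out.println(render("hive_catalog", props));
    }
}
```

The resulting string could then be handed to Flink's `TableEnvironment.executeSql`, the call that appears in the stack trace above; that step is omitted here so the sketch runs without Flink on the classpath.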
   
   It is easy to understand why this error occurs: `minioadmin` is the access key for my local MinIO service, and it does not exist on the AWS S3 server.
   What puzzles me is that the caller of the S3 API should be Iceberg, not the Hive metastore.
   Does anyone have any ideas?
   
   




[GitHub] [iceberg] simuhunluo commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
simuhunluo commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-768792619


   In addition, if you want to use open source MinIO for a proof of concept, how do you configure access_key, secret_access_key, endpoint, and DynamoDB?




[GitHub] [iceberg] simuhunluo commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
simuhunluo commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-769667376


   In other words, I could implement a MysqlIO to store metadata and use S3FileIO or HadoopFileIO (which Iceberg already supports) to store data. Would this cause performance problems?




[GitHub] [iceberg] simuhunluo commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
simuhunluo commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-768932103


    👍👍 Thanks for your answer.  
   
   > I was able to use the following changes in the `sql-client-defaults.yaml` file to get an integration between Flink SQL Client, Iceberg and AWS Glue
   > 
   > ```
   > catalogs: 
   >   - name: iceberg
   >     type: iceberg
   >     catalog-impl: org.apache.iceberg.aws.glue.GlueCatalog
   >     lock-impl: org.apache.iceberg.aws.glue.DynamoLockManager
   >     lock.table: icebergGlueLockTable
   >     warehouse: s3://warehouse-bucket/
   > ```
   > 
   > I also added the Iceberg Flink runtime jar, as well as the AWS SDK and HTTP client jars, to Flink's lib directory.
   
   




[GitHub] [iceberg] ayush-san commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
ayush-san commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-808722969


   Please ignore my message above. I was testing my job in my local environment, which didn't have the S3 library. I fixed it by adding the `software.amazon.awssdk:s3` library locally.
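
A quick way to confirm this kind of missing-dependency problem before running a job is to probe the classpath for the class named in the `NoClassDefFoundError`. The Java sketch below uses only the standard `Class.forName` call; `ClasspathCheck` and `isOnClasspath` are hypothetical names introduced for illustration.

```java
// Minimal classpath probe (illustrative helper, not part of Iceberg or the AWS SDK).
class ClasspathCheck {

    // Returns true when the named class can be located by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the NoClassDefFoundError in the stack trace.
        String s3Class = "software.amazon.awssdk.services.s3.model.ObjectCannedACL";
        System.out.println(s3Class + " present: " + isOnClasspath(s3Class));
    }
}
```

If the probe reports the class as absent, the S3 module of the AWS SDK v2 is not on the classpath of the environment where the check ran.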




[GitHub] [iceberg] simuhunluo commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
simuhunluo commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-769655284


   Thanks for your answer. I have learned a lot. So metadata must be stored together with the data. I will close this issue.




[GitHub] [iceberg] ayush-san edited a comment on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
ayush-san edited a comment on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-808718975


   @jackye1995 I am passing `io-impl` to hive catalog in flink to save data in s3. But I am getting the following error 
   
   ```
   Exception in thread "main" java.lang.NoClassDefFoundError: software/amazon/awssdk/services/s3/model/ObjectCannedACL
   	at org.apache.iceberg.aws.AwsProperties.<init>(AwsProperties.java:236)
   	at org.apache.iceberg.aws.s3.S3FileIO.initialize(S3FileIO.java:108)
   	at org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:237)
   	at org.apache.iceberg.hive.HiveCatalog.initialize(HiveCatalog.java:168)
   	at org.apache.iceberg.hive.HiveCatalog.<init>(HiveCatalog.java:147)
   	at org.apache.iceberg.flink.CatalogLoader$HiveCatalogLoader.loadCatalog(CatalogLoader.java:112)
   ```
   
   Here's my code
   
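   ```Java
   import java.util.HashMap;
   import java.util.Map;
   import org.apache.hadoop.conf.Configuration;
   import org.apache.iceberg.flink.CatalogLoader;
   import org.apache.iceberg.hadoop.ConfigProperties;
   
   Configuration hadoopConf = new Configuration();
   hadoopConf.set(ConfigProperties.ENGINE_HIVE_ENABLED, "true");
   
   // Catalog properties: Hive Metastore URI, S3 warehouse, and S3FileIO as the FileIO implementation.
   Map<String, String> catalogProperties = new HashMap<>();
   catalogProperties.put("uri", "thrift://AWS_EMR_MASTER_IP:9083");
   catalogProperties.put("warehouse", "s3://bucket/warehouse/prefix");
   catalogProperties.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
   
   CatalogLoader catalogLoader = CatalogLoader.hive("hive_catalog", hadoopConf, catalogProperties);
   ```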




[GitHub] [iceberg] jackye1995 commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-768870857


   In general, each compute engine has its own page explaining how to load a custom catalog, which is why only a Spark example is given on the AWS page. It is probably worth adding this note to the AWS page. I am gathering small fixes for the 0.11 website right now and will update them all at once. Thanks for the suggestion.
   
   To answer your question about Flink, use the guide here: https://iceberg.apache.org/flink/#custom-catalog. The minimum you need is:
   
   ```
   CREATE CATALOG my_catalog WITH (
     'type'='iceberg',
     'catalog-impl'='org.apache.iceberg.aws.glue.GlueCatalog',
     'warehouse'='s3://my-bucket/my/key/prefix'
   );
   ```
   
   AWS credentials are accessed through the default credential chain that tries to retrieve credentials in a series of approaches: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html
   
   If you need more customization, such as a custom endpoint, you can load your own `AwsClientFactory` by following the guide at https://iceberg.apache.org/aws/#aws-client-customization
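
For jobs that create the catalog from Java rather than from the SQL client, the same statement can be assembled from a property map. The following is only a sketch: `FlinkCatalogDdl`, `createCatalog`, and `escape` are hypothetical names introduced for illustration, and the property values are the ones from the `CREATE CATALOG` statement above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of Flink or Iceberg): renders a
// CREATE CATALOG statement from an ordered map of catalog properties.
class FlinkCatalogDdl {

    // Doubles single quotes so keys and values are safe inside SQL string literals.
    static String escape(String s) {
        return s.replace("'", "''");
    }

    // Builds "CREATE CATALOG <name> WITH ( 'k'='v', ... );" in map iteration order.
    static String createCatalog(String name, Map<String, String> props) {
        StringJoiner options =
            new StringJoiner(",\n", "CREATE CATALOG " + name + " WITH (\n", "\n);");
        for (Map.Entry<String, String> e : props.entrySet()) {
            options.add("  '" + escape(e.getKey()) + "'='" + escape(e.getValue()) + "'");
        }
        return options.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("type", "iceberg");
        props.put("catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog");
        props.put("warehouse", "s3://my-bucket/my/key/prefix");
        System.out.println(createCatalog("my_catalog", props));
    }
}
```

The resulting string could then be passed to Flink's `TableEnvironment.executeSql`, assuming the Iceberg Flink runtime and the AWS SDK jars are on the classpath.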




[GitHub] [iceberg] simuhunluo commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
simuhunluo commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-768792679


   In addition, if I want to use the open-source MinIO for a proof of concept, how do I configure access_key, secret_access_key, endpoint, and DynamoDB?
   
   




[GitHub] [iceberg] aokolnychyi commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-768861794


   cc @jackye1995 




[GitHub] [iceberg] bkahloon edited a comment on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
bkahloon edited a comment on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-769230427


   @simuhunluo I have not been able to write to an Iceberg table yet, so I'm not sure what will happen with the metadata files (I also haven't looked through all the source code for the Glue catalog). I had some issues writing to the Iceberg table and have opened #2172. If you get it working, please let me know. Thanks




[GitHub] [iceberg] simuhunluo commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
simuhunluo commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-768792770


   In addition, if I want to use the open-source [MinIO](https://github.com/minio/minio/) for a proof of concept, how do I configure access_key, secret_access_key, endpoint, and DynamoDB?
   
   




[GitHub] [iceberg] simuhunluo commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
simuhunluo commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-769658717


   Should the storage be a file system?




[GitHub] [iceberg] jackye1995 commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-769661862


   No. S3, for example, is not a file system. It depends entirely on your implementation of `org.apache.iceberg.io.FileIO`. When you use `HiveCatalog` or `HadoopCatalog`, it uses `HadoopFileIO` by default, which treats `s3://` as a file system. But if you use `GlueCatalog`, it uses `S3FileIO`, which makes no file system assumptions (which also means better performance).
   
   So when I say "storage", it is just conceptual, mostly because we have this concept of "file IO". Technically it all depends on your implementation. You could implement a `MySQLIO` that writes data to MySQL if you wanted, and it could be loaded by any catalog.
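
The idea of a storage layer that is picked by class name can be sketched with a toy example. This is not the real `org.apache.iceberg.io.FileIO` interface: `ToyFileIO`, `InMemoryToyFileIO`, and `ToyCatalog.loadIO` are hypothetical names made up for illustration. The sketch only shows that the catalog side needs read, write, and delete by location string, with no directory listing or rename, and that the implementation is chosen by a property similar to `io-impl`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy stand-in for the concept behind FileIO; NOT Iceberg's actual interface.
interface ToyFileIO {
    void write(String location, byte[] contents);
    byte[] read(String location);
    void delete(String location);
}

// Stores "files" in a map keyed by location: no directories, no rename,
// so nothing here assumes file system semantics.
class InMemoryToyFileIO implements ToyFileIO {
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    public void write(String location, byte[] contents) {
        objects.put(location, contents.clone());
    }

    public byte[] read(String location) {
        byte[] data = objects.get(location);
        if (data == null) {
            throw new IllegalArgumentException("No such object: " + location);
        }
        return data.clone();
    }

    public void delete(String location) {
        objects.remove(location);
    }
}

class ToyCatalog {
    // Hypothetical loader: instantiates the implementation named by an
    // "io-impl"-style property through its no-argument constructor.
    static ToyFileIO loadIO(String implClassName) {
        try {
            return (ToyFileIO) Class.forName(implClassName)
                .getDeclaredConstructor()
                .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load IO implementation: " + implClassName, e);
        }
    }

    public static void main(String[] args) {
        ToyFileIO io = loadIO(InMemoryToyFileIO.class.getName());
        String location = "s3://bucket/warehouse/prefix/metadata/v1.json";
        io.write(location, "{}".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(io.read(location), StandardCharsets.UTF_8));
    }
}
```

Swapping the class name passed to `loadIO` changes where the bytes go without touching the calling code, which is the point made above about a `MySQLIO` being loadable by any catalog.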




[GitHub] [iceberg] simuhunluo commented on issue #2168: Flink: [doc] Is there a full example for Iceberg+Flink+S3 ?

Posted by GitBox <gi...@apache.org>.
simuhunluo commented on issue #2168:
URL: https://github.com/apache/iceberg/issues/2168#issuecomment-768787598


   In this scenario, where do I configure aws_access_key_id and aws_secret_access_key? In hive-site.xml?

