You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/20 03:20:07 UTC

[GitHub] [iceberg] MrDannyWu opened a new issue, #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.

MrDannyWu opened a new issue, #6021:
URL: https://github.com/apache/iceberg/issues/6021

   ### Query engine
   
   Flink CDC
   
   ### Question
   
   [ERROR] Could not execute SQL statement. Reason:
   java.lang.IllegalStateException: Equality field columns shouldn't be empty when configuring to use UPSERT data stream.
   table properties：
   ![image](https://user-images.githubusercontent.com/19458048/196848534-b168a11d-ac5f-420e-8a14-857a371098ce.png)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] github-actions[bot] closed issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] closed issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
URL: https://github.com/apache/iceberg/issues/6021


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] MrDannyWu commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.

Posted by GitBox <gi...@apache.org>.

MrDannyWu commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1285477060

   OK,the reason why I use hive to create a table is that I use flink sql to
   create an iceberg table, but the data cannot be queried in hive.
   
   On Thu, Oct 20, 2022 at 7:33 PM xzw_deepnova ***@***.***>
   wrote:
   
   > I am not sure whether the primary key of a table in hive can be applied to
   > the primary key of a table in flink, because I rarely use hive and I am not
   > familiar with it. I can try the sql client show create table in flink to
   > check whether the table is created with a primary key.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/iceberg/issues/6021#issuecomment-1285372540>,
   > or unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AEUOQAGYIBJ6FAWG4J64OEDWEEUYJANCNFSM6AAAAAARJWQIBY>
   > .
   > You are receiving this because you authored the thread.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xuzhiwen1255 commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.

Posted by GitBox <gi...@apache.org>.

xuzhiwen1255 commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1285220982

   Can you look at your sql?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] MrDannyWu commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.

Posted by GitBox <gi...@apache.org>.

MrDannyWu commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1296575962

   > Seems the primiary key can't be derived from the iceberg's HiveCatalog even though you have defined it using Hive Cli .
   > 
   > 1. you can define with the following using Flink SQL
   > 
   > ```
   > CREATE TABLE ods_data_1_1 (
   >     xxx,
   >     xxx,
   >     PRIMARY KEY(`id`) NOT ENFORCED
   > ) WITH (
   >     'connector'='iceberg',
   >     'catalog-name'='hive_prod',
   >     'catalog-database'='default',
   >     'catalog-table'='ods_data_1_1',
   >     'uri'='hdfs://nn-1:8020/user/hive/warehouse/',
   >     'warehouse'='hdfs://nn:8020/path/to/warehouse'
   > );
   > ```
   > 
   > Note: it won't create a table, it's just a mapping to the table crated before in Hive. Refer the [flink-connector](https://iceberg.apache.org/docs/latest/flink-connector/) for more details. Then write the data using the sql normally.
   > 
   > 2. You can use Flink DDL to create the table, the Hive should have the ability to read the table.  Have you ever followed the document [enable Iceberg support](https://iceberg.apache.org/docs/latest/hive/#enabling-iceberg-support-in-hive)?
   
   Ok, so far I have solved this problem. When I created the Iceberg table (Hive Ctatlog) in the Flink SQL Client, I lost the parameters. After carefully reviewing the Iceberg and Flink documents and trying many times, Upsert can be implemented. Thanks to every helper!
   
   ### The correct steps are as follows:
   1. create catalog
   ```
   CREATE CATALOG hive_iceberg
   WITH (
       'clients'='5',
       'type'='iceberg',
       'catalog-type'='hive',
       'property-version'='1',
       'uri'='thrift://xxxx:9083',
       'warehouse'='hdfs://xxxx/user/hive/warehouse/'
   );
   ```
   
   2. create db
   ```
   CREATE DATABASE ods;
   ```
   
   3. create table
   ```
   CREATE TABLE IF NOT EXISTS hive_iceberg.ods.table(
   id BIGINT,
   ...
   PRIMARY KEY (id) NOT ENFORCED
   )
   WITH (
       'format-version'='2',
       'iceberg.mr.catalog'='hive',
       'engine.hive.enabled'='true',
       'write.upsert.enabled'='true',
       'write.metadata.previous-versions-max'='3',
       'hive.vectorized.execution.enabled'='false',
       'write.metadata.delete-after-commit.enabled'='true',
       'warehouse'='hdfs://xxxx/user/hive/warehouse/ods/table'
   )
   ;
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] luoyuxia commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.

Posted by GitBox <gi...@apache.org>.

luoyuxia commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1296233911

   Seems the primiary key can't be derived from the iceberg's HiveCatalog even though you have defined it using Hive Cli . 
   
   1. you can define with the following using Flink SQL
   ```
   CREATE TABLE ods_data_1_1 (
       xxx,
       xxx,
       PRIMARY KEY(`id`) NOT ENFORCED
   ) WITH (
       'connector'='iceberg',
       'catalog-name'='hive_prod',
       'catalog-database'='default',
       'catalog-table'='ods_data_1_1',
       'uri'='hdfs://nn-1:8020/user/hive/warehouse/',
       'warehouse'='hdfs://nn:8020/path/to/warehouse'
   );
   ```
   Note: it won't create a table, it's just a mapping to the table crated before in Hive 
   Then write the data using the sql normally.
   
   2. You can use Flink DDL to create the table, the Hive should have the ability to read the table.  Have you ever followed the document [enable Iceberg support] (https://iceberg.apache.org/docs/latest/hive/#enabling-iceberg-support-in-hive)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] github-actions[bot] commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1528902310

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] MrDannyWu commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.

Posted by GitBox <gi...@apache.org>.

MrDannyWu commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1286361800

   
   
   
   > Flink SQL> CREATE CATALOG hc WITH (
   > 
   > > 'type'='iceberg',
   > > 'catalog-type'='hive',
   > > 'uri'='thrift://xxx:9083',
   > > 'clients'='5',
   > > 'property-version'='1',
   > > 'warehouse'='hdfs://xxx:8020/user/iceberg/warehouse5'
   > > );
   > > [INFO] Execute statement succeed.
   > 
   > Flink SQL> use hc; [ERROR] Could not execute SQL statement. Reason: org.apache.flink.table.catalog.exceptions.CatalogException: A database with name [hc] does not exist in the catalog: [default_catalog].
   > 
   > Flink SQL> use catalog hc; [INFO] Execute statement succeed.
   > 
   > Flink SQL> create database db1; [INFO] Execute statement succeed.
   > 
   > Flink SQL> use db1; [INFO] Execute statement succeed.
   > 
   > Flink SQL> create table t1(id int ,name string); [INFO] Execute statement succeed.
   > 
   > Flink SQL> show tables; +------------+ | table name | +------------+ | t1 | +------------+ 1 row in set
   > 
   > Flink SQL>
   > 
   > -------------------- hive client ------------------- WARNING: Hive CLI is deprecated and migration to Beeline is recommended. hive> add jar /root/ iceberg-hive-runtime-1.0.0.jar > ; Added [/root/, iceberg-hive-runtime-1.0.0.jar] to class path Added resources: [/root/, iceberg-hive-runtime-1.0.0.jar] hive> show databases; OK db db1 default migrate one_hdfs test x_baili Time taken: 1.344 seconds, Fetched: 7 row(s) hive> use db1; OK Time taken: 0.045 seconds hive> show tables; OK t1 Time taken: 0.07 seconds, Fetched: 1 row(s) hive>
   > 
   > **I have the same configuration with you, but I can also check it. I think there is something wrong with my hive operation. My Hive version is lower than 2.1.1**
   
   Okay, thanks a lot,looking forward to your reply again, I am also looking for a solution at the same time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] MrDannyWu commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.

Posted by GitBox <gi...@apache.org>.

MrDannyWu commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1284867853

   I have set the primary key according to the official documentation and set upsert enable;


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xuzhiwen1255 commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.

Posted by GitBox <gi...@apache.org>.

xuzhiwen1255 commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1286334384

   Flink SQL>  CREATE CATALOG hc WITH (
   >   'type'='iceberg',
   >   'catalog-type'='hive',
   >   'uri'='thrift://xxx:9083',
   >   'clients'='5',
   >   'property-version'='1',
   >   'warehouse'='hdfs://xxx:8020/user/iceberg/warehouse5'
   > );
   [INFO] Execute statement succeed.
   
   Flink SQL> use hc;
   [ERROR] Could not execute SQL statement. Reason:
   org.apache.flink.table.catalog.exceptions.CatalogException: A database with name [hc] does not exist in the catalog: [default_catalog].
   
   Flink SQL> use catalog hc;
   [INFO] Execute statement succeed.
   
   Flink SQL> create database db1;
   [INFO] Execute statement succeed.
   
   Flink SQL> use db1;
   [INFO] Execute statement succeed.
   
   Flink SQL> create table t1(id int ,name string);
   [INFO] Execute statement succeed.
   
   Flink SQL> show tables;
   +------------+
   | table name |
   +------------+
   |         t1 |
   +------------+
   1 row in set
   
   Flink SQL> 
   
   
   -------------------- hive client -------------------
   WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
   hive> add jar /root/ iceberg-hive-runtime-1.0.0.jar
       > ;
   Added [/root/, iceberg-hive-runtime-1.0.0.jar] to class path
   Added resources: [/root/, iceberg-hive-runtime-1.0.0.jar]
   hive> show databases;
   OK
   db
   db1
   default
   migrate
   one_hdfs
   test
   x_baili
   Time taken: 1.344 seconds, Fetched: 7 row(s)
   hive> use db1;
   OK
   Time taken: 0.045 seconds
   hive> show tables;
   OK
   t1
   Time taken: 0.07 seconds, Fetched: 1 row(s)
   hive> 
   
   **I have the same configuration with you, but I can also check it. I think there is something wrong with my hive operation. My Hive version is lower than 2.1.1**


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] MrDannyWu commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.

Posted by GitBox <gi...@apache.org>.

MrDannyWu commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1285297768

   > Can you look at your sql?
   
   Thank you for your reply and help, I will provide you with my steps.
   
   The background of the problem is that I want to synchronize mysql data to Iceberg (Hive Catalog) through Flink CDC. The default is to write to Iceberg in Append mode. If there is update or delete, there will be duplicate data. If I want to use Upsert mode, there is a problem. In fact, I just want to know how to write Iceberg (Hive Catalog) through Upsert.
   
   step 1: create table on hive
   
   SET engine.hive.enabled=true;
   SET iceberg.engine.hive.enabled=true;
   SET iceberg.mr.catalog=hive;
   SET hive.vectorized.execution.enabled=false;
   add jar /soft/iceberg-hive-runtime-1.0.0.jar;
   
   CREATE EXTERNAL TABLE ods_data_1_1(
       `id` BIGINT,
       `name` STRING,
       `age` BIGINT,
       `gender` STRING,
       `amount` BIGINT,
       `geohash_code` STRING,
       `status` BIGINT,
       `location` STRING,
       PRIMARY KEY (`id`) NOT ENFORCED
       )
   STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
   LOCATION 'hdfs://nn-1:8020/user/hive/warehouse/danny_test/ods_data_1_1'
   TBLPROPERTIES (
       'format-version'='2',
       'iceberg.mr.catalog'='hadoop',
       'iceberg.mr.catalog.hadoop.warehouse.location'='hdfs://nn-1:8020/user/hive/warehouse/danny_test/ods_data_1_1',
       'write.upsert.enabled'='true',
       'write.metadata.delete-after-commit.enabled'='true',
       'write.metadata.previous-versions-max'='2'
     );
   
   step 2:Flink CDC
   SET execution.checkpointing.interval = 3s;
   SET execution.result-mode=table;
   SET execution.result-mode=tableau;
   SET yarn.application.queue=root;
   
   # create source table
   CREATE TABLE ods_data_1_1 (
       `id` BIGINT,
       `name` STRING,
       `age` BIGINT,
       `gender` STRING,
       `amount` BIGINT,
       `geohash_code` STRING,
       `status` BIGINT,
       `location` STRING,
       PRIMARY KEY (`id`) NOT ENFORCED
     ) WITH (
       'connector' = 'mysql-cdc',
       'hostname' = 'x.x.x.x',
       'port' = '3306',
       'username' = 'name',
       'password' = 'pass',
       'database-name' = 'db1',
       'table-name' = 'tb1'
     );
   
   # create catalog
   CREATE CATALOG ice WITH (
     'type'='iceberg',
     'catalog-type'='hive',
     'uri'='thrift://nn-1:9083',
     'clients'='5',
     'property-version'='1',
     'warehouse'='hdfs://nn-1:8020/user/hive/warehouse/'
   );
   
   # submit job
   insert into ice.danny_test.ods_data_1_1  select * from ods_data_1_1 ;
   report error:
   >[ERROR] Could not execute SQL statement. Reason:
   java.lang.IllegalStateException: Equality field columns shouldn't be empty when configuring to use UPSERT data stream.
   
   # PS
   Flink 14.4 iceberg 1.0.0 hive 3.x
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] xuzhiwen1255 commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.

Posted by GitBox <gi...@apache.org>.

xuzhiwen1255 commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1285372540

   I am not sure whether the primary key of a table in hive can be applied to the primary key of a table in flink, because I rarely use hive and I am not familiar with it. I can try the sql client show create table in flink to check whether the table is created with a primary key.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] github-actions[bot] commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1603464481

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org