You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/20 03:20:07 UTC
[GitHub] [iceberg] MrDannyWu opened a new issue, #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
MrDannyWu opened a new issue, #6021:
URL: https://github.com/apache/iceberg/issues/6021
### Query engine
Flink CDC
### Question
[ERROR] Could not execute SQL statement. Reason:
java.lang.IllegalStateException: Equality field columns shouldn't be empty when configuring to use UPSERT data stream.
table properties:
![image](https://user-images.githubusercontent.com/19458048/196848534-b168a11d-ac5f-420e-8a14-857a371098ce.png)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] github-actions[bot] closed issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
URL: https://github.com/apache/iceberg/issues/6021
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] MrDannyWu commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
Posted by GitBox <gi...@apache.org>.
MrDannyWu commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1285477060
OK,the reason why I use hive to create a table is that I use flink sql to
create an iceberg table, but the data cannot be queried in hive.
On Thu, Oct 20, 2022 at 7:33 PM xzw_deepnova ***@***.***>
wrote:
> I am not sure whether the primary key of a table in hive can be applied to
> the primary key of a table in flink, because I rarely use hive and I am not
> familiar with it. I can try the sql client show create table in flink to
> check whether the table is created with a primary key.
>
> —
> Reply to this email directly, view it on GitHub
> <https://github.com/apache/iceberg/issues/6021#issuecomment-1285372540>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AEUOQAGYIBJ6FAWG4J64OEDWEEUYJANCNFSM6AAAAAARJWQIBY>
> .
> You are receiving this because you authored the thread.Message ID:
> ***@***.***>
>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] xuzhiwen1255 commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
Posted by GitBox <gi...@apache.org>.
xuzhiwen1255 commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1285220982
Can you look at your sql?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] MrDannyWu commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
Posted by GitBox <gi...@apache.org>.
MrDannyWu commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1296575962
> Seems the primiary key can't be derived from the iceberg's HiveCatalog even though you have defined it using Hive Cli .
>
> 1. you can define with the following using Flink SQL
>
> ```
> CREATE TABLE ods_data_1_1 (
> xxx,
> xxx,
> PRIMARY KEY(`id`) NOT ENFORCED
> ) WITH (
> 'connector'='iceberg',
> 'catalog-name'='hive_prod',
> 'catalog-database'='default',
> 'catalog-table'='ods_data_1_1',
> 'uri'='hdfs://nn-1:8020/user/hive/warehouse/',
> 'warehouse'='hdfs://nn:8020/path/to/warehouse'
> );
> ```
>
> Note: it won't create a table, it's just a mapping to the table crated before in Hive. Refer the [flink-connector](https://iceberg.apache.org/docs/latest/flink-connector/) for more details. Then write the data using the sql normally.
>
> 2. You can use Flink DDL to create the table, the Hive should have the ability to read the table. Have you ever followed the document [enable Iceberg support](https://iceberg.apache.org/docs/latest/hive/#enabling-iceberg-support-in-hive)?
Ok, so far I have solved this problem. When I created the Iceberg table (Hive Ctatlog) in the Flink SQL Client, I lost the parameters. After carefully reviewing the Iceberg and Flink documents and trying many times, Upsert can be implemented. Thanks to every helper!
### The correct steps are as follows:
1. create catalog
```
CREATE CATALOG hive_iceberg
WITH (
'clients'='5',
'type'='iceberg',
'catalog-type'='hive',
'property-version'='1',
'uri'='thrift://xxxx:9083',
'warehouse'='hdfs://xxxx/user/hive/warehouse/'
);
```
2. create db
```
CREATE DATABASE ods;
```
3. create table
```
CREATE TABLE IF NOT EXISTS hive_iceberg.ods.table(
id BIGINT,
...
PRIMARY KEY (id) NOT ENFORCED
)
WITH (
'format-version'='2',
'iceberg.mr.catalog'='hive',
'engine.hive.enabled'='true',
'write.upsert.enabled'='true',
'write.metadata.previous-versions-max'='3',
'hive.vectorized.execution.enabled'='false',
'write.metadata.delete-after-commit.enabled'='true',
'warehouse'='hdfs://xxxx/user/hive/warehouse/ods/table'
)
;
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] luoyuxia commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
Posted by GitBox <gi...@apache.org>.
luoyuxia commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1296233911
Seems the primiary key can't be derived from the iceberg's HiveCatalog even though you have defined it using Hive Cli .
1. you can define with the following using Flink SQL
```
CREATE TABLE ods_data_1_1 (
xxx,
xxx,
PRIMARY KEY(`id`) NOT ENFORCED
) WITH (
'connector'='iceberg',
'catalog-name'='hive_prod',
'catalog-database'='default',
'catalog-table'='ods_data_1_1',
'uri'='hdfs://nn-1:8020/user/hive/warehouse/',
'warehouse'='hdfs://nn:8020/path/to/warehouse'
);
```
Note: it won't create a table, it's just a mapping to the table crated before in Hive
Then write the data using the sql normally.
2. You can use Flink DDL to create the table, the Hive should have the ability to read the table. Have you ever followed the document [enable Iceberg support] (https://iceberg.apache.org/docs/latest/hive/#enabling-iceberg-support-in-hive)?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] github-actions[bot] commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1528902310
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] MrDannyWu commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
Posted by GitBox <gi...@apache.org>.
MrDannyWu commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1286361800
> Flink SQL> CREATE CATALOG hc WITH (
>
> > 'type'='iceberg',
> > 'catalog-type'='hive',
> > 'uri'='thrift://xxx:9083',
> > 'clients'='5',
> > 'property-version'='1',
> > 'warehouse'='hdfs://xxx:8020/user/iceberg/warehouse5'
> > );
> > [INFO] Execute statement succeed.
>
> Flink SQL> use hc; [ERROR] Could not execute SQL statement. Reason: org.apache.flink.table.catalog.exceptions.CatalogException: A database with name [hc] does not exist in the catalog: [default_catalog].
>
> Flink SQL> use catalog hc; [INFO] Execute statement succeed.
>
> Flink SQL> create database db1; [INFO] Execute statement succeed.
>
> Flink SQL> use db1; [INFO] Execute statement succeed.
>
> Flink SQL> create table t1(id int ,name string); [INFO] Execute statement succeed.
>
> Flink SQL> show tables; +------------+ | table name | +------------+ | t1 | +------------+ 1 row in set
>
> Flink SQL>
>
> -------------------- hive client ------------------- WARNING: Hive CLI is deprecated and migration to Beeline is recommended. hive> add jar /root/ iceberg-hive-runtime-1.0.0.jar > ; Added [/root/, iceberg-hive-runtime-1.0.0.jar] to class path Added resources: [/root/, iceberg-hive-runtime-1.0.0.jar] hive> show databases; OK db db1 default migrate one_hdfs test x_baili Time taken: 1.344 seconds, Fetched: 7 row(s) hive> use db1; OK Time taken: 0.045 seconds hive> show tables; OK t1 Time taken: 0.07 seconds, Fetched: 1 row(s) hive>
>
> **I have the same configuration with you, but I can also check it. I think there is something wrong with my hive operation. My Hive version is lower than 2.1.1**
Okay, thanks a lot,looking forward to your reply again, I am also looking for a solution at the same time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] MrDannyWu commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
Posted by GitBox <gi...@apache.org>.
MrDannyWu commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1284867853
I have set the primary key according to the official documentation and set upsert enable;
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] xuzhiwen1255 commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
Posted by GitBox <gi...@apache.org>.
xuzhiwen1255 commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1286334384
Flink SQL> CREATE CATALOG hc WITH (
> 'type'='iceberg',
> 'catalog-type'='hive',
> 'uri'='thrift://xxx:9083',
> 'clients'='5',
> 'property-version'='1',
> 'warehouse'='hdfs://xxx:8020/user/iceberg/warehouse5'
> );
[INFO] Execute statement succeed.
Flink SQL> use hc;
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.catalog.exceptions.CatalogException: A database with name [hc] does not exist in the catalog: [default_catalog].
Flink SQL> use catalog hc;
[INFO] Execute statement succeed.
Flink SQL> create database db1;
[INFO] Execute statement succeed.
Flink SQL> use db1;
[INFO] Execute statement succeed.
Flink SQL> create table t1(id int ,name string);
[INFO] Execute statement succeed.
Flink SQL> show tables;
+------------+
| table name |
+------------+
| t1 |
+------------+
1 row in set
Flink SQL>
-------------------- hive client -------------------
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> add jar /root/ iceberg-hive-runtime-1.0.0.jar
> ;
Added [/root/, iceberg-hive-runtime-1.0.0.jar] to class path
Added resources: [/root/, iceberg-hive-runtime-1.0.0.jar]
hive> show databases;
OK
db
db1
default
migrate
one_hdfs
test
x_baili
Time taken: 1.344 seconds, Fetched: 7 row(s)
hive> use db1;
OK
Time taken: 0.045 seconds
hive> show tables;
OK
t1
Time taken: 0.07 seconds, Fetched: 1 row(s)
hive>
**I have the same configuration with you, but I can also check it. I think there is something wrong with my hive operation. My Hive version is lower than 2.1.1**
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] MrDannyWu commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
Posted by GitBox <gi...@apache.org>.
MrDannyWu commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1285297768
> Can you look at your sql?
Thank you for your reply and help, I will provide you with my steps.
The background of the problem is that I want to synchronize mysql data to Iceberg (Hive Catalog) through Flink CDC. The default is to write to Iceberg in Append mode. If there is update or delete, there will be duplicate data. If I want to use Upsert mode, there is a problem. In fact, I just want to know how to write Iceberg (Hive Catalog) through Upsert.
step 1: create table on hive
SET engine.hive.enabled=true;
SET iceberg.engine.hive.enabled=true;
SET iceberg.mr.catalog=hive;
SET hive.vectorized.execution.enabled=false;
add jar /soft/iceberg-hive-runtime-1.0.0.jar;
CREATE EXTERNAL TABLE ods_data_1_1(
`id` BIGINT,
`name` STRING,
`age` BIGINT,
`gender` STRING,
`amount` BIGINT,
`geohash_code` STRING,
`status` BIGINT,
`location` STRING,
PRIMARY KEY (`id`) NOT ENFORCED
)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://nn-1:8020/user/hive/warehouse/danny_test/ods_data_1_1'
TBLPROPERTIES (
'format-version'='2',
'iceberg.mr.catalog'='hadoop',
'iceberg.mr.catalog.hadoop.warehouse.location'='hdfs://nn-1:8020/user/hive/warehouse/danny_test/ods_data_1_1',
'write.upsert.enabled'='true',
'write.metadata.delete-after-commit.enabled'='true',
'write.metadata.previous-versions-max'='2'
);
step 2:Flink CDC
SET execution.checkpointing.interval = 3s;
SET execution.result-mode=table;
SET execution.result-mode=tableau;
SET yarn.application.queue=root;
# create source table
CREATE TABLE ods_data_1_1 (
`id` BIGINT,
`name` STRING,
`age` BIGINT,
`gender` STRING,
`amount` BIGINT,
`geohash_code` STRING,
`status` BIGINT,
`location` STRING,
PRIMARY KEY (`id`) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'hostname' = 'x.x.x.x',
'port' = '3306',
'username' = 'name',
'password' = 'pass',
'database-name' = 'db1',
'table-name' = 'tb1'
);
# create catalog
CREATE CATALOG ice WITH (
'type'='iceberg',
'catalog-type'='hive',
'uri'='thrift://nn-1:9083',
'clients'='5',
'property-version'='1',
'warehouse'='hdfs://nn-1:8020/user/hive/warehouse/'
);
# submit job
insert into ice.danny_test.ods_data_1_1 select * from ods_data_1_1 ;
report error:
>[ERROR] Could not execute SQL statement. Reason:
java.lang.IllegalStateException: Equality field columns shouldn't be empty when configuring to use UPSERT data stream.
# PS
Flink 14.4 iceberg 1.0.0 hive 3.x
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] xuzhiwen1255 commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
Posted by GitBox <gi...@apache.org>.
xuzhiwen1255 commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1285372540
I am not sure whether the primary key of a table in hive can be applied to the primary key of a table in flink, because I rarely use hive and I am not familiar with it. I can try the sql client show create table in flink to check whether the table is created with a primary key.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] github-actions[bot] commented on issue #6021: When I use flink sql to synchronize MySQL data to icerberg (hive catalog), an error is reported.
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #6021:
URL: https://github.com/apache/iceberg/issues/6021#issuecomment-1603464481
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org