Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/22 07:52:40 UTC
[GitHub] [hudi] onlywangyh opened a new issue, #5394: flink cdc sink hudi failed to add hive partition fields for hive sync
onlywangyh opened a new issue, #5394:
URL: https://github.com/apache/hudi/issues/5394
**To Reproduce**
Steps to reproduce the behavior:
1. Create a MySQL table like:
```
CREATE TABLE `timeTypeTest` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`datetime1` datetime DEFAULT NULL,
`date1` date DEFAULT NULL,
`datetime16` datetime(6) DEFAULT NULL,
`time16` time DEFAULT NULL,
`timestamp16` timestamp(6) NULL DEFAULT NULL,
`timestamp16Partition` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
```
2. Insert a row:
`insert into mydb.timeTypeTest values ('2', '2020-07-30 10:08:22', '2020-07-30', '2020-07-30 10:08:22.000000', '10:08:22', '2020-07-30 10:08:22.000000', '2020-07-30')`
3. Start a Flink CDC job that sinks to Hudi with these config properties:
```
--hive-sync-enable=true
--hive-sync-jdbc-url=jdbc:hive2://localhost:10000
--hive-sync-db=testDb
--hive-sync-table=testTable
--record-key-field=id
--partition-path-field=timestamp16
--hive-sync-partition-fields=inc_day
--hive-style-partitioning=true
--hive-sync-mode=jdbc
--hive-sync-username=hive
--hive-sync-password=hive
hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy-MM-dd
hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled=true
hive_sync.partition_extractor_class=org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator
```
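The `timebased` properties above make the key generator treat `timestamp16` as epoch milliseconds and format it as `yyyy-MM-dd`. That is why the partition value is `2020-07-30` rather than the raw number. Below is a minimal sketch of that conversion using `java.time`. It is not Hudi's implementation, and it assumes UTC, whereas the real key generator's time zone is configurable. The class name `EpochMillisPartitionSketch` is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

/**
 * Hypothetical sketch (not Hudi code): how a timestamp-based key generator turns an
 * epoch-millis field value into a date-formatted partition path. Assumes UTC.
 */
public class EpochMillisPartitionSketch {

    // Mirrors hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy-MM-dd
    private static final DateTimeFormatter OUTPUT_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneId.of("UTC"));

    /** Formats an epoch-millisecond value (timestamp.type=EPOCHMILLISECONDS) as a partition path. */
    public static String toPartitionPath(long epochMillis) {
        return OUTPUT_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // The inserted timestamp16 value, assumed to arrive as epoch milliseconds in UTC
        long timestamp16 = Instant.parse("2020-07-30T10:08:22Z").toEpochMilli();
        System.out.println(timestamp16 + " -> " + toPartitionPath(timestamp16));
    }
}
```

The formatting step discards the time of day. The partition value alone therefore cannot reproduce the original bigint.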
**Expected behavior**
A Hive table testTable is created with a string partition field _inc_day_, and a partition "2020-07-30" is added.

**Actual behavior**
The partition field is _timestamp16_ with type bigint:
```
show partitions testTable; ---- "2020-07-30"
select timestamp16 from testTable; ----- null
```
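The `null` result can be understood with a simplified model. This is an assumption for illustration, not Hive's actual code path. The partition value recorded for the table is the formatted string `2020-07-30`, but the column is declared bigint. In Hive, casting a non-numeric string to BIGINT yields NULL. The class `PartitionValueCastSketch` is hypothetical.

```java
/**
 * Hypothetical, simplified model (not Hive's implementation): reading a string
 * partition value back as a bigint column, where unparseable values become null.
 */
public class PartitionValueCastSketch {

    /** Returns the partition value as a Long, or null when it is not a valid integer. */
    public static Long castToBigint(String partitionValue) {
        try {
            return Long.parseLong(partitionValue);
        } catch (NumberFormatException e) {
            // Assumed Hive-like behavior: a failed string-to-bigint cast yields NULL
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(castToBigint("2020-07-30"));    // date-formatted partition path
        System.out.println(castToBigint("1596103702000")); // raw epoch-millis value
    }
}
```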
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] codope closed issue #5394: hudi use partition path field as hive partition field error in flink
Posted by GitBox <gi...@apache.org>.
codope closed issue #5394: hudi use partition path field as hive partition field error in flink
URL: https://github.com/apache/hudi/issues/5394
[GitHub] [hudi] onlywangyh commented on issue #5394: flink cdc sink hudi failed to add hive partition fields for hive sync
Posted by GitBox <gi...@apache.org>.
onlywangyh commented on issue #5394:
URL: https://github.com/apache/hudi/issues/5394#issuecomment-1108132152
If I keep the two params the same, e.g. `--partition-path-field=timestamp16` and `--hive-sync-partition-fields=timestamp16`, there are some problems:
1. In the schema, _timestamp16_ is of type bigint. When we use _timestamp16_ as the partition field, it becomes a string type in the Hive schema, and the bigint type can't be converted to a string. So `select timestamp16 from testTable;` also returns null.
2. In the KeyGenerator we use _PARTITIONPATH_FIELD_NAME_ to compute the partition path, and we use _HIVE_SYNC_PARTITION_FIELDS_ as the partition fields synced to Hive. Keeping the two the same works well when the field is a string type. But the TimestampBasedAvroKeyGenerator relies on timestamps for the partition field: the field values are interpreted as timestamps, not just converted to strings, while generating the partition path value for records. So with the TimestampBasedAvroKeyGenerator we get a string partition path like `2020-07-30`. I think _PARTITIONPATH_FIELD_NAME_ and _HIVE_SYNC_PARTITION_FIELDS_ should be allowed to differ. Otherwise the original partition path field is used as the Hive partition field, which causes loss of precision and conversion errors.
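The proposal in point 2 can be sketched as follows, under stated assumptions. The key generator still reads and formats `timestamp16`, but the Hive-style partition path is built with the Hive sync field name (`inc_day`). The class `DecoupledPartitionFieldsSketch` and its method are hypothetical and are not Hudi's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical sketch of the proposal (not Hudi's API): the field used to compute the
 * partition value and the name under which it is exposed to Hive are configured separately.
 */
public class DecoupledPartitionFieldsSketch {

    /**
     * Builds a Hive-style partition path ("name=value/...") from Hive sync field names.
     *
     * @param hiveFieldToValue Hive partition field name -> already-formatted value, in partition order
     */
    public static String hiveStylePath(Map<String, String> hiveFieldToValue) {
        StringJoiner path = new StringJoiner("/");
        for (Map.Entry<String, String> entry : hiveFieldToValue.entrySet()) {
            path.add(entry.getKey() + "=" + entry.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        String sourceField = "timestamp16"; // --partition-path-field: read and formatted by the key generator
        String hiveField = "inc_day";       // --hive-sync-partition-fields: string partition column in Hive
        String formatted = "2020-07-30";    // assumed output of the timestamp-based key generator

        Map<String, String> partitions = new LinkedHashMap<>();
        partitions.put(hiveField, formatted);
        System.out.println(sourceField + " -> " + hiveStylePath(partitions));
    }
}
```

Because `inc_day` would be a separate string column, the data column `timestamp16` could keep its bigint values instead of being shadowed by the formatted partition value.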
[GitHub] [hudi] danny0405 commented on issue #5394: flink cdc sink hudi failed to add hive partition fields for hive sync
Posted by GitBox <gi...@apache.org>.
danny0405 commented on issue #5394:
URL: https://github.com/apache/hudi/issues/5394#issuecomment-1107990411
Hello, I think this is expected, because these two params:
```
--partition-path-field=timestamp16
--hive-sync-partition-fields=inc_day
```
should be kept the same.
[GitHub] [hudi] codope commented on issue #5394: hudi use partition path field as hive partition field error in flink
Posted by GitBox <gi...@apache.org>.
codope commented on issue #5394:
URL: https://github.com/apache/hudi/issues/5394#issuecomment-1109754308
Tracking the issue in HUDI-3978. Closing it, since we have a patch in review.