Posted to commits@hudi.apache.org by "yuehanwang (Jira)" <ji...@apache.org> on 2022/04/26 07:16:00 UTC

[jira] [Updated] (HUDI-3962) hudi use partition path field as hive partition field in flink

     [ https://issues.apache.org/jira/browse/HUDI-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yuehanwang updated HUDI-3962:
-----------------------------
    Summary: hudi use partition path field as hive partition field in flink  (was: flink cdc sink hudi failed to add hive partition fields for hive sync)

> hudi use partition path field as hive partition field in flink
> --------------------------------------------------------------
>
>                 Key: HUDI-3962
>                 URL: https://issues.apache.org/jira/browse/HUDI-3962
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: yuehanwang
>            Assignee: loukey_j
>            Priority: Major
>              Labels: pull-request-available
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> h1. flink cdc sink hudi failed to add hive partition fields for hive sync
>  
> Steps to reproduce the behavior:
> 1. Create a MySQL table like this:
> ```
> CREATE TABLE `timeTypeTest` (
>   `id` int(11) NOT NULL AUTO_INCREMENT,
>   `datetime1` datetime DEFAULT NULL,
>   `date1` date DEFAULT NULL,
>   `datetime16` datetime(6) DEFAULT NULL,
>   `time16` time DEFAULT NULL,
>   `timestamp16` timestamp(6) NULL DEFAULT NULL,
>   `timestamp16Partition` varchar(45) DEFAULT NULL,
>   PRIMARY KEY (`id`),
>   UNIQUE KEY `id_UNIQUE` (`id`)
> ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
> ```
> 2. Insert a row:
> `insert into mydb.timeTypeTest values ('2', '2020-07-30 10:08:22', '2020-07-30', '2020-07-30 10:08:22.000000', '10:08:22', '2020-07-30 10:08:22.000000', '2020-07-30')`
> 3. Start a Flink CDC job that sinks to Hudi with the following config properties:
> ```
> --hive-sync-enable=true
> --hive-sync-jdbc-url=jdbc:hive2://localhost:10000
> --hive-sync-db=testDb
> --hive-sync-table=testTable
> --record-key-field=id
> --partition-path-field=timestamp16
> --hive-sync-partition-fields=inc_day
> --hive-style-partitioning=true
> --hive-sync-mode=jdbc
> --hive-sync-username=hive
> --hive-sync-password=hive
> hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
> hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy-MM-dd
> hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled=true
> hive_sync.partition_extractor_class=org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator
> ```
> **Expected behavior**
> A Hive table testTable is created with a string partition field _inc_day_, and a partition "2020-07-30" is added. Actual behavior: the partition field is _timestamp16_ with bigint type.
> ```
> show partitions testTable;  ---- "2020-07-30"
> select timestamp16 from testTable; ----- null
> ```
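
For readers following the config above, here is a minimal sketch of the conversion the reporter seems to expect: an epoch-millisecond value (timestamp type EPOCHMILLISECONDS) rendered with the output pattern yyyy-MM-dd. This is not Hudi's key generator code. The class name `PartitionValueSketch`, the helper `toPartitionValue`, and the choice of UTC as the time zone are assumptions made for illustration only.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class PartitionValueSketch {
    // Same pattern as hoodie.deltastreamer.keygen.timebased.output.dateformat in the report.
    // UTC is an assumption; the report does not state which time zone applies.
    private static final DateTimeFormatter OUTPUT_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    // Hypothetical helper: formats an epoch-millisecond value as a partition string.
    static String toPartitionValue(long epochMillis) {
        return OUTPUT_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // The timestamp16 value from the inserted row, interpreted as UTC.
        long epochMillis = LocalDateTime.of(2020, 7, 30, 10, 8, 22)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
        System.out.println(epochMillis + " -> " + toPartitionValue(epochMillis));
    }
}
```

Under these assumptions the partition value is a date string, whereas the raw epoch-millisecond value is a bigint. That matches the mismatch described in the report between the expected string field _inc_day_ and the observed bigint field _timestamp16_.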



--
This message was sent by Atlassian Jira
(v8.20.7#820007)