
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/25 06:34:35 UTC

[GitHub] [hudi] onlywangyh commented on issue #5394: flink cdc sink hudi failed to add hive partition fields for hive sync

onlywangyh commented on issue #5394:
URL: https://github.com/apache/hudi/issues/5394#issuecomment-1108132152

If I keep both parameters the same, e.g. `--partition-path-field=timestamp16` and `--hive-sync-partition-fields=timestamp16`, there are two problems:
1. In the schema, _timestamp16_ is a bigint. When we use _timestamp16_ as a partition field, it becomes a string in the Hive schema. The bigint type can't be converted to a string, so `select timestamp16 from testTable;` will also return null.
2. In the KeyGenerator, _PARTITIONPATH_FIELD_NAME_ is used to build the partition path, and _HIVE_SYNC_PARTITION_FIELDS_ is used as the partition field synced to Hive. Using the same value for both works when the field is a string. But the TimestampBasedAvroKeyGenerator relies on timestamps for the partition field: the field values are interpreted as timestamps, not just converted to strings, when generating the partition path value for each record. So with the TimestampBasedAvroKeyGenerator we get a string partition path like `2020-07-30`. I think _PARTITIONPATH_FIELD_NAME_ and _HIVE_SYNC_PARTITION_FIELDS_ should be allowed to differ, so that the original partition path field is not reused as the Hive partition field, which causes loss of precision and conversion errors.
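
To make the mismatch concrete, here is a minimal Python sketch of the effect described above. It does not call any Hudi code: `timestamp_partition_path` is a hypothetical stand-in for what a timestamp-based key generator does, and the epoch-millisecond input, the UTC timezone, and the `%Y-%m-%d` output format are all assumptions chosen to reproduce the `2020-07-30` example.

```python
from datetime import datetime, timezone


def timestamp_partition_path(epoch_millis: int, fmt: str = "%Y-%m-%d") -> str:
    """Hypothetical stand-in for a timestamp-based key generator:
    interpret the field as epoch milliseconds (assumed) and format it
    as a date string in UTC (assumed)."""
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return moment.strftime(fmt)


# A record whose partition field is a bigint, as in the schema above.
record = {"id": 1, "timestamp16": 1596067200000}

# The partition path is a formatted date string, not the bigint itself.
partition_path = timestamp_partition_path(record["timestamp16"])

# If the same column name is reused as the Hive partition field, the value
# stored for that column is the path string, which cannot be read back as
# the original bigint.
try:
    int(partition_path)
    readable_as_bigint = True
except ValueError:
    readable_as_bigint = False

print(partition_path, readable_as_bigint)
```

Even if the string could be parsed, the sub-day part of the original timestamp is already gone from the partition value, which is the loss of precision mentioned in point 2.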

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
