Posted to commits@hudi.apache.org by "HunterXHunter (Jira)" <ji...@apache.org> on 2023/01/20 17:44:00 UTC
[jira] [Assigned] (HUDI-5584) When the table to be synchronized already exists in hive, need to update serde/table properties
[ https://issues.apache.org/jira/browse/HUDI-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
HunterXHunter reassigned HUDI-5584:
-----------------------------------
Assignee: HunterXHunter
> When the table to be synchronized already exists in hive, need to update serde/table properties
> -----------------------------------------------------------------------------------------------
>
> Key: HUDI-5584
> URL: https://issues.apache.org/jira/browse/HUDI-5584
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: HunterXHunter
> Assignee: HunterXHunter
> Priority: Major
> Labels: pull-request-available
>
> When we set hoodie.datasource.hive_sync.table.strategy='ro', we expect only one table to be synchronized to Hive, without the _ro suffix.
> But sometimes the table has already been created in Hive beforehand,
> for example:
> {code:java}
> create table hive.test.HUDI_5584 (
> id int,
> ts int)
> using hudi
> tblproperties (
> type = 'mor',
> primaryKey = 'id',
> preCombineField = 'ts',
> hoodie.datasource.hive_sync.enable = 'true',
> hoodie.datasource.hive_sync.table.strategy='ro'
> ) location '/tmp/HUDI_5584' {code}
> Running SHOW CREATE TABLE on it then returns:
> {code:java}
> CREATE EXTERNAL TABLE `hudi_5584`(
> `_hoodie_commit_time` string,
> `_hoodie_commit_seqno` string,
> `_hoodie_record_key` string,
> `_hoodie_partition_path` string,
> `_hoodie_file_name` string,
> `id` int,
> `ts` int)
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> WITH SERDEPROPERTIES (
> 'path'='file:///tmp/HUDI_5584')
> STORED AS INPUTFORMAT
> 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
> 'file:/tmp/HUDI_5584'
> TBLPROPERTIES (
> 'hoodie.datasource.hive_sync.enable'='true',
> 'hoodie.datasource.hive_sync.table.strategy'='ro',
> 'preCombineField'='ts',
> 'primaryKey'='id',
> 'spark.sql.create.version'='3.3.1',
> 'spark.sql.sources.provider'='hudi',
> 'spark.sql.sources.schema.numParts'='1',
> 'spark.sql.sources.schema.part.0'='xx',
> 'transient_lastDdlTime'='1674108302',
> 'type'='mor') {code}
> *The table looks like a realtime table (its INPUTFORMAT is HoodieParquetRealtimeInputFormat).*
>
> When we finish writing data and synchronize the ro table, SERDEPROPERTIES and OUTPUTFORMAT are not modified, because the table already exists.
> As a result, the table type does not match what is expected.
>
>
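The behaviour requested in the description can be illustrated with a small, language-neutral sketch: when the table already exists, compare its current storage settings with what the sync expects and apply only the differences. This is not Hudi's implementation. The function name plan_table_update, the dictionary layout, and the choice of org.apache.hudi.hadoop.HoodieParquetInputFormat and the serde property hoodie.query.as.ro.table as the expected read-optimized settings are assumptions made for illustration only.

```python
# Hypothetical sketch; none of these names are taken from Hudi's sync code.
RT_INPUT_FORMAT = "org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat"
RO_INPUT_FORMAT = "org.apache.hudi.hadoop.HoodieParquetInputFormat"  # assumed ro format

SCALAR_FIELDS = ("input_format", "output_format", "serde_class")
MAP_FIELDS = ("serde_properties", "table_properties")


def plan_table_update(existing: dict, expected: dict) -> dict:
    """Return only the settings in `expected` that differ from `existing`."""
    changes = {}
    # Single-valued settings: replace when the expected value differs.
    for key in SCALAR_FIELDS:
        if key in expected and existing.get(key) != expected[key]:
            changes[key] = expected[key]
    # Key/value settings: keep only entries that are missing or different.
    for key in MAP_FIELDS:
        current = existing.get(key, {})
        delta = {k: v for k, v in expected.get(key, {}).items() if current.get(k) != v}
        if delta:
            changes[key] = delta
    return changes


# The table as created up front (it looks like a realtime table).
existing_table = {
    "input_format": RT_INPUT_FORMAT,
    "serde_properties": {"path": "file:///tmp/HUDI_5584"},
}
# What a sync with table.strategy='ro' would be expected to produce (assumed values).
expected_table = {
    "input_format": RO_INPUT_FORMAT,
    "serde_properties": {
        "path": "file:///tmp/HUDI_5584",
        "hoodie.query.as.ro.table": "true",
    },
}

changes = plan_table_update(existing_table, expected_table)
print(changes)
```

An empty result means the existing table already matches, so the sync could skip the update. A non-empty result is what would need to be written back to the metastore.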
--
This message was sent by Atlassian Jira
(v8.20.10#820010)