Posted to commits@hudi.apache.org by "Kwan Law (Jira)" <ji...@apache.org> on 2022/05/13 00:55:00 UTC
[jira] (HUDI-1003) Handle partitions correctly when sync non-partitioned table to hive.
[ https://issues.apache.org/jira/browse/HUDI-1003 ]
Kwan Law deleted comment on HUDI-1003:
--------------------------------
was (Author: luoyajun):
[~xleesf] Hi, I'm going to work on this.
> Handle partitions correctly when sync non-partitioned table to hive.
> --------------------------------------------------------------------
>
> Key: HUDI-1003
> URL: https://issues.apache.org/jira/browse/HUDI-1003
> Project: Apache Hudi
> Issue Type: Bug
> Components: hive
> Reporter: leesf
> Assignee: Kwan Law
> Priority: Major
> Labels: newbe, pull-request-available, starter
> Fix For: 0.6.0
>
>
> When syncing a non-partitioned Hudi table to Hive with the following options:
> *option("hoodie.datasource.hive_sync.enable", "true").*
> option("hoodie.datasource.hive_sync.table", tableName).
> option("hoodie.datasource.hive_sync.username", "root").
> option("hoodie.datasource.hive_sync.password", "123456").
> option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://localhost:10000").
> *option("hoodie.datasource.hive_sync.partition_fields", "region,country,city").*
> option("hoodie.datasource.write.operation", writeOperation).
> option("hoodie.datasource.write.table.type", tableType).
> *option("hoodie.datasource.hive_sync.partition_extractor_class", "org.apache.hudi.hive.NonPartitionedExtractor")*
>
> it creates the following table:
> CREATE EXTERNAL TABLE `hudi_trips_cow_hive_non_partitioned`(
> `_hoodie_commit_time` string,
> `_hoodie_commit_seqno` string,
> `_hoodie_record_key` string,
> `_hoodie_partition_path` string,
> `_hoodie_file_name` string,
> `age` bigint,
> `location` string,
> `name` string,
> `sex` string,
> `ts` bigint)
> *PARTITIONED BY (*
> *`region` string,*
> *`country` string,*
> *`city` string)*
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
> 'file:/Users/sflee/personal/hudi_java_client_dataset'
> TBLPROPERTIES (
> 'last_commit_time_sync'='20200606200453',
> 'transient_lastDdlTime'='1591445103')
>
> but the table actually has no partitions, so select * from hudi_trips_cow_hive_non_partitioned returns no data.
> So when a user sets *NonPartitionedExtractor* and also sets *hoodie.datasource.hive_sync.partition_fields* to some fields,
> we need to either throw an exception or create the table with a proper CREATE statement like the one below:
> CREATE EXTERNAL TABLE `hudi_trips_cow_hive_non_partitioned`(
> `_hoodie_commit_time` string,
> `_hoodie_commit_seqno` string,
> `_hoodie_record_key` string,
> `_hoodie_partition_path` string,
> `_hoodie_file_name` string,
> `age` bigint,
> `location` string,
> `name` string,
> `sex` string,
> `ts` bigint)
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
> 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
> 'file:/Users/sflee/personal/hudi_java_client_dataset'
> TBLPROPERTIES (
> 'last_commit_time_sync'='20200606201124',
> 'transient_lastDdlTime'='1591445493')
>
> *I am inclined to create the table normally, using the correct SQL.*
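
The two behaviours proposed in the description above (throw an exception, or ignore the partition fields and emit DDL without a PARTITIONED BY clause) can be sketched as follows. This is a minimal illustration in Python, not Hudi's actual hive-sync code: the function name `resolve_partition_fields` and the `strict` flag are hypothetical, and only the option keys and the extractor class name are taken from the report.

```python
# Illustrative sketch only; names other than the option keys are hypothetical.
NON_PARTITIONED_EXTRACTOR = "org.apache.hudi.hive.NonPartitionedExtractor"
EXTRACTOR_KEY = "hoodie.datasource.hive_sync.partition_extractor_class"
FIELDS_KEY = "hoodie.datasource.hive_sync.partition_fields"


def resolve_partition_fields(options, strict=False):
    """Return the partition columns the Hive DDL should declare."""
    raw = options.get(FIELDS_KEY, "")
    fields = [name.strip() for name in raw.split(",") if name.strip()]
    if options.get(EXTRACTOR_KEY) == NON_PARTITIONED_EXTRACTOR and fields:
        if strict:
            # Option 1 from the report: reject the conflicting configuration.
            raise ValueError(
                "partition_fields must be empty when NonPartitionedExtractor is used"
            )
        # Option 2 from the report: ignore the fields so no PARTITIONED BY is emitted.
        return []
    return fields


opts = {
    EXTRACTOR_KEY: NON_PARTITIONED_EXTRACTOR,
    FIELDS_KEY: "region,country,city",
}
print(resolve_partition_fields(opts))  # prints []
```

With `strict=True` the same options raise a `ValueError` instead, which corresponds to the "throw exception" alternative; the default mirrors the reporter's stated preference for creating the table normally.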
--
This message was sent by Atlassian Jira
(v8.20.7#820007)