Posted to commits@hudi.apache.org by "Kwan Law (Jira)" <ji...@apache.org> on 2022/05/13 00:55:00 UTC

[jira] (HUDI-1003) Handle partitions correctly when sync non-partitioned table to hive.

    [ https://issues.apache.org/jira/browse/HUDI-1003 ]


    Kwan Law deleted comment on HUDI-1003:
    --------------------------------

was (Author: luoyajun):
[~xleesf] Hi, I'm going to work on this.

> Handle partitions correctly when sync non-partitioned table to hive.
> --------------------------------------------------------------------
>
>                 Key: HUDI-1003
>                 URL: https://issues.apache.org/jira/browse/HUDI-1003
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: hive
>            Reporter: leesf
>            Assignee: Kwan Law
>            Priority: Major
>              Labels: newbe, pull-request-available, starter
>             Fix For: 0.6.0
>
>
> When syncing a non-partitioned Hudi table to Hive with the following options:
> *option("hoodie.datasource.hive_sync.enable", "true").*
> option("hoodie.datasource.hive_sync.table", tableName).
> option("hoodie.datasource.hive_sync.username", "root").
> option("hoodie.datasource.hive_sync.password", "123456").
> option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://localhost:10000").
> *option("hoodie.datasource.hive_sync.partition_fields", "region,country,city").*
> option("hoodie.datasource.write.operation", writeOperation).
> option("hoodie.datasource.write.table.type", tableType).
> *option("hoodie.datasource.hive_sync.partition_extractor_class", "org.apache.hudi.hive.NonPartitionedExtractor")*
>  
> Hudi creates the following table:
> CREATE EXTERNAL TABLE `hudi_trips_cow_hive_non_partitioned`(
>  `_hoodie_commit_time` string,
>  `_hoodie_commit_seqno` string,
>  `_hoodie_record_key` string,
>  `_hoodie_partition_path` string,
>  `_hoodie_file_name` string,
>  `age` bigint,
>  `location` string,
>  `name` string,
>  `sex` string,
>  `ts` bigint)
> *PARTITIONED BY (*
>  *`region` string,*
>  *`country` string,*
>  *`city` string)*
> ROW FORMAT SERDE
>  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
>  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
> OUTPUTFORMAT
>  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>  'file:/Users/sflee/personal/hudi_java_client_dataset'
> TBLPROPERTIES (
>  'last_commit_time_sync'='20200606200453',
>  'transient_lastDdlTime'='1591445103')
>  
> However, the table actually has no partitions, so select * from hudi_trips_cow_hive_non_partitioned returns no data.
> So when a user uses *NonPartitionedExtractor* and also sets *hoodie.datasource.hive_sync.partition_fields* to some fields,
> we need to either throw an exception or generate a proper CREATE statement like the one below:
> CREATE EXTERNAL TABLE `hudi_trips_cow_hive_non_partitioned`(
>  `_hoodie_commit_time` string,
>  `_hoodie_commit_seqno` string,
>  `_hoodie_record_key` string,
>  `_hoodie_partition_path` string,
>  `_hoodie_file_name` string,
>  `age` bigint,
>  `location` string,
>  `name` string,
>  `sex` string,
>  `ts` bigint)
> ROW FORMAT SERDE
>  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
>  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
> OUTPUTFORMAT
>  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>  'file:/Users/sflee/personal/hudi_java_client_dataset'
> TBLPROPERTIES (
>  'last_commit_time_sync'='20200606201124',
>  'transient_lastDdlTime'='1591445493')
>  
> *I am inclined to create the table normally, using the correct SQL.*
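
The preferred behaviour described above (ignore the configured partition fields when NonPartitionedExtractor is in use, so that no PARTITIONED BY clause is emitted) could be sketched roughly as follows. This is only an illustration, not the actual Hudi hive-sync code: the class HiveSyncPartitionSketch and both of its methods are hypothetical names, and typing every partition column as string is an assumption taken from the DDL quoted in the issue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hudi: illustrates the proposed handling only.
class HiveSyncPartitionSketch {

    // Extractor class name as given in the issue description.
    static final String NON_PARTITIONED_EXTRACTOR =
            "org.apache.hudi.hive.NonPartitionedExtractor";

    // Returns the partition fields that should actually be used for the Hive table.
    // With the non-partitioned extractor, any configured fields are ignored.
    static List<String> effectivePartitionFields(String extractorClass, String partitionFields) {
        if (NON_PARTITIONED_EXTRACTOR.equals(extractorClass)) {
            return Collections.emptyList();
        }
        if (partitionFields == null || partitionFields.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.stream(partitionFields.split(","))
                .map(String::trim)
                .filter(field -> !field.isEmpty())
                .collect(Collectors.toList());
    }

    // Builds the PARTITIONED BY clause, or an empty string for a non-partitioned table.
    // Assumption: every partition column is typed as string, as in the quoted DDL.
    static String partitionedByClause(List<String> fields) {
        if (fields.isEmpty()) {
            return "";
        }
        return fields.stream()
                .map(field -> "`" + field + "` string")
                .collect(Collectors.joining(", ", "PARTITIONED BY (", ")"));
    }

    public static void main(String[] args) {
        List<String> fields = effectivePartitionFields(
                NON_PARTITIONED_EXTRACTOR, "region,country,city");
        // Prints an empty clause between the brackets: no PARTITIONED BY is generated.
        System.out.println("[" + partitionedByClause(fields) + "]");
    }
}
```

The alternative mentioned in the issue, throwing an exception, would replace the first early return with a check that the configured fields are empty and fail otherwise; which of the two is preferable is a design choice the issue leaves open.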



--
This message was sent by Atlassian Jira
(v8.20.7#820007)