Posted to commits@hudi.apache.org by "Yann Byron (Jira)" <ji...@apache.org> on 2022/07/28 14:41:00 UTC
[jira] [Assigned] (HUDI-4487) support to create ro/rt table by spark sql
[ https://issues.apache.org/jira/browse/HUDI-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yann Byron reassigned HUDI-4487:
--------------------------------
Assignee: Yann Byron
> support to create ro/rt table by spark sql
> ------------------------------------------
>
> Key: HUDI-4487
> URL: https://issues.apache.org/jira/browse/HUDI-4487
> Project: Apache Hudi
> Issue Type: Improvement
> Components: spark-sql
> Reporter: Yann Byron
> Assignee: Yann Byron
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.0
>
>
> Currently, if the ro/rt table is missing, users can only create it through the hudi cli, supplying the full schema and all table properties as in the SQL below. Running the same CREATE TABLE statement in Spark SQL does not work: the statement ends up renaming the table, which is not the expected behavior (see [https://github.com/apache/hudi/issues/6004]).
>
> {code:java}
> CREATE EXTERNAL TABLE `mor_tbl1_ro`(
> `_hoodie_commit_time` string,
> `_hoodie_commit_seqno` string,
> `_hoodie_record_key` string,
> `_hoodie_partition_path` string,
> `_hoodie_file_name` string,
> `id` int,
> `name` string,
> `ts` bigint)
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> WITH SERDEPROPERTIES (
> 'path'='/path/to//mor_tbl1',
> 'hoodie.query.as.ro.table'='true')
> STORED AS INPUTFORMAT
> 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
> '/path/to//mor_tbl1'
> TBLPROPERTIES (
> 'preCombineField'='ts',
> 'primaryKey'='id',
> 'spark.sql.create.version'='3.1.2',
> 'spark.sql.sources.provider'='hudi',
> 'spark.sql.sources.schema.numParts'='1',
> 'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"id","type":"integer","nullable":true,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"ts","type":"long","nullable":true,"metadata":{}}]}',
> 'transient_lastDdlTime'='1658905080',
> 'type'='mor'
> ); {code}
>
>
> I think Hudi could support a simpler way to create the ro/rt tables correctly in Spark SQL, for example:
> {code:java}
> create EXTERNAL table `mor_tbl1_rt`
> using hudi
> options(`hoodie.query.as.ro.table` = 'false')
> location '/path/to//mor_tbl1';{code}
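Part of what makes the manual workaround tedious is that the `spark.sql.sources.schema.*` table properties must repeat, as Spark schema JSON, exactly the columns already listed in the Hive DDL. The sketch below shows how those properties could be derived from the column list. It is illustrative only: the helper name `spark_schema_properties` is hypothetical, only the three Hive types used in this example are mapped, and the 4000-character chunk size is an assumption based on Spark's default `spark.sql.sources.schemaStringLengthThreshold`.

```python
import json

# Hypothetical helper, not a Hudi or Spark API. Only the Hive types used in
# the example DDL are mapped; other types would need extra entries.
HIVE_TO_SPARK_TYPE = {"string": "string", "int": "integer", "bigint": "long"}

# Hudi meta columns that precede the user columns in the DDL above.
HOODIE_META_COLUMNS = [
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
]


def spark_schema_properties(data_columns, threshold=4000):
    """Build spark.sql.sources.schema.* properties for the given columns.

    data_columns: list of (name, hive_type) pairs for the user columns.
    threshold: maximum length of each schema part (assumed default 4000).
    """
    columns = [(name, "string") for name in HOODIE_META_COLUMNS] + list(data_columns)
    schema = {
        "type": "struct",
        "fields": [
            {"name": name, "type": HIVE_TO_SPARK_TYPE[hive_type],
             "nullable": True, "metadata": {}}
            for name, hive_type in columns
        ],
    }
    # Compact separators reproduce the layout of the JSON in the DDL above.
    schema_json = json.dumps(schema, separators=(",", ":"))
    # Split the JSON string into fixed-size chunks, one property per chunk.
    parts = [schema_json[i:i + threshold]
             for i in range(0, len(schema_json), threshold)]
    props = {"spark.sql.sources.schema.numParts": str(len(parts))}
    for i, part in enumerate(parts):
        props[f"spark.sql.sources.schema.part.{i}"] = part
    return props


if __name__ == "__main__":
    props = spark_schema_properties([("id", "int"), ("name", "string"), ("ts", "bigint")])
    for key, value in props.items():
        print(f"'{key}'='{value}',")
```

For the `mor_tbl1` columns this yields a single part whose JSON matches the `spark.sql.sources.schema.part.0` value in the DDL; splitting into several parts only comes into play for wide tables whose schema JSON exceeds the threshold.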