Posted to commits@hudi.apache.org by "Wenning Ding (Jira)" <ji...@apache.org> on 2019/11/20 19:10:00 UTC
[jira] [Created] (HUDI-353) Add support for Hive style partitioning path
Wenning Ding created HUDI-353:
---------------------------------
Summary: Add support for Hive style partitioning path
Key: HUDI-353
URL: https://issues.apache.org/jira/browse/HUDI-353
Project: Apache Hudi (incubating)
Issue Type: Improvement
Reporter: Wenning Ding
In Hive, a partition folder name follows the format <partition_column_name>=<partition_value>.
In Hudi, however, the partition folder is named just <partition_value>.
For example, suppose a dataset is partitioned by three columns: year, month, and day.
In Hive, the data is saved in: {{.../<table_name>/year=2019/month=05/day=01/xxx.parquet}}
In Hudi, the data is saved in: {{.../<table_name>/2019/05/01/xxx.parquet}}
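To make the difference concrete, here is a minimal Python sketch of how the two layouts could be derived from the same partition values. The function name {{partition_path}} and its {{hive_style}} flag are hypothetical and only illustrate the naming scheme; this is not Hudi's actual implementation.

```python
from typing import Mapping


def partition_path(values: Mapping[str, str], hive_style: bool = False) -> str:
    """Build a relative partition path from ordered column -> value pairs.

    Hypothetical helper for illustration only. Relies on the mapping
    preserving insertion order (true for dict in Python 3.7+).
    """
    if hive_style:
        # Hive layout: each folder is <partition_column_name>=<partition_value>
        segments = [f"{col}={val}" for col, val in values.items()]
    else:
        # Value-only layout: each folder is just <partition_value>
        segments = list(values.values())
    return "/".join(segments)


parts = {"year": "2019", "month": "05", "day": "01"}
print(partition_path(parts, hive_style=True))  # year=2019/month=05/day=01
print(partition_path(parts))                   # 2019/05/01
```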
I am adding a new option to the Spark datasource, {{HIVE_STYLE_PARTITIONING_FILED_OPT_KEY}}, which indicates whether to use Hive-style partitioning. The option defaults to false (Hive-style partitioning disabled).
Also, with Hive-style partitioning, instead of scanning the dataset and manually adding or updating every partition, we can run "MSCK REPAIR TABLE <table_name>" to automatically sync all partition information with the Hive Metastore.
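The reason a single repair statement can suffice is that a Hive-style path carries the column names itself, so the partition spec can be recovered from the directory names alone. The Python sketch below illustrates that idea; {{parse_hive_partition}}, {{repair_statement}}, and the table name {{my_table}} are hypothetical names used only for this example and are not part of Hive or Hudi.

```python
from typing import Dict


def parse_hive_partition(rel_path: str) -> Dict[str, str]:
    """Recover column -> value pairs from a Hive-style relative path.

    Hypothetical helper for illustration only.
    """
    spec: Dict[str, str] = {}
    for segment in rel_path.strip("/").split("/"):
        col, sep, val = segment.partition("=")
        if not sep or not col:
            # A value-only folder such as "2019" carries no column name.
            raise ValueError(f"not a Hive-style segment: {segment!r}")
        spec[col] = val
    return spec


def repair_statement(table: str) -> str:
    """Return the single statement that asks Hive to discover partitions."""
    return f"MSCK REPAIR TABLE {table}"


print(parse_hive_partition("year=2019/month=05/day=01"))
print(repair_statement("my_table"))

try:
    parse_hive_partition("2019/05/01")
except ValueError as err:
    # Value-only folders cannot be mapped back to partition columns.
    print("cannot infer partition columns:", err)
```

With value-only folder names the column names are missing from the path, which is why each partition otherwise has to be registered explicitly.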
--
This message was sent by Atlassian Jira
(v8.3.4#803005)