Posted to commits@hudi.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/11/20 22:49:00 UTC

[jira] [Updated] (HUDI-353) Add support for Hive style partitioning path

     [ https://issues.apache.org/jira/browse/HUDI-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-353:
--------------------------------
    Labels: pull-request-available  (was: )

> Add support for Hive style partitioning path
> --------------------------------------------
>
>                 Key: HUDI-353
>                 URL: https://issues.apache.org/jira/browse/HUDI-353
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>            Reporter: Wenning Ding
>            Priority: Major
>              Labels: pull-request-available
>
> In Hive, the partition folder name follows this format: <partition_column_name>=<partition_value>.
> In Hudi, however, the partition folder is named with just <partition_value>.
> For example, suppose a dataset is partitioned by three columns: year, month, and day.
> In Hive, the data is saved in: {{.../<table_name>/year=2019/month=05/day=01/xxx.parquet}}
> In Hudi, the data is saved in: {{.../<table_name>/2019/05/01/xxx.parquet}}
> I added a new Spark datasource option named {{HIVE_STYLE_PARTITIONING_FILED_OPT_KEY}}, which controls whether Hive-style partitioning is used. It defaults to false (Hive-style partitioning disabled).
> Also, with Hive-style partitioning enabled, we can run "MSCK REPAIR TABLE <table_name>" to sync all partition information with the Hive Metastore automatically, instead of scanning the dataset and manually adding or updating every partition.
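
To make the two layouts concrete, here is a minimal sketch of how the partition path differs between them, and of why a Hive-style path can be mapped back to partition columns from the directory names alone. The function names `partition_path` and `parse_hive_partition` are hypothetical and used only for illustration; they are not part of the Hudi or Hive APIs, and this is not the implementation in the pull request.

```python
from typing import Dict, List, Optional


def partition_path(columns: List[str], values: List[str], hive_style: bool = False) -> str:
    """Build a relative partition path (hypothetical helper, not a Hudi API)."""
    if len(columns) != len(values):
        raise ValueError("columns and values must have the same length")
    if hive_style:
        # Hive layout: <partition_column_name>=<partition_value>
        parts = [f"{col}={val}" for col, val in zip(columns, values)]
    else:
        # Value-only layout: <partition_value>
        parts = list(values)
    return "/".join(parts)


def parse_hive_partition(path: str) -> Optional[Dict[str, str]]:
    """Recover column/value pairs from a relative path.

    Returns None if any segment is not of the form <column>=<value>,
    which is the case for the value-only layout.
    """
    spec: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        name, sep, value = segment.partition("=")
        if not sep or not name:
            return None
        spec[name] = value
    return spec


cols = ["year", "month", "day"]
vals = ["2019", "05", "01"]

print(partition_path(cols, vals, hive_style=True))  # year=2019/month=05/day=01
print(partition_path(cols, vals))                   # 2019/05/01
print(parse_hive_partition("year=2019/month=05/day=01"))
print(parse_hive_partition("2019/05/01"))           # None
```

The second function shows the point behind the MSCK REPAIR TABLE remark: with the `<column>=<value>` layout the column names are carried in the path itself, whereas the value-only layout gives no way to tell which column each directory level belongs to.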



--
This message was sent by Atlassian Jira
(v8.3.4#803005)