Posted to commits@hudi.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/09/15 01:34:00 UTC

[jira] [Updated] (HUDI-4453) Support partition pruning for tables Bootstrapped from Source Hive Style partitioned tables

     [ https://issues.apache.org/jira/browse/HUDI-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4453:
---------------------------------
    Labels: pull-request-available  (was: )

> Support partition pruning for tables Bootstrapped from Source Hive Style partitioned tables
> -------------------------------------------------------------------------------------------
>
>                 Key: HUDI-4453
>                 URL: https://issues.apache.org/jira/browse/HUDI-4453
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Udit Mehrotra
>            Assignee: Ethan Guo
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> Currently, the *Bootstrap* feature determines the source schema by reading it from the source Parquet files: [https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/bootstrap/ParquetBootstrapMetadataHandler.java#L61]
> This does not account for Parquet tables that are Hive-style partitioned, where partition column values are typically encoded in the directory paths rather than stored in the data files. As a result, the partition columns are missing from the source schema and are not written to the target Hudi table. For the same reason, partition pruning does not work, because source partitions cannot be pruned. We should improve this logic to determine the partition schema correctly from the partition paths for Hive-style partitioned tables, and to write the partition column values correctly to the target Hudi table.
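
To illustrate the idea in the description, the following is a minimal sketch of recovering partition column values from a Hive-style relative path and using them to prune partitions. It is not Hudi code: the function names parse_hive_partition_path and prune_partitions are hypothetical, all values are kept as strings (no type inference), and the sketch assumes special characters in partition values are percent-encoded in the path.

```python
from typing import Callable, Dict, List
from urllib.parse import unquote


def parse_hive_partition_path(relative_path: str) -> Dict[str, str]:
    """Extract column=value pairs from a path such as 'year=2022/month=09'.

    Hypothetical helper for illustration only; values stay as strings.
    """
    values: Dict[str, str] = {}
    for segment in relative_path.strip("/").split("/"):
        if "=" not in segment:
            # Not a partition directory (for example, a data file name).
            continue
        name, _, raw_value = segment.partition("=")
        # Assumption: special characters in values are percent-encoded.
        values[name] = unquote(raw_value)
    return values


def prune_partitions(
    partition_paths: List[str],
    predicate: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Keep only the partition paths whose parsed values satisfy the predicate."""
    return [p for p in partition_paths if predicate(parse_hive_partition_path(p))]


if __name__ == "__main__":
    paths = ["year=2021/month=12", "year=2022/month=09"]
    print(prune_partitions(paths, lambda v: v["year"] == "2022"))
```

With partition values available from the path, a filter on a partition column can skip whole directories without opening any Parquet file, which is the behavior the issue asks for.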



--
This message was sent by Atlassian Jira
(v8.20.10#820010)