Posted to commits@hudi.apache.org by "Udit Mehrotra (Jira)" <ji...@apache.org> on 2022/07/22 23:33:00 UTC

[jira] [Created] (HUDI-4453) Support partition pruning for tables Bootstrapped from Source Hive Style partitioned tables

Udit Mehrotra created HUDI-4453:
-----------------------------------

             Summary: Support partition pruning for tables Bootstrapped from Source Hive Style partitioned tables
                 Key: HUDI-4453
                 URL: https://issues.apache.org/jira/browse/HUDI-4453
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Udit Mehrotra


Currently, the *Bootstrap* feature determines the source schema by reading it from the source parquet files: [https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/bootstrap/ParquetBootstrapMetadataHandler.java#L61]

This does not account for source parquet tables that are Hive style partitioned, where the partition columns are typically present only in the partition paths and not in the data files. As a result, the partition columns are missing from the source schema and are not written to the target Hudi table either. For the same reason partition pruning does not work, because we are unable to prune out source partitions. We should improve this logic to determine the partition schema correctly from the partition paths for Hive style partitioned tables, and to write the partition column values correctly in the target Hudi table.
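
To illustrate the idea of deriving partition columns from the partition paths, here is a minimal sketch in Python. It is not Hudi code and does not reflect how the fix would be implemented in Hudi. The function names, the example bucket and table paths, and the predicate-based pruning helper are all assumptions made for illustration. It assumes every directory level under the table base path is a {{column=value}} segment, that values may be percent-encoded, and that {{__HIVE_DEFAULT_PARTITION__}} stands for a null value.

```python
from urllib.parse import unquote

# Placeholder directory name Hive uses for a null partition value.
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def parse_hive_partition_path(base_path, file_path):
    """Hypothetical helper: return {column: value} parsed from the
    column=value directory segments between base_path and the file name."""
    base = base_path.rstrip("/")
    if not file_path.startswith(base + "/"):
        raise ValueError(f"{file_path!r} is not under {base_path!r}")
    relative = file_path[len(base) + 1:]
    segments = relative.split("/")[:-1]  # drop the trailing file name
    partitions = {}
    for segment in segments:
        if "=" not in segment:
            raise ValueError(f"not a Hive style partition segment: {segment!r}")
        column, _, raw_value = segment.partition("=")
        value = unquote(raw_value)  # undo percent-encoding of special characters
        partitions[column] = None if value == HIVE_DEFAULT_PARTITION else value
    return partitions


def prune_files(base_path, file_paths, predicate):
    """Hypothetical helper: keep only files whose partition values satisfy
    the predicate, without opening any data file."""
    return [
        path for path in file_paths
        if predicate(parse_hive_partition_path(base_path, path))
    ]


if __name__ == "__main__":
    base = "s3://bucket/source_table"  # made-up location
    files = [
        base + "/year=2022/month=06/part-0000.parquet",
        base + "/year=2022/month=07/part-0001.parquet",
    ]
    print(parse_hive_partition_path(base, files[0]))
    print(prune_files(base, files, lambda p: p["month"] == "07"))
```

All values are parsed as strings here. A real implementation would also need to infer or be told the partition column types, which this sketch leaves out.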



--
This message was sent by Atlassian Jira
(v8.20.10#820010)