Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2022/09/27 19:39:00 UTC

[jira] [Updated] (HUDI-4932) Add a config to allow partition column type inference in bootstrap

     [ https://issues.apache.org/jira/browse/HUDI-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-4932:
----------------------------
    Summary: Add a config to allow partition column type inference in bootstrap  (was: Add a config to allow partition column inference in bootstrap)

> Add a config to allow partition column type inference in bootstrap
> ------------------------------------------------------------------
>
>                 Key: HUDI-4932
>                 URL: https://issues.apache.org/jira/browse/HUDI-4932
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: bootstrap
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Major
>             Fix For: 0.13.0
>
>
> Currently, we assume that the partition column is always of String type during the bootstrap operation.  TestDataSourceForBootstrap.testMetadataBootstrapCOWHiveStylePartitioned fails for a date partition column if type inference of the partition column is turned on.
>  
> We need to add a config to allow partition column type inference in bootstrap so that other types of partition columns are supported.
>  
> Relevant code in HoodieSparkBootstrapSchemaProvider:
> {code:java}
> private static Schema getBootstrapSourceSchemaParquet(HoodieWriteConfig writeConfig, HoodieEngineContext context, Path filePath) {
>   // NOTE: The type inference of partition column in the parquet table is turned off explicitly,
>   // to be consistent with the existing bootstrap behavior, where the partition column is String
>   // typed in Hudi table.
>   ((HoodieSparkEngineContext) context).getSqlContext()
>       .setConf(SQLConf.PARTITION_COLUMN_TYPE_INFERENCE(), false);
>   StructType parquetSchema = ((HoodieSparkEngineContext) context).getSqlContext().read()
>       .option("basePath", writeConfig.getBootstrapSourceBasePath())
>       .parquet(filePath.toString())
>       .schema();
> {code}
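
The behavior requested above could be sketched roughly as follows. This is a minimal, standalone illustration and not the Hudi or Spark implementation: the config key `example.bootstrap.partition.type.inference.enabled`, the class `PartitionTypeInferenceSketch`, and the three-way type enum are all hypothetical names made up for this example, and the inference rules are deliberately much simpler than Spark's. The sketch only shows the idea of keeping String as the default type (the existing bootstrap behavior) and inferring other types when a flag is switched on.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Properties;

// Hypothetical sketch: none of these names exist in Hudi or Spark.
final class PartitionTypeInferenceSketch {

  // Hypothetical config key, invented for this illustration only.
  static final String INFER_KEY = "example.bootstrap.partition.type.inference.enabled";

  // Simplified set of types; Spark's real inference covers more cases.
  enum PartitionType { STRING, INTEGER, DATE }

  // Returns the type to assign to a partition column given one raw path value.
  // With the flag off (the default), every partition column stays STRING,
  // mirroring the existing bootstrap behavior described in the issue.
  static PartitionType resolveType(Properties props, String rawValue) {
    boolean infer = Boolean.parseBoolean(props.getProperty(INFER_KEY, "false"));
    if (!infer) {
      return PartitionType.STRING;
    }
    try {
      Integer.parseInt(rawValue);
      return PartitionType.INTEGER;
    } catch (NumberFormatException ignored) {
      // Not an integer; try the next candidate type.
    }
    try {
      LocalDate.parse(rawValue); // expects ISO-8601, e.g. 2022-09-27
      return PartitionType.DATE;
    } catch (DateTimeParseException ignored) {
      // Not a date; fall back to String.
    }
    return PartitionType.STRING;
  }

  public static void main(String[] args) {
    Properties off = new Properties();
    Properties on = new Properties();
    on.setProperty(INFER_KEY, "true");

    System.out.println(resolveType(off, "2022-09-27"));
    System.out.println(resolveType(on, "2022-09-27"));
    System.out.println(resolveType(on, "42"));
    System.out.println(resolveType(on, "us-west"));
  }
}
```

Defaulting the flag to off keeps existing tables unaffected, which matches the NOTE in the snippet above about staying consistent with the current String-typed behavior.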



--
This message was sent by Atlassian Jira
(v8.20.10#820010)