Posted to commits@hudi.apache.org by "zouxxyy (Jira)" <ji...@apache.org> on 2022/10/19 08:10:00 UTC
[jira] [Updated] (HUDI-5057) Fix msck repair hudi table

     [ https://issues.apache.org/jira/browse/HUDI-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zouxxyy updated HUDI-5057:
--------------------------
    Summary: Fix msck repair hudi table  (was: Fix msck repair table)

> Fix msck repair hudi table
> --------------------------
>
>                 Key: HUDI-5057
>                 URL: https://issues.apache.org/jira/browse/HUDI-5057
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql
>    Affects Versions: 0.12.0
>            Reporter: zouxxyy
>            Assignee: zouxxyy
>            Priority: Major
>
> When `hoodie.datasource.write.hive_style_partitioning` is disabled,
> running `msck repair table` in Spark SQL fails to register the partitions that exist in the file system with the catalog.
> For example:
> 1. Create the table with Spark SQL
> {code:java}
> create table h0 (
> id int,
> name string,
> ts long,
> dt string) 
> using hudi
> partitioned by (dt)
> location '/tmp/test'
> tblproperties (
> primaryKey = 'id',
> preCombineField = 'ts',
> hoodie.datasource.write.hive_style_partitioning = 'false');{code}
> 2. Add a partition by writing a row with the DataFrame writer
> {code:java}
> // Already in scope in spark-shell; needed for toDF elsewhere.
> import spark.implicits._
> import org.apache.hudi.DataSourceWriteOptions.{PARTITIONPATH_FIELD, PRECOMBINE_FIELD, RECORDKEY_FIELD}
> import org.apache.hudi.common.table.HoodieTableConfig.HIVE_STYLE_PARTITIONING_ENABLE
> import org.apache.hudi.config.HoodieWriteConfig.TBL_NAME
> import org.apache.spark.sql.SaveMode
> // Append one row for dt = 2022-10-06 to the table created in step 1.
> val df = Seq((1, "a1", 1000, "2022-10-06")).toDF("id", "name", "ts", "dt")
> df.write.format("hudi")
>   .option(TBL_NAME.key, "h0") // the Hudi writer requires hoodie.table.name
>   .option(RECORDKEY_FIELD.key, "id")
>   .option(PRECOMBINE_FIELD.key, "ts")
>   .option(PARTITIONPATH_FIELD.key, "dt")
>   .option(HIVE_STYLE_PARTITIONING_ENABLE.key, "false")
>   .mode(SaveMode.Append)
>   .save("/tmp/test"){code}
> 3. Run msck repair table with Spark SQL
> {code:java}
> msck repair table h0;{code}
> 4. List the partition names registered in the catalog
> {code:java}
> val table = spark.sessionState.sqlParser.parseTableIdentifier("h0");
> spark.sessionState.catalog.listPartitionNames(table).toArray;{code}
> Expected result: Array(dt=2022-10-06). Actual result: Array()
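> A likely explanation, stated here as an assumption rather than a confirmed diagnosis: with hive-style partitioning disabled the partition directory is named `2022-10-06` instead of `dt=2022-10-06`, so a scan that expects `column=value` directory names finds nothing to register. The sketch below illustrates the difference with plain Scala. `PartitionPathSketch`, `parseHiveStyle` and `parsePositional` are hypothetical names made up for this illustration; they are not Hudi or Spark APIs and do not describe the actual fix.
> {code:java}
> // Hypothetical helper for illustration only; not part of Hudi or Spark.
> object PartitionPathSketch {
>   // Split a relative partition path such as "dt=2022-10-06" into its directory segments.
>   private def segments(relPath: String): Seq[String] =
>     relPath.split("/").filter(_.nonEmpty).toSeq
>
>   // Accept only segments of the form "<column>=<value>", in partition-column order.
>   def parseHiveStyle(relPath: String, cols: Seq[String]): Option[Map[String, String]] = {
>     val segs = segments(relPath)
>     if (segs.length != cols.length) None
>     else {
>       val pairs = segs.zip(cols).map { case (seg, col) =>
>         seg.split("=", 2) match {
>           case Array(name, value) if name == col => Some(col -> value)
>           case _ => None // no "column=" prefix, e.g. "2022-10-06"
>         }
>       }
>       if (pairs.forall(_.isDefined)) Some(pairs.flatten.toMap) else None
>     }
>   }
>
>   // Treat each segment as the value of the partition column at the same position.
>   def parsePositional(relPath: String, cols: Seq[String]): Option[Map[String, String]] = {
>     val segs = segments(relPath)
>     if (segs.length == cols.length) Some(cols.zip(segs).toMap) else None
>   }
> }{code}
> Under these assumptions, `parseHiveStyle("2022-10-06", Seq("dt"))` yields no partition spec, while `parsePositional("2022-10-06", Seq("dt"))` recovers the spec for `dt` from the table's known partition columns.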



--
This message was sent by Atlassian Jira
(v8.20.10#820010)