Posted to commits@hudi.apache.org by "tao meng (Jira)" <ji...@apache.org> on 2021/06/28 09:41:00 UTC

[jira] [Created] (HUDI-2089) fix the bug that metatable cannot support non_partition table

tao meng created HUDI-2089:
------------------------------

             Summary: fix the bug that metatable cannot support non_partition table
                 Key: HUDI-2089
                 URL: https://issues.apache.org/jira/browse/HUDI-2089
             Project: Apache Hudi
          Issue Type: Bug
          Components: Spark Integration
    Affects Versions: 0.8.0
         Environment: Spark 3.1.1, Hive 3.1.1, Hadoop 3.1.1
            Reporter: tao meng
            Assignee: tao meng
             Fix For: 0.9.0


We found that enabling the metadata table on a non-partitioned Hudi table causes the following error:

org.apache.hudi.exception.HoodieMetadataException: Error syncing to metadata table.
    at org.apache.hudi.client.SparkRDDWriteClient.syncTableMetadata(SparkRDDWriteClient.java:447)
    at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:433)
    at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:187)

We are on Hudi 0.8.0, but the problem also reproduces on the latest Hudi code.

Steps to reproduce:

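import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.{expr, lit}

// 1000 rows keyed by "keyid"; "col3" is used as the precombine field
val df = spark.range(0, 1000).toDF("keyid").
 withColumn("col3", expr("keyid")).
 withColumn("age", lit(1)).
 withColumn("p", lit(2))

// insert into a non-partitioned copy-on-write table with the metadata table enabled
// (basePath is the table location; its value is not given in this report)
df.write.format("hudi").
 option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL).
 option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col3").
 option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "keyid").
 option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "").
 option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
 option(DataSourceWriteOptions.OPERATION_OPT_KEY, "insert").
 option("hoodie.insert.shuffle.parallelism", "4").
 option("hoodie.metadata.enable", "true").
 option(HoodieWriteConfig.TABLE_NAME, "hoodie_test").
 mode(SaveMode.Overwrite).save(basePath)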

// upsert the same records again, appending to the table written above
df.write.format("hudi").
 option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL).
 option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col3").
 option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "keyid").
 option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "").
 option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
 option(DataSourceWriteOptions.OPERATION_OPT_KEY, "upsert").
 option("hoodie.insert.shuffle.parallelism", "4").
 option("hoodie.metadata.enable", "true").
 option(HoodieWriteConfig.TABLE_NAME, "hoodie_test").
 mode(SaveMode.Append).save(basePath)

 

org.apache.hudi.exception.HoodieMetadataException: Error syncing to metadata table.
    at org.apache.hudi.client.SparkRDDWriteClient.syncTableMetadata(SparkRDDWriteClient.java:447)
    at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:433)
    at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:187)
    at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:121)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:564)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:230)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:162)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
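
The two writes in the reproduction differ only in the write operation and the save mode, so the shared settings can be collected once. The following is a minimal sketch, not part of the original report: the object ReproOptions and the helper forOperation are hypothetical names, and the plain string keys are assumed to be the values behind the DataSourceWriteOptions and HoodieWriteConfig constants used above in Hudi 0.8.0.

```scala
// Sketch only: ReproOptions and forOperation are hypothetical names.
// The string keys are assumed to match the Hudi 0.8.0 constants used in the report.
object ReproOptions {
  // Options shared by the insert and the upsert.
  val common: Map[String, String] = Map(
    "hoodie.datasource.write.table.type" -> "COPY_ON_WRITE",
    "hoodie.datasource.write.precombine.field" -> "col3",
    "hoodie.datasource.write.recordkey.field" -> "keyid",
    // Empty partition path field: the table is non-partitioned.
    "hoodie.datasource.write.partitionpath.field" -> "",
    "hoodie.datasource.write.keygenerator.class" ->
      "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
    "hoodie.insert.shuffle.parallelism" -> "4",
    "hoodie.metadata.enable" -> "true",
    "hoodie.table.name" -> "hoodie_test"
  )

  // Adds the one option that differs between the two writes ("insert" or "upsert").
  def forOperation(op: String): Map[String, String] =
    common + ("hoodie.datasource.write.operation" -> op)
}
```

With a SparkSession and a basePath in scope, the first write would then read df.write.format("hudi").options(ReproOptions.forOperation("insert")).mode(SaveMode.Overwrite).save(basePath), and the second would pass "upsert" with SaveMode.Append.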

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)