Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/30 08:13:14 UTC

[GitHub] [hudi] dragonH commented on issue #6832: [SUPPORT] AWS Glue 3.0 fail to write dataset with hudi (hive sync issue)

dragonH commented on issue #6832:
URL: https://github.com/apache/hudi/issues/6832#issuecomment-1263257319

   hi @codope 
   
   thanks for your kind help
   
   here's another try
   
   this time with a different hudi config (I used `Customer_Sample_Hudi_001` instead of `Customer_Sample_Hudi`)
   
   first, I ran `show tables` to list the existing tables
   
   <img width="221" alt="image" src="https://user-images.githubusercontent.com/18332044/193223276-cd667a96-1bab-46cb-a266-00ca31e5f937.png">
   
   we can see that the table `Customer_Sample_Hudi_001` was not there yet
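   
   (for reference, the listing was just this, assuming the Glue job's `spark` session is the one backed by the Glue Data Catalog)
   
   ```
   # list the tables in the default database visible to this job
   spark.sql("show tables in default").show(truncate=False)
   ```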
   
   then I used the hudi config below
   
   ```
   hudi_options = {
       'hoodie.table.name': 'Customer_Sample_Hudi_001',
       'hoodie.datasource.write.storage.type': 'COPY_ON_WRITE',
       'hoodie.datasource.write.recordkey.field': 'MA001',
       'hoodie.datasource.write.partitionpath.field': 'MA001',
       'hoodie.datasource.write.table.name': 'Customer_Sample_Hudi_001',
       'hoodie.datasource.write.operation': 'insert_overwrite',
       'hoodie.datasource.write.precombine.field': 'load_timestamp',
       'hoodie.datasource.write.hive_style_partitioning': 'true',
       'hoodie.upsert.shuffle.parallelism': 2,
       'hoodie.insert.shuffle.parallelism': 2,
       'path': 's3://drwu-data-platform-staging-data/Customer_Sample_Hudi_001/',
       'hoodie.datasource.hive_sync.enable': 'true',
       'hoodie.datasource.hive_sync.database': 'default',
       'hoodie.datasource.hive_sync.table': 'Customer_Sample_Hudi_001',
       'hoodie.datasource.hive_sync.partition_fields': 'MA001',
       'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
       'hoodie.datasource.hive_sync.use_jdbc': 'false',
       'hoodie.datasource.hive_sync.mode': 'hms',
       'hoodie.datasource.hive_sync.ignore_exceptions': True
   }
   ```
   
   wrote the dataframe the same way as before
   
   ```
   df \
       .write \
       .format("hudi") \
       .options(**hudi_options) \
       .mode("append") \
       .save()
   ```
   
   and got the same error
   
   ```
   Py4JJavaError: An error occurred while calling o249.save.
   : org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing Customer_Sample_Hudi_001
   	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:118)
   	at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:539)
   	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:595)
   	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2$adapted(HoodieSparkSqlWriter.scala:591)
   	at scala.collection.mutable.HashSet.foreach(HashSet.scala:77)
   	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:591)
   	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:665)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:286)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
   	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:750)
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table Customer_Sample_Hudi_001
   	at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:363)
   	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:197)
   	at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:129)
   	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:115)
   	... 46 more
   Caused by: java.lang.IllegalArgumentException: Partitions must be in the same table
   	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
   	at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.validateInputForBatchCreatePartitions(GlueMetastoreClientDelegate.java:800)
   	at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.batchCreatePartitions(GlueMetastoreClientDelegate.java:736)
   	at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.addPartitions(GlueMetastoreClientDelegate.java:718)
   	at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.add_partitions(AWSCatalogMetastoreClient.java:339)
   	at org.apache.hudi.hive.ddl.HMSDDLExecutor.addPartitionsToTable(HMSDDLExecutor.java:198)
   	at org.apache.hudi.hive.HoodieHiveClient.addPartitionsToTable(HoodieHiveClient.java:115)
   	at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:346)
   	... 49 more
   ```
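   
   the bottom of the trace points at `GlueMetastoreClientDelegate.validateInputForBatchCreatePartitions`, so the batch partition create seems to be rejected before anything is written; going only by the message, the check looks like something along these lines (just a python sketch of my understanding, not the actual Glue client code, which is Java)
   
   ```
   # rough sketch (not the real Glue client) of the kind of precondition the
   # trace shows failing: every partition handed to batchCreatePartitions must
   # reference the same database/table
   def validate_partitions_same_table(database_name, table_name, partitions):
       for p in partitions:
           if p["DatabaseName"] != database_name or p["TableName"] != table_name:
               raise ValueError("Partitions must be in the same table")
   ```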
   
   one weird thing is that the table was created this time
   
   <img width="220" alt="image" src="https://user-images.githubusercontent.com/18332044/193223994-cc8daa21-00ce-4051-819e-18f302330475.png">
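   
   to double-check, the table can also be looked up directly in the Glue Data Catalog from outside Spark (a sketch using boto3; assumes credentials/region are already configured, and uses the lowercase form of the name since Glue stores table names lowercased)
   
   ```
   # sketch: confirm the synced table exists in the Glue Data Catalog
   import boto3
   
   glue = boto3.client("glue")
   resp = glue.get_table(DatabaseName="default", Name="customer_sample_hudi_001")
   print(resp["Table"]["Name"], resp["Table"].get("PartitionKeys"))
   ```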
   
   
   

