
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/02 00:44:09 UTC

[GitHub] [hudi] eshu opened a new issue, #5736: [SUPPORT] Hudi 0.11.0 on AWS Glue: Metastore URIs

eshu opened a new issue, #5736:
URL: https://github.com/apache/hudi/issues/5736

   My job runs successfully on AWS Glue with Hudi 0.10.1, but after migrating to Hudi 0.11.0 with the same parameters, it fails with the following exception:
   
   ```
   2022-06-01 23:38:53,691 ERROR [spark-listener-group-streams] listeners.QueryLogger$ (QueryLogger.scala:$anonfun$onQueryTerminated$1(16)): Query 9e297e1c-602c-45b0-b28e-86fb672691d5 terminated with error, run id bc49a294-d46e-4c14-8dd2-aa2e311b8421: org.apache.hudi.exception.HoodieException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
   	at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:61)
   	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:622)
   	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2$adapted(HoodieSparkSqlWriter.scala:621)
   	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
   	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:621)
   	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:680)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:313)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:163)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
   	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
           ...
   Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.hive.HiveSyncTool
           ...
   Caused by: java.lang.reflect.InvocationTargetException
           ...
   Caused by: : org.apache.hudi.hive.HoodieHiveSyncException: Got runtime exception when hive syncing
           ...
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to create HiveMetaStoreClient
           ...
   Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
           ...
   Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
           ...
   Caused by: java.lang.reflect.InvocationTargetException
           ...
   Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
           ...
   Caused by: java.net.ConnectException: Connection refused (Connection refused)
           ...
   ```
   I omitted parts of the stack trace to make it more readable. If you need any of them, please let me know.
   
   I think this happens because I need to set a correct value for the parameter `METASTORE_URIS`:
   https://github.com/apache/hudi/blob/eef3f9c74acfe0ebec77694044b416696cfc7c2d/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java#L141
   
   What should I set it to for AWS Glue? The job worked on Hudi 0.10.1, where there was no such parameter.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] xushiyan commented on issue #5736: [SUPPORT] Hudi 0.11.0 on AWS Glue: Metastore URIs

Posted by GitBox <gi...@apache.org>.

xushiyan commented on issue #5736:
URL: https://github.com/apache/hudi/issues/5736#issuecomment-1148058495

   @eshu glad that you got it resolved. `AWSGlueCatalogSyncClient` is also experimental. I can make a note there.



[GitHub] [hudi] xushiyan closed issue #5736: [SUPPORT] Hudi 0.11.0 on AWS Glue: Metastore URIs

Posted by GitBox <gi...@apache.org>.

xushiyan closed issue #5736: [SUPPORT] Hudi 0.11.0 on AWS Glue: Metastore URIs
URL: https://github.com/apache/hudi/issues/5736



[GitHub] [hudi] eshu commented on issue #5736: [SUPPORT] Hudi 0.11.0 on AWS Glue: Metastore URIs

Posted by GitBox <gi...@apache.org>.

eshu commented on issue #5736:
URL: https://github.com/apache/hudi/issues/5736#issuecomment-1144343270

   Solved by changing the config. I added the lines
   ```
   HoodieSyncConfig.META_SYNC_ENABLED -> "true",
   DataSourceWriteOptions.META_SYNC_CLIENT_TOOL_CLASS_NAME -> classOf[AwsGlueCatalogSyncTool].getName
   ```
   and removed the line
   ```
   HiveSyncConfig.HIVE_SYNC_ENABLED -> "true"
   ```
   
   The problem is that `AwsGlueCatalogSyncTool` is marked as experimental. However, `AWSGlueCatalogSyncClient` does not have any comment about its experimental status. Is it safe to use?
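
   For reference, the config change described above can be sketched with plain string keys instead of the Hudi constants. This is a minimal sketch, not the exact job config: the object name `GlueSyncOptions` and the `before` map are hypothetical, and the string keys and the fully qualified class name are assumed to be the ones behind the constants named in this thread for Hudi 0.11.0, so check them against the Hudi version in use.

   ```scala
   // Sketch only: keys and class name are assumptions for Hudi 0.11.0,
   // written as plain strings so the snippet compiles without Hudi on the classpath.
   object GlueSyncOptions {
     // Hypothetical starting point: the sync option that led to the
     // "Could not connect to meta store" error after the upgrade.
     val before: Map[String, String] = Map(
       "hoodie.datasource.hive_sync.enable" -> "true"
     )

     // After the change: drop the Hive sync flag, enable generic meta sync,
     // and select the Glue catalog sync tool.
     val after: Map[String, String] =
       (before - "hoodie.datasource.hive_sync.enable") ++ Map(
         "hoodie.datasource.meta.sync.enable" -> "true",
         "hoodie.meta.sync.client.tool.class" ->
           "org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool"
       )

     // Print the resulting options in a stable order for inspection.
     def main(args: Array[String]): Unit =
       after.toSeq.sorted.foreach { case (k, v) => println(s"$k=$v") }
   }
   ```

   A map like `GlueSyncOptions.after` would typically be merged with the rest of the write options and passed to the Spark writer through `.options(...)`. The remaining Hudi table options are not shown here because they do not appear in this thread.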

