Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/24 22:26:52 UTC

[GitHub] [hudi] FelixKJose opened a new issue #1875: EMR + Spark Batch job + HUDI + Hive external Metastore (MySQL RDS Instance) failed with No Suitable Driver

FelixKJose opened a new issue #1875:
URL: https://github.com/apache/hudi/issues/1875


   Hello,
   
   I am getting the following error while using an external RDS instance as the Hive metastore.
   
   **My configuration:**
   
   
   'hoodie.datasource.hive_sync.enable': 'true',
   'hoodie.datasource.hive_sync.database': 'hive_metastore',
   'hoodie.datasource.hive_sync.table': 'calculations',
   'hoodie.datasource.hive_sync.username': 'spark',
   'hoodie.datasource.hive_sync.password': 'password123',
   'hoodie.datasource.hive_sync.jdbcurl': 'jdbc:mysql://*******************.us-east-1.rds.amazonaws.com:3306'
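   
   For context, a minimal PySpark sketch of the kind of write these options feed into (the table name, record key, precombine field, RDS endpoint, and S3 path below are hypothetical placeholders, not taken from this issue):
   
   ```python
   from pyspark.sql import SparkSession
   
   spark = SparkSession.builder.appName('hudi-hive-sync-example').getOrCreate()
   
   # Toy DataFrame, for illustration only.
   df = spark.createDataFrame(
       [(1, '2020-07-24 00:00:00', 42.0)],
       ['id', 'updated_at', 'value'])
   
   # The hive_sync.* options are the ones from this issue; the record key
   # and precombine field are hypothetical placeholders.
   hudi_options = {
       'hoodie.table.name': 'calculations',
       'hoodie.datasource.write.recordkey.field': 'id',
       'hoodie.datasource.write.precombine.field': 'updated_at',
       'hoodie.datasource.hive_sync.enable': 'true',
       'hoodie.datasource.hive_sync.database': 'hive_metastore',
       'hoodie.datasource.hive_sync.table': 'calculations',
       'hoodie.datasource.hive_sync.username': 'spark',
       'hoodie.datasource.hive_sync.password': 'password123',
       'hoodie.datasource.hive_sync.jdbcurl': 'jdbc:mysql://<rds-endpoint>:3306',
   }
   
   (df.write
       .format('org.apache.hudi')
       .options(**hudi_options)
       .mode('append')
       .save('s3://my-bucket/calculations/'))
   ```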
   
   **Error stack trace:**
   `org.apache.hudi.hive.HoodieHiveSyncException: Cannot create hive connection jdbc:mysql://******************.us-east-1.rds.amazonaws.com:3306/
   	at org.apache.hudi.hive.HoodieHiveClient.createHiveConnection(HoodieHiveClient.java:559)
   	at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:108)
   	at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:60)
   	at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:236)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:156)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
   	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:748)
   **Caused by: java.sql.SQLException: No suitable driver found for jdbc:mysql://******************.us-east-1.rds.amazonaws.com:3306**
   	at java.sql.DriverManager.getConnection(DriverManager.java:689)
   	at java.sql.DriverManager.getConnection(DriverManager.java:247)
   	at org.apache.hudi.hive.HoodieHiveClient.createHiveConnection(HoodieHiveClient.java:556)
   	... 35 more`
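   
   As a side note on what the trace means: `java.sql.DriverManager` needs a JDBC driver registered for the `jdbc:mysql` scheme in the driver JVM, and the "No suitable driver" message says none was found. A small diagnostic sketch, assuming an active SparkSession named `spark`:
   
   ```python
   # Diagnostic sketch: ask the driver JVM whether any JDBC driver is registered
   # for a MySQL URL; DriverManager raises "No suitable driver" if none is.
   jvm = spark.sparkContext._jvm
   driver = jvm.java.sql.DriverManager.getDriver('jdbc:mysql://example:3306')
   print(driver.getClass().getName())  # e.g. org.mariadb.jdbc.Driver
   ```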
   
   
   **Environment Description**
   
   * EMR version : 6.0.0
   
   * Hudi version : Custom Hudi jar (provided by **Udit Mehrotra** for EMR 6.0.0 with performance fixes)
   
   * Spark version : 2.4.4
   
   * Storage (HDFS/S3/GCS..) : S3
   


----------------------------------------------------------------



[GitHub] [hudi] bvaradar closed issue #1875: EMR + Spark Batch job + HUDI + Hive external Metastore (MySQL RDS Instance) failed with No Suitable Driver

bvaradar closed issue #1875:
URL: https://github.com/apache/hudi/issues/1875


   


----------------------------------------------------------------



[GitHub] [hudi] FelixKJose commented on issue #1875: EMR + Spark Batch job + HUDI + Hive external Metastore (MySQL RDS Instance) failed with No Suitable Driver

FelixKJose commented on issue #1875:
URL: https://github.com/apache/hudi/issues/1875#issuecomment-667433589


   @bvaradar How can I mark this issue as resolved?


----------------------------------------------------------------



[GitHub] [hudi] bvaradar commented on issue #1875: EMR + Spark Batch job + HUDI + Hive external Metastore (MySQL RDS Instance) failed with No Suitable Driver

bvaradar commented on issue #1875:
URL: https://github.com/apache/hudi/issues/1875#issuecomment-663814729


   @umehrot2 @bschell @zhedoubushishi : Can you guys chime in here?


----------------------------------------------------------------



[GitHub] [hudi] bvaradar commented on issue #1875: EMR + Spark Batch job + HUDI + Hive external Metastore (MySQL RDS Instance) failed with No Suitable Driver

bvaradar commented on issue #1875:
URL: https://github.com/apache/hudi/issues/1875#issuecomment-668065757


   There should be a "Close and comment" button at the end. Let me close this ticket for now.


----------------------------------------------------------------



[GitHub] [hudi] FelixKJose commented on issue #1875: EMR + Spark Batch job + HUDI + Hive external Metastore (MySQL RDS Instance) failed with No Suitable Driver

FelixKJose commented on issue #1875:
URL: https://github.com/apache/hudi/issues/1875#issuecomment-667433436


   I have got this issue resolved.
   **Solution:**
   
   The issue was that the JDBC connector/driver jar was missing from the Spark classpath on the EMR master node. Even though the EMR documentation says the MySQL driver is already present on the cluster, it sits under /usr/share/java/ but not under /usr/lib/spark/jars, which is where Spark picks up its jars.
   To resolve the issue, copy the connector jar onto the Spark classpath:
   `sudo cp /usr/share/java/mariadb-connector-java.jar /usr/lib/spark/jars`
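   
   An alternative that may avoid modifying the cluster image: Hive sync runs in the Spark driver, so putting the documented jar location on the driver classpath at submit time should also make the driver class visible to `java.sql.DriverManager`, e.g. `spark-submit --driver-class-path /usr/share/java/mariadb-connector-java.jar my_hudi_job.py` (with `my_hudi_job.py` as a hypothetical job script).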
   


----------------------------------------------------------------