Posted to issues@spark.apache.org by "In-Ho Yi (Jira)" <ji...@apache.org> on 2022/08/16 22:59:00 UTC

[jira] [Created] (SPARK-40108) JDBC connection to Hive Metastore fails without first calling any .jdbc call

In-Ho Yi created SPARK-40108:
--------------------------------

             Summary: JDBC connection to Hive Metastore fails without first calling any .jdbc call
                 Key: SPARK-40108
                 URL: https://issues.apache.org/jira/browse/SPARK-40108
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.3.0
         Environment: PySpark==3.3.0, Java 11
            Reporter: In-Ho Yi


Tested on pyspark==3.3.0. To talk to a Hive Metastore with a MySQL backend, I installed the MySQL driver via spark.jars.packages, along with the other necessary settings:

from pyspark.sql import SparkSession

# `conf` is passed to .config(conf=conf) below; its definition is not shown in this report.
ss = SparkSession.builder.master('local[*]')\
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.3," +
        "org.apache.hadoop:hadoop-common:3.3.3,mysql:mysql-connector-java:8.0.30") \
    .config("spark.executor.memory", "10g") \
    .config("spark.driver.memory", "10g") \
    .config("spark.memory.offHeap.enabled","true") \
    .config("spark.memory.offHeap.size","32g")  \
    .config("spark.hadoop.javax.jdo.option.ConnectionURL", "jdbc:mysql://localhost:3306/hive") \
    .config("spark.hadoop.javax.jdo.option.ConnectionUserName", "yyyy") \
    .config("spark.hadoop.javax.jdo.option.ConnectionPassword", "xxxx") \
    .config("spark.hadoop.javax.jdo.option.ConnectionDriverName", "com.mysql.cj.jdbc.Driver") \
    .config("spark.sql.hive.metastore.sharedPrefixes", "com.mysql") \
    .config("spark.sql.warehouse.dir", "s3://xxxx-yyyy/") \
    .enableHiveSupport() \
    .appName("hms_test").config(conf=conf).getOrCreate()
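
The spark.jars.packages value above is a comma-separated list of Maven coordinates assembled by string concatenation, where a dropped comma is easy to miss. As a minimal sketch (the helper name join_packages is hypothetical and not part of PySpark), the list could be built and sanity-checked in plain Python before it is passed to .config():

```python
def join_packages(coordinates):
    """Join Maven coordinates into the comma-separated form used by spark.jars.packages.

    Hypothetical helper: it only checks that each entry looks like
    groupId:artifactId:version; it does not resolve or download anything.
    """
    for coord in coordinates:
        parts = coord.split(":")
        if len(parts) != 3 or not all(part.strip() for part in parts):
            raise ValueError(f"expected groupId:artifactId:version, got {coord!r}")
    return ",".join(coordinates)


# The three coordinates used in this report.
PACKAGES = join_packages([
    "org.apache.hadoop:hadoop-aws:3.3.3",
    "org.apache.hadoop:hadoop-common:3.3.3",
    "mysql:mysql-connector-java:8.0.30",
])
```

The result would then be passed as .config("spark.jars.packages", PACKAGES). This does not change the behavior described in this report; it only makes the package list easier to read and check.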

Now, if I just run ss.sql("SHOW DATABASES;").show(), I get a lot of errors, saying:

Unable to open a test connection to the given database. JDBC url = jdbc:mysql://localhost:3306/hive, username = yyyy. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/hive
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:702)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:189)
    at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
    at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
...

However, if I first do any "jdbc" read, even one that ends in an error, the subsequent call to the Hive Metastore seems to succeed without any issue:

try:
    # Throwaway JDBC read; its result (or failure) is ignored on purpose.
    _ = ss.read.format("jdbc") \
        .option("url", "jdbc:mysql://localhost:3306/hive") \
        .option("query", "SHOW TABLES;") \
        .option("driver", "com.mysql.cj.jdbc.Driver").load()
except Exception:
    pass

ss.sql("SHOW DATABASES;").show() # this now works fine.
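
For anyone who needs the workaround in more than one place, the throwaway read above could be wrapped in a small helper. This is only a sketch that repackages the steps from this report: the function name warm_up_jdbc_driver is hypothetical, the default query is the one used above, and it makes no claim about why the read has this effect.

```python
def warm_up_jdbc_driver(session, url, driver, query="SHOW TABLES;"):
    """Attempt a throwaway JDBC read before the first metastore call.

    Hypothetical helper mirroring the workaround in this report.
    `session` is expected to be a SparkSession. Returns True if the read
    loaded and False if it raised; per the report, either outcome was
    enough for the later ss.sql(...) call to work.
    """
    try:
        (session.read.format("jdbc")
            .option("url", url)
            .option("query", query)
            .option("driver", driver)
            .load())
        return True
    except Exception:
        # The failure is ignored on purpose, as in the original workaround.
        return False
```

With the session from this report, the call would be warm_up_jdbc_driver(ss, "jdbc:mysql://localhost:3306/hive", "com.mysql.cj.jdbc.Driver"), followed by ss.sql("SHOW DATABASES;").show().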


