Posted to user@spark.apache.org by Daniel Zhang <ja...@hotmail.com> on 2019/03/18 15:46:41 UTC

java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT on EMR

Hi,

I know the JIRA of this error (https://issues.apache.org/jira/browse/SPARK-18112), and I read all the comments and even PR for it.

But I am facing this issue on AWS EMR, and only in the Oozie Spark action. I am looking for someone who can give me a hint or direction, so I can see whether I can overcome this issue on EMR.

I am testing a simple Spark application on EMR-5.12.2, which comes with Hadoop 2.8.3 + HCatalog 2.3.2 + Spark 2.2.1, and using AWS Glue Data Catalog for both Hive + Spark table metadata.

First of all, both Hive and Spark work fine with AWS Glue as the metadata catalog, and my Spark application works with spark-submit.

[hadoop@ip-172-31-65-232 oozieJobs]$ spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.sql("show databases").show
+---------------+
|   databaseName|
+---------------+
|        default|
|googleanalytics|
|       sampledb|
+---------------+


I can access and query the databases I created in Glue without any issue from spark-shell or spark-sql.
And, relevant to the problem below, when it works in this case there is no explicit setting of "spark.sql.hive.metastore.version" in spark-shell; the default value is shown here:

scala> spark.conf.get("spark.sql.hive.metastore.version")
res2: String = 1.2.1


Even though it shows the version as "1.2.1", I know that when using Glue the Hive metastore version is "2.3.2"; I can see "hive-metastore-2.3.2-amzn-1.jar" in the Hive library path.

Now here comes the issue: when I test the Spark code, with "enableHiveSupport" on the Spark session, in the Oozie Spark action, it works with spark-submit on the command line, but fails with the following error in the Oozie runtime:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, HIVE_STATS_JDBC_TIMEOUT
java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT
        at org.apache.spark.sql.hive.HiveUtils$.hiveClientConfigurations(HiveUtils.scala:200)
        at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:265)
        at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
        at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)


I know this is most likely caused by the Oozie runtime classpath, but I have spent days trying and still cannot find a solution. We use Spark as the core of our ETL engine, and the ability to manage and query the Hive catalog is critical for us.
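To narrow down the classpath suspicion, one way is to list which jars on each runtime's classpath bundle the HiveConf$ConfVars class: the NoSuchFieldError means Spark's HiveUtils, which was compiled against Hive 1.2.1 (where ConfVars still defines HIVE_STATS_JDBC_TIMEOUT), loaded a Hive 2.x copy of that class instead. A minimal sketch (the EMR jar paths in the comment are assumptions about my cluster layout):

```python
import pathlib
import zipfile

def jars_containing(jar_dir, class_entry):
    """Return the names of jars under jar_dir that bundle class_entry.

    Helps spot duplicate/conflicting Hive classes: if two jars on the
    same classpath both carry HiveConf$ConfVars, whichever loads first
    decides whether HIVE_STATS_JDBC_TIMEOUT exists.
    """
    hits = []
    for jar in sorted(pathlib.Path(jar_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            if class_entry in zf.namelist():
                hits.append(jar.name)
    return hits

# Example usage on an EMR node (paths are assumptions):
# jars_containing("/usr/lib/spark/jars",
#                 "org/apache/hadoop/hive/conf/HiveConf$ConfVars.class")
# jars_containing("/usr/lib/hive/lib",
#                 "org/apache/hadoop/hive/conf/HiveConf$ConfVars.class")
```

Running this over the Oozie sharelib jars (as materialized in the action's container) versus /usr/lib/spark/jars should show whether a Hive 2.3.2 ConfVars is shadowing the 1.2.1 one.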

Here is what puzzles me:

  *   I know this issue was supposed to be fixed in Spark 2.2.0, and on this EMR we are using Spark 2.2.1.
  *   There is a 1.2.1 version of the hive-metastore jar under the Spark jars on EMR. Does this mean that in the successful spark-shell runtime, Spark is indeed using the 1.2.1 version of hive-metastore?

[hadoop@ip-172-31-65-232 oozieJobs]$ ls /usr/lib/spark/jars/*hive-meta*
/usr/lib/spark/jars/hive-metastore-1.2.1-spark2-amzn-0.jar

  *   There is a 2.3.2 version of the hive-metastore jar under the Hive component on this EMR, which I believe points to Glue, right?

[hadoop@ip-172-31-65-232 oozieJobs]$ ls /usr/lib/hive/lib/*hive-meta*
/usr/lib/hive/lib/hive-metastore-2.3.2-amzn-1.jar  /usr/lib/hive/lib/hive-metastore.jar

  *   I specified "oozie.action.sharelib.for.spark=spark,hive" in Oozie, and I can see the Oozie runtime load the jars from both the spark and hive sharelibs. There is NO hive-metastore-1.2.1-spark2-amzn-0.jar in the Oozie SPARK sharelib, and there is indeed hive-metastore-2.3.2-amzn-1.jar in the Oozie HIVE sharelib.
  *   Based on my understanding of (https://issues.apache.org/jira/browse/SPARK-18112), here is what I have tried so far to fix this in the Oozie runtime, but none of it works:
     *   I added hive-metastore-1.2.1-spark2-amzn-0.jar into the Oozie spark sharelib on HDFS and ran "oozie admin -sharelibupdate". After that, I confirmed this library is loaded in the Oozie runtime log of my Spark action, but I got the same error message.
     *   I added "--conf spark.sql.hive.metastore.version=2.3.2" in the <spark-opts> of my Oozie Spark action, and confirmed this configuration in the Spark session, but I still got the same error message as above.
     *   I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf spark.sql.hive.metastore.jars=maven", but still got the same error message.
     *   I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf spark.sql.hive.metastore.jars=/etc/spark/conf/hive-site.xml,/usr/lib/spark/jars/*" in the Oozie Spark action, but got the same error message.
     *   I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf hive.metastore.uris=thrift://ip-172-31-65-232.ec2.internal:9083 --conf spark.sql.hive.metastore.jars=/etc/spark/conf/hive-site.xml,/usr/lib/spark/jars/*" in the Oozie Spark action, but got the same error.
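For reference, here is roughly how these attempts look in the workflow definition (a minimal sketch of an Oozie spark-action; the action name, app name, class, and jar path are hypothetical placeholders, not from my real job):

```xml
<action name="spark-etl">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn</master>
        <mode>cluster</mode>
        <name>HypotheticalApp</name>
        <class>com.example.HypotheticalApp</class>
        <jar>${nameNode}/apps/hypothetical-app.jar</jar>
        <spark-opts>--conf spark.sql.hive.metastore.version=2.3.2 --conf spark.sql.hive.metastore.jars=maven</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

with "oozie.action.sharelib.for.spark=spark,hive" set in the job.properties, as mentioned above.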

I have run out of options to try, and I really have no idea what is missing in the Oozie runtime that causes this error in Spark.

Let me know if you have any idea.

Thanks

Yong