Posted to user@spark.apache.org by Daniel Zhang <ja...@hotmail.com> on 2019/03/18 15:46:41 UTC
java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT on EMR
Hi,
I know the JIRA for this error (https://issues.apache.org/jira/browse/SPARK-18112), and I have read all the comments and even the PR for it.
But I am facing this issue on AWS EMR, and only in the Oozie Spark action. I am looking for someone who can give me a hint or a direction, so I can see whether I can overcome this issue on EMR.
I am testing a simple Spark application on EMR-5.12.2, which comes with Hadoop 2.8.3 + HCatalog 2.3.2 + Spark 2.2.1, and using AWS Glue Data Catalog for both Hive + Spark table metadata.
First of all, both Hive and Spark work fine with AWS Glue as the metadata catalog, and my Spark application works via spark-submit.
[hadoop@ip-172-31-65-232 oozieJobs]$ spark-shell
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.2.1
/_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.sql("show databases").show
+---------------+
| databaseName|
+---------------+
| default|
|googleanalytics|
| sampledb|
+---------------+
I can access and query the database I created in Glue without any issue from spark-shell or spark-sql.
And, as becomes relevant to the problem below, in this working case "spark.sql.hive.metastore.version" is not set in spark-shell; it keeps the default value shown below:
scala> spark.conf.get("spark.sql.hive.metastore.version")
res2: String = 1.2.1
Even though the version shows as "1.2.1", I know that with Glue the Hive metastore version is actually "2.3.2"; I can see "hive-metastore-2.3.2-amzn-1.jar" in the Hive library path.
Now here comes the issue: when I run the same Spark code, with "enableHiveSupport" on the Spark session, as an Oozie Spark action, it works with spark-submit on the command line but fails with the following error in the Oozie runtime:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, HIVE_STATS_JDBC_TIMEOUT
java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT
at org.apache.spark.sql.hive.HiveUtils$.hiveClientConfigurations(HiveUtils.scala:200)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:265)
at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
I know this is most likely caused by the Oozie runtime classpath, but I have spent days trying and still cannot find a solution. We use Spark as the core of our ETL engine, and the ability to manage and query the Hive catalog is critical for us.
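For what it's worth, my working theory (an assumption on my part) is that two incompatible Hive client versions end up on the action's classpath: Spark 2.2 was built against the 1.2.1 Hive fork, where HiveConf.ConfVars still has HIVE_STATS_JDBC_TIMEOUT, while that field no longer exists in the Hive 2.3.2 jars coming from the hive sharelib. This is the throwaway script I have been using to sanity-check the jar lists pasted from the launcher logs (a rough sketch only; the jar paths below are just examples):

```python
import re
from collections import defaultdict

# Match "hive-<artifact>-<version>.jar" filenames, e.g.
# hive-metastore-2.3.2-amzn-1.jar -> ("hive-metastore", "2.3.2-amzn-1")
JAR_RE = re.compile(r"(?P<artifact>hive-[a-z-]+?)-(?P<version>\d[\w.-]*)\.jar$")

def hive_versions(jar_paths):
    """Group Hive jars found on a classpath by artifact name and return
    only the artifacts that appear in more than one version (likely
    conflicts)."""
    found = defaultdict(set)
    for path in jar_paths:
        m = JAR_RE.search(path.rsplit("/", 1)[-1])
        if m:
            found[m.group("artifact")].add(m.group("version"))
    return {a: sorted(v) for a, v in found.items() if len(v) > 1}

# Example jar list as it might appear in an Oozie launcher log:
classpath = [
    "/usr/lib/spark/jars/hive-metastore-1.2.1-spark2-amzn-0.jar",
    "sharelib/hive/hive-metastore-2.3.2-amzn-1.jar",
    "sharelib/hive/hive-exec-2.3.2-amzn-1.jar",
]
print(hive_versions(classpath))
```

On my logs this flags hive-metastore (and friends) as present in both a 1.2.x and a 2.3.x version inside the same container, which matches the NoSuchFieldError symptom.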
Here is what puzzles me:
* I know this issue was supposed to be fixed in Spark 2.2.0, and on this EMR we are using Spark 2.2.1
* There is a 1.2.1 version of the hive-metastore jar under the Spark jars on EMR. Does this mean that in the successful spark-shell runtime, Spark is indeed using the 1.2.1 version of hive-metastore?
[hadoop@ip-172-31-65-232 oozieJobs]$ ls /usr/lib/spark/jars/*hive-meta*
/usr/lib/spark/jars/hive-metastore-1.2.1-spark2-amzn-0.jar
* There is a 2.3.2 version of the hive-metastore jar under the Hive component on this EMR, which I believe points to Glue, right?
[hadoop@ip-172-31-65-232 oozieJobs]$ ls /usr/lib/hive/lib/*hive-meta*
/usr/lib/hive/lib/hive-metastore-2.3.2-amzn-1.jar /usr/lib/hive/lib/hive-metastore.jar
* I specified "oozie.action.sharelib.for.spark=spark,hive" in Oozie, and I can see the Oozie runtime load the jars from both the spark and hive sharelibs. There is NO hive-metastore-1.2.1-spark2-amzn-0.jar in the Oozie SPARK sharelib, but there is indeed hive-metastore-2.3.2-amzn-1.jar in the Oozie HIVE sharelib.
* Based on my understanding of SPARK-18112 (https://issues.apache.org/jira/browse/SPARK-18112), here is what I have tried so far to fix this in the Oozie runtime; none of it works:
* I added hive-metastore-1.2.1-spark2-amzn-0.jar to the Oozie Spark sharelib in HDFS and ran "oozie admin -sharelibupdate". After that, I confirmed this library is loaded in the Oozie runtime log of my Spark action, but I got the same error message.
* I added "--conf spark.sql.hive.metastore.version=2.3.2" in the <spark-opts> of my Oozie Spark action, and confirmed this configuration in the Spark session, but I still got the same error message above.
* I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf spark.sql.hive.metastore.jars=maven", but still got the same error message.
* I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf spark.sql.hive.metastore.jars=/etc/spark/conf/hive-site.xml,/usr/lib/spark/jars/*" in the Oozie Spark action, but got the same error message.
* I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf hive.metastore.uris=thrift://ip-172-31-65-232.ec2.internal:9083 --conf spark.sql.hive.metastore.jars=/etc/spark/conf/hive-site.xml,/usr/lib/spark/jars/*" in the Oozie Spark action, but got the same error.
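One thing I suspect, though I cannot prove it, is plain first-wins classloading: if a Hive 2.3.2 jar containing HiveConf is resolved before the 1.2.1 one, the reference to HIVE_STATS_JDBC_TIMEOUT fails no matter what else is on the classpath, which would explain why adding the 1.2.1 jar to the sharelib did not help. A toy illustration of that idea (jar names and their contents here are illustrative only; I believe the real HiveConf$ConfVars class ships in the hive-common/hive-exec jars, not hive-metastore):

```python
# Which jar "provides" HiveConf per version is my assumption of the
# relevant difference: HIVE_STATS_JDBC_TIMEOUT exists in Hive 1.2.1's
# HiveConf.ConfVars but was removed in Hive 2.x.
FIELDS_BY_JAR = {
    "hive-metastore-1.2.1-spark2-amzn-0.jar": {"HIVE_STATS_JDBC_TIMEOUT"},
    "hive-metastore-2.3.2-amzn-1.jar": set(),  # field no longer exists
}

def resolve_field(classpath, field):
    """Simulate first-wins loading: the first jar on the classpath that
    defines HiveConf at all supplies the class; return that jar and
    whether its HiveConf still has the requested field."""
    for jar in classpath:
        if jar in FIELDS_BY_JAR:  # this jar provides HiveConf
            return jar, field in FIELDS_BY_JAR[jar]
    return None, False

# If the hive sharelib's 2.3.2 jar comes first, the lookup fails even
# though the 1.2.1 jar (which has the field) is also on the classpath:
winner, ok = resolve_field(
    ["hive-metastore-2.3.2-amzn-1.jar",
     "hive-metastore-1.2.1-spark2-amzn-0.jar"],
    "HIVE_STATS_JDBC_TIMEOUT",
)
print(winner, ok)
```

If that theory holds, the fix would be about jar ordering or isolation in the Oozie container, not about which jars are merely present, but I do not know how to control that ordering in an Oozie Spark action.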
I have run out of options to try, and I really have no idea what is missing in the Oozie runtime that causes this error in Spark.
Let me know if you have any idea.
Thanks
Yong