Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2016/08/16 17:18:20 UTC

[jira] [Created] (SPARK-17088) IsolatedClientLoader fails to load Hive client when sharesHadoopClasses is false

Marcelo Vanzin created SPARK-17088:
--------------------------------------

             Summary: IsolatedClientLoader fails to load Hive client when sharesHadoopClasses is false
                 Key: SPARK-17088
                 URL: https://issues.apache.org/jira/browse/SPARK-17088
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Marcelo Vanzin
            Priority: Minor


There's a bug in a very rare code path in {{IsolatedClientLoader}}:

{code}
          case e: RuntimeException if e.getMessage.contains("hadoop") =>
            // If the error message contains hadoop, it is probably because the hadoop
            // version cannot be resolved (e.g. it is a vendor specific version like
            // 2.0.0-cdh4.1.1). If it is the case, we will try just
            // "org.apache.hadoop:hadoop-client:2.4.0". "org.apache.hadoop:hadoop-client:2.4.0"
            // is used just because we used to hard code it as the hadoop artifact to download.
            logWarning(s"Failed to resolve Hadoop artifacts for the version ${hadoopVersion}. " +
              s"We will change the hadoop version from ${hadoopVersion} to 2.4.0 and try again. " +
              "Hadoop classes will not be shared between Spark and Hive metastore client. " +
              "It is recommended to set jars used by Hive metastore client through " +
              "spark.sql.hive.metastore.jars in the production environment.")
            sharesHadoopClasses = false
{code}

That's the rare part. But when {{sharesHadoopClasses}} is set to false, the instantiation of {{HiveClientImpl}} fails:

{code}
      classLoader
        .loadClass(classOf[HiveClientImpl].getName)
        .getConstructors.head
        .newInstance(version, sparkConf, hadoopConf, config, classLoader, this)
        .asInstanceOf[HiveClient]
{code}

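{{hadoopConf}} here is an instance of {{Configuration}} loaded by the main Spark class loader. Because Hadoop classes are no longer shared, the isolated class loader loads its own copy of {{Configuration}}, which the JVM treats as a distinct class, and that copy is the type {{HiveClientImpl}}'s constructor expects (yay class loaders are fun). The reflective {{newInstance}} call therefore rejects the argument, and you get an error like this: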

{noformat}
2016-08-10 13:51:20.742 - stderr> Exception in thread "main" java.lang.IllegalArgumentException: argument type mismatch
2016-08-10 13:51:20.743 - stderr> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
2016-08-10 13:51:20.743 - stderr> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
2016-08-10 13:51:20.743 - stderr> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
2016-08-10 13:51:20.743 - stderr> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
2016-08-10 13:51:20.744 - stderr> 	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
2016-08-10 13:51:20.744 - stderr> 	at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:354)
2016-08-10 13:51:20.744 - stderr> 	at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:258)
2016-08-10 13:51:20.744 - stderr> 	at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
2016-08-10 13:51:20.745 - stderr> 	at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
{noformat}
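The same failure can be reproduced outside Spark with two class loaders. The snippet below is a minimal, self-contained sketch, not Spark code: {{Conf}}, {{Client}} and {{IsolatingLoader}} are hypothetical stand-ins for {{Configuration}}, {{HiveClientImpl}} and the isolated class loader. The loader re-defines the two classes from their .class bytes instead of delegating to its parent, which is roughly what "not sharing" a class means. It assumes the demo is compiled to a normal classpath (not run in a REPL), so the parent loader can serve the .class files as resources.

{code}
import java.io.ByteArrayOutputStream

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
class Conf

// Hypothetical stand-in for HiveClientImpl: its constructor takes a Conf.
class Client(val conf: Conf)

// Hypothetical child-first loader: classes named in `isolated` are defined
// again from the parent's .class bytes; everything else goes to the parent.
class IsolatingLoader(parentLoader: ClassLoader, isolated: Set[String])
    extends ClassLoader(parentLoader) {

  override protected def loadClass(name: String, resolve: Boolean): Class[_] =
    getClassLoadingLock(name).synchronized {
      if (!isolated.contains(name)) super.loadClass(name, resolve)
      else {
        val loaded = findLoadedClass(name)
        val cls =
          if (loaded != null) loaded
          else {
            val bytes = readClassBytes(name)
            defineClass(name, bytes, 0, bytes.length)
          }
        if (resolve) resolveClass(cls)
        cls
      }
    }

  // Reads the raw class file for `name` from the parent loader's resources.
  private def readClassBytes(name: String): Array[Byte] = {
    val path = name.replace('.', '/') + ".class"
    val in = getParent.getResourceAsStream(path)
    require(in != null, s"class file not found: $path")
    try {
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](4096)
      var n = in.read(buf)
      while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
      out.toByteArray
    } finally in.close()
  }
}

object ClassLoaderMismatchDemo {
  private val confName = classOf[Conf].getName
  private val clientName = classOf[Client].getName

  private def isolatedLoader(): IsolatingLoader =
    new IsolatingLoader(classOf[Conf].getClassLoader, Set(confName, clientName))

  // The isolated loader's Conf is a different runtime class from the main one.
  def isolatedConfIsDistinct(): Boolean =
    isolatedLoader().loadClass(confName) ne classOf[Conf]

  // Mirrors the bug: hand a main-loader Conf to the isolated Client constructor.
  def mainLoaderConfRejected(): Boolean = {
    val ctor = isolatedLoader().loadClass(clientName).getConstructors.head
    try { ctor.newInstance(new Conf); false }
    catch { case _: IllegalArgumentException => true }
  }

  // A Conf created through the same isolated loader is accepted.
  def isolatedConfAccepted(): Boolean = {
    val loader = isolatedLoader()
    val conf = loader.loadClass(confName).getConstructor().newInstance().asInstanceOf[AnyRef]
    val client = loader.loadClass(clientName).getConstructors.head.newInstance(conf)
    client.getClass.getClassLoader eq loader
  }

  def main(args: Array[String]): Unit = {
    println(s"distinct Conf classes: ${isolatedConfIsDistinct()}")
    println(s"main-loader Conf rejected: ${mainLoaderConfRejected()}")
    println(s"isolated Conf accepted: ${isolatedConfAccepted()}")
  }
}
{code}

Both {{Conf}} classes have the same name, but the JVM identifies a class by its name plus its defining loader, so an instance of one is not an instance of the other. That is why {{Constructor.newInstance}} reports an argument type mismatch even though the source-level types look identical.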



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)