Posted to issues@spark.apache.org by "Dagang Wei (JIRA)" <ji...@apache.org> on 2018/10/25 21:21:00 UTC

[jira] [Comment Edited] (SPARK-18673) Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version

    [ https://issues.apache.org/jira/browse/SPARK-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664286#comment-16664286 ] 

Dagang Wei edited comment on SPARK-18673 at 10/25/18 9:20 PM:
--------------------------------------------------------------

Is it possible to fix this in org.spark-project.hive before SPARK-20202 "Remove references to org.spark-project.hive" is resolved? In my Hadoop deployment (Hadoop 3.1.0, Hive 3.1.0 and Spark 2.3.1), when I run spark-shell, I get the following error:

java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.0
 at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
 at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
 at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
 at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
 at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)

After examining the JARs, it turns out that the org.apache.hadoop.hive.shims.ShimLoader class was loaded from <spark-home>/jars/hive-exec-1.2.1.spark2.jar (instead of <hive-home>/lib/hive-shims-common-3.1.0.jar). Could somebody let me know where the source code of hive-exec-1.2.1.spark2.jar lives? Or, more generally, how Spark's fork of Hive works, so that I can fix the problem there.
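For anyone wanting to reproduce the "examining the JARs" step above: a class's code source reveals which JAR the JVM actually loaded it from, which is how classpath shadowing like this can be confirmed. The snippet below is a generic sketch (the class name WhichJar is illustrative, not from Spark or Hive); on a Spark classpath one would pass org.apache.hadoop.hive.shims.ShimLoader.class to it.

```java
// Sketch: print which JAR (or directory) a class was loaded from.
public class WhichJar {
    public static String locate(Class<?> cls) {
        java.security.CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Bootstrap classes (e.g. java.lang.String) have no code source.
        return src == null ? "(bootstrap classpath)" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // With Spark on the classpath, pass ShimLoader.class here instead;
        // it would print .../jars/hive-exec-1.2.1.spark2.jar as described above.
        System.out.println(locate(WhichJar.class));
    }
}
```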

 


was (Author: functicons):
Is it possible to fix this in org.spark-project.hive before SPARK-20202 "Remove references to org.spark-project.hive" is resolved? In my Hadoop deployment (Hadoop 3.1.0, Hive 3.1.0 and Spark 2.3.1), when I run spark-shell, I get the following error:

java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.0
 at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
 at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
 at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
 at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
 at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)

After examining the JARs, it turns out that the org.apache.hadoop.hive.shims.ShimLoader class that spark-shell was trying to load came from <spark-home>/jars/hive-exec-1.2.1.spark2.jar (instead of <hive-home>/lib/hive-shims-common-3.1.0.jar). Could somebody let me know where the source code of hive-exec-1.2.1.spark2.jar lives? Or, more generally, how Spark's fork of Hive works, so that I can fix the problem there.

 

> Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version
> ------------------------------------------------------------------
>
>                 Key: SPARK-18673
>                 URL: https://issues.apache.org/jira/browse/SPARK-18673
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>         Environment: Spark built with -Dhadoop.version=3.0.0-alpha2-SNAPSHOT 
>            Reporter: Steve Loughran
>            Priority: Major
>
> Spark Dataframes fail to run on Hadoop 3.0.x, because hive.jar's shimloader considers 3.x to be an unknown Hadoop version.
> Hive itself will have to fix this; as Spark uses its own hive 1.2.x JAR, it will need to be updated to match.
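The failure mode described above comes down to a major-version switch inside Hive's shim loader that only recognizes versions it was written for. The following is a hypothetical, simplified reconstruction of that kind of check (class and return names are illustrative, not the actual Hive 1.2.x source), showing why Hadoop 3.x falls through to the exception in the stack trace:

```java
// Hedged sketch of a shim-loader-style major-version check.
// Real Hive code switches on the parsed major version from VersionInfo;
// a 1.2.x-era table simply has no entry for major version 3.
public class ShimCheckSketch {
    static String getMajorVersion(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        if (parts.length < 2) {
            throw new RuntimeException("Illegal Hadoop Version: " + hadoopVersion);
        }
        switch (Integer.parseInt(parts[0])) {
            case 2:
                // Hive 1.2.x-era shims map all of Hadoop 2.x to one shim set.
                return "0.23";
            default:
                // Hadoop 3.x lands here, producing the error reported above.
                throw new IllegalArgumentException(
                        "Unrecognized Hadoop major version number: " + hadoopVersion);
        }
    }

    public static void main(String[] args) {
        System.out.println(getMajorVersion("2.7.3")); // recognized
        getMajorVersion("3.1.0");                     // throws IllegalArgumentException
    }
}
```

The fix therefore has to land in the (forked) Hive JAR that Spark bundles, not in Spark proper, which is why the reporter points at Hive.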



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org