Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/03/24 15:52:25 UTC

[jira] [Commented] (SPARK-14115) SparkSql + Hive + HBase + Kerberos doesn't work

    [ https://issues.apache.org/jira/browse/SPARK-14115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210337#comment-15210337 ] 

Sean Owen commented on SPARK-14115:
-----------------------------------

Looks like you have mismatched HBase libraries: Hive's HBase dependency doesn't match another HBase version on your classpath. This sounds like a problem with the build of Spark you're using rather than a Spark problem per se. The HBase libs need to be harmonized internally and with your cluster.
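
As a rough diagnostic (not part of the original report), one way to confirm such a mismatch is to ask the JVM which jar actually supplies TokenUtil and which addTokenForJob overloads it exposes at runtime. The object name below is hypothetical; the class and method names are the ones from the stack trace:

{code}
// Hypothetical diagnostic sketch: run on the same classpath as the failing job
// (requires an hbase-client jar to be present, otherwise Class.forName throws).
object HBaseClasspathCheck {
  def main(args: Array[String]): Unit = {
    val tokenUtil = Class.forName("org.apache.hadoop.hbase.security.token.TokenUtil")
    // Jar file the class was actually loaded from
    println(tokenUtil.getProtectionDomain.getCodeSource.getLocation)
    // Overloads of addTokenForJob available at runtime; compare against the
    // HConnection-based signature Hive's HBaseStorageHandler expects
    tokenUtil.getMethods
      .filter(_.getName == "addTokenForJob")
      .foreach(println)
  }
}
{code}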

> SparkSql + Hive + HBase + Kerberos doesn't work
> -----------------------------------------------
>
>                 Key: SPARK-14115
>                 URL: https://issues.apache.org/jira/browse/SPARK-14115
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>         Environment: Spark 1.6.1. compiled with hadoop 2.7.1, yarn, hive
> Hadoop 2.7.1 (HDP 2.3)
> Hive 1.2.1 (HDP 2.3)
> Kerberos
>            Reporter: Wojciech Indyk
>
> When I try to run SparkSql on hive, where table is defined by HBaseHandler I have an error:
> {code}
> ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.security.token.TokenUtil.addTokenForJob(Lorg/apache/hadoop/hbase/client/HConnection;Lorg/apache/hadoop/hbase/security/User;Lorg/apache/hadoop/mapreduce/Job;)V
> java.lang.NoSuchMethodError: org.apache.hadoop.hbase.security.token.TokenUtil.addTokenForJob(Lorg/apache/hadoop/hbase/client/HConnection;Lorg/apache/hadoop/hbase/security/User;Lorg/apache/hadoop/mapreduce/Job;)V
> 	at org.apache.hadoop.hive.hbase.HBaseStorageHandler.addHBaseDelegationToken(HBaseStorageHandler.java:482)
> 	at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:427)
> 	at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:328)
> 	at org.apache.spark.sql.hive.HiveTableUtil$.configureJobPropertiesForStorageHandler(TableReader.scala:304)
> 	at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:323)
> 	at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:276)
> 	at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:276)
> 	at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
> 	at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
> 	at scala.Option.map(Option.scala:145)
> 	at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
> 	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> 	at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
> 	at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:220)
> 	at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
> 	at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
> 	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
> 	at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> 	at org.apache.spark.sql.execution.Sort.doExecute(Sort.scala:64)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> 	at org.apache.spark.sql.execution.ConvertToSafe.doExecute(rowFormatConverters.scala:56)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> 	at org.apache.spark.sql.execution.aggregate.SortBasedAggregate$$anonfun$doExecute$1.apply(SortBasedAggregate.scala:72)
> 	at org.apache.spark.sql.execution.aggregate.SortBasedAggregate$$anonfun$doExecute$1.apply(SortBasedAggregate.scala:69)
> 	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
> 	at org.apache.spark.sql.execution.aggregate.SortBasedAggregate.doExecute(SortBasedAggregate.scala:69)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> 	at org.apache.spark.sql.execution.ConvertToUnsafe.doExecute(rowFormatConverters.scala:38)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> 	at org.apache.spark.sql.execution.columnar.InMemoryRelation.buildBuffers(InMemoryColumnarTableScan.scala:129)
> 	at org.apache.spark.sql.execution.columnar.InMemoryRelation.<init>(InMemoryColumnarTableScan.scala:118)
> 	at org.apache.spark.sql.execution.columnar.InMemoryRelation$.apply(InMemoryColumnarTableScan.scala:41)
> 	at org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:93)
> 	at org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:60)
> 	at org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:84)
> 	at org.apache.spark.sql.DataFrame.persist(DataFrame.scala:1581)
> 	at org.apache.spark.sql.DataFrame.cache(DataFrame.scala:1590)
> 	at pl.com.agora.bigdata.recommendations.ranking.App$$anon$1.run(App.scala:40)
> 	at pl.com.agora.bigdata.recommendations.ranking.App$$anon$1.run(App.scala:35)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:360)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
> 	at pl.com.agora.bigdata.recommendations.ranking.App$.main(App.scala:35)
> 	at pl.com.agora.bigdata.recommendations.ranking.App.main(App.scala)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
> {code}
> When I query simmilar table defined as Avro on HDFS all is OK.


