Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/03/24 15:52:25 UTC
[jira] [Commented] (SPARK-14115) SparkSql + Hive + HBase + Kerberos doesn't work
[ https://issues.apache.org/jira/browse/SPARK-14115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210337#comment-15210337 ]
Sean Owen commented on SPARK-14115:
-----------------------------------
Looks like you have mismatched HBase libraries: Hive's HBase dependency doesn't match the HBase version something else on your classpath provides. This sounds like a problem with the build of Spark you're using rather than a Spark problem per se. The HBase libs need to be harmonized internally and with your cluster.
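
As a quick way to confirm such a mismatch, a minimal Scala diagnostic sketch (illustrative; the object name HBaseVersionCheck is a placeholder, not from the original thread) can print which jar the JVM resolved TokenUtil from and which addTokenForJob overloads it actually exposes:
{code}
import org.apache.hadoop.hbase.security.token.TokenUtil

// Illustrative sketch, not part of the original report: run with the same
// classpath as the failing Spark job to see which HBase jar wins.
object HBaseVersionCheck {
  def main(args: Array[String]): Unit = {
    // The jar the JVM resolved TokenUtil from (may be null for
    // bootstrap-loaded classes, but an HBase jar will report a location).
    val jar = classOf[TokenUtil].getProtectionDomain.getCodeSource.getLocation
    println(s"TokenUtil loaded from: $jar")

    // A NoSuchMethodError means the overload Hive was compiled against
    // is absent at runtime; list what this version actually provides.
    classOf[TokenUtil].getMethods
      .filter(_.getName == "addTokenForJob")
      .foreach(println)
  }
}
{code}
If the printed overloads lack the HConnection-based signature from the stack trace below, the HBase jar on the classpath is from a different release line than the one Hive 1.2.1 was compiled against.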
> SparkSql + Hive + HBase + Kerberos doesn't work
> -----------------------------------------------
>
> Key: SPARK-14115
> URL: https://issues.apache.org/jira/browse/SPARK-14115
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.1
> Environment: Spark 1.6.1. compiled with hadoop 2.7.1, yarn, hive
> Hadoop 2.7.1 (HDP 2.3)
> Hive 1.2.1 (HDP 2.3)
> Kerberos
> Reporter: Wojciech Indyk
>
> When I try to run SparkSQL on Hive, where the table is defined via HBaseStorageHandler, I get an error:
> {code}
> ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.security.token.TokenUtil.addTokenForJob(Lorg/apache/hadoop/hbase/client/HConnection;Lorg/apache/hadoop/hbase/security/User;Lorg/apache/hadoop/mapreduce/Job;)V
> java.lang.NoSuchMethodError: org.apache.hadoop.hbase.security.token.TokenUtil.addTokenForJob(Lorg/apache/hadoop/hbase/client/HConnection;Lorg/apache/hadoop/hbase/security/User;Lorg/apache/hadoop/mapreduce/Job;)V
> at org.apache.hadoop.hive.hbase.HBaseStorageHandler.addHBaseDelegationToken(HBaseStorageHandler.java:482)
> at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:427)
> at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:328)
> at org.apache.spark.sql.hive.HiveTableUtil$.configureJobPropertiesForStorageHandler(TableReader.scala:304)
> at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:323)
> at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:276)
> at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:276)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
> at scala.Option.map(Option.scala:145)
> at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
> at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:220)
> at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
> at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
> at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at org.apache.spark.sql.execution.Sort.doExecute(Sort.scala:64)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at org.apache.spark.sql.execution.ConvertToSafe.doExecute(rowFormatConverters.scala:56)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at org.apache.spark.sql.execution.aggregate.SortBasedAggregate$$anonfun$doExecute$1.apply(SortBasedAggregate.scala:72)
> at org.apache.spark.sql.execution.aggregate.SortBasedAggregate$$anonfun$doExecute$1.apply(SortBasedAggregate.scala:69)
> at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
> at org.apache.spark.sql.execution.aggregate.SortBasedAggregate.doExecute(SortBasedAggregate.scala:69)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at org.apache.spark.sql.execution.ConvertToUnsafe.doExecute(rowFormatConverters.scala:38)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> at org.apache.spark.sql.execution.columnar.InMemoryRelation.buildBuffers(InMemoryColumnarTableScan.scala:129)
> at org.apache.spark.sql.execution.columnar.InMemoryRelation.<init>(InMemoryColumnarTableScan.scala:118)
> at org.apache.spark.sql.execution.columnar.InMemoryRelation$.apply(InMemoryColumnarTableScan.scala:41)
> at org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:93)
> at org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:60)
> at org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:84)
> at org.apache.spark.sql.DataFrame.persist(DataFrame.scala:1581)
> at org.apache.spark.sql.DataFrame.cache(DataFrame.scala:1590)
> at pl.com.agora.bigdata.recommendations.ranking.App$$anon$1.run(App.scala:40)
> at pl.com.agora.bigdata.recommendations.ranking.App$$anon$1.run(App.scala:35)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
> at pl.com.agora.bigdata.recommendations.ranking.App$.main(App.scala:35)
> at pl.com.agora.bigdata.recommendations.ranking.App.main(App.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
> {code}
> When I query a similar table defined as Avro on HDFS, everything works fine.
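
In line with the comment above, the usual remedy is to force a single HBase version across the application's dependency tree. A hypothetical build.sbt fragment, assuming the job is built with sbt; the version string is illustrative and should be replaced with whatever HBase the cluster (HDP 2.3 here) actually ships:
{code}
// build.sbt -- sketch: pin one HBase version everywhere so that Hive's
// HBaseStorageHandler and the runtime hbase-client agree on TokenUtil.
// "1.1.2" is illustrative; use your cluster's HBase version.
val hbaseVersion = "1.1.2"

dependencyOverrides ++= Set(
  "org.apache.hbase" % "hbase-client" % hbaseVersion,
  "org.apache.hbase" % "hbase-common" % hbaseVersion,
  "org.apache.hbase" % "hbase-server" % hbaseVersion
)
{code}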
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)