Posted to issues@spark.apache.org by "Atul Payapilly (JIRA)" <ji...@apache.org> on 2016/12/15 08:49:58 UTC
[jira] [Created] (SPARK-18879) Spark SQL support for Hive hooks regressed
Atul Payapilly created SPARK-18879:
--------------------------------------
Summary: Spark SQL support for Hive hooks regressed
Key: SPARK-18879
URL: https://issues.apache.org/jira/browse/SPARK-18879
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.2, 2.0.0
Reporter: Atul Payapilly
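The following stack trace, taken from this post: http://ihorbobak.com/index.php/2015/05/08/113/ and produced on Spark 1.3.1, shows Spark SQL executing a statement through the Hive Driver, which in turn loads the configured Hive hooks: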
hive.exec.pre.hooks Class not found:org.apache.hadoop.hive.ql.hooks.ATSHook
FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.hooks.ATSHook
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:59)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1172)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1156)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1206)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:318)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:290)
at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:101)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:164)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
It looks like Spark used to rely on the Hive Driver for execution and therefore supported Hive hooks. The current code path no longer goes through the Hive Driver, so support for Hive hooks has regressed. This is a problem: for example, there is no longer any way to tell which partitions were updated as part of a query.
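To make the lost behavior concrete, here is a minimal, self-contained Java sketch of the hook mechanism described above. It is not Hive's API: `QueryHook`, `PartitionAuditHook`, and `HookSketch` are hypothetical names. Hive's real interface is `org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext`, which receives a `HookContext` describing the query rather than a plain set of strings. The sketch models two points from the report: hook classes are listed in a comma-separated property such as `hive.exec.pre.hooks` or `hive.exec.post.hooks` and resolved reflectively (the trace shows `HookUtils.getHooks` calling `Class.forName`, which is why a missing `ATSHook` class fails the query), and a post-execution hook can record which partitions a query wrote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for a Hive execution hook.
// Hive's real interface is ExecuteWithHookContext.run(HookContext).
interface QueryHook {
    void run(Set<String> writtenPartitions);
}

// Hypothetical post-execution hook that records every partition a query wrote.
class PartitionAuditHook implements QueryHook {
    static final List<String> AUDIT_LOG = new ArrayList<>();

    @Override
    public void run(Set<String> writtenPartitions) {
        AUDIT_LOG.addAll(writtenPartitions);
    }
}

class HookSketch {
    // Resolves each class named in a comma-separated property value via
    // Class.forName and instantiates it with its no-arg constructor.
    // A name that is not on the classpath throws ClassNotFoundException,
    // mirroring the ATSHook failure in the trace.
    static List<QueryHook> loadHooks(String propertyValue) throws ReflectiveOperationException {
        List<QueryHook> hooks = new ArrayList<>();
        if (propertyValue == null || propertyValue.trim().isEmpty()) {
            return hooks; // no hooks configured
        }
        for (String name : propertyValue.split(",")) {
            String className = name.trim();
            if (className.isEmpty()) {
                continue; // tolerate stray commas
            }
            Class<?> cls = Class.forName(className);
            hooks.add((QueryHook) cls.getDeclaredConstructor().newInstance());
        }
        return hooks;
    }

    // Invokes each hook after the (simulated) query, passing the partitions it wrote.
    static void runPostHooks(List<QueryHook> hooks, Set<String> outputs) {
        for (QueryHook hook : hooks) {
            hook.run(outputs);
        }
    }
}
```

Assuming the sketch is compiled in the default package, `HookSketch.loadHooks("PartitionAuditHook")` returns one hook that appends each written partition to `AUDIT_LOG`, while `HookSketch.loadHooks("org.apache.hadoop.hive.ql.hooks.ATSHook")` throws `ClassNotFoundException` when that class is not on the classpath. A code path that bypasses the driver never calls anything like `runPostHooks`, so a hook such as `PartitionAuditHook` simply never sees the query's outputs.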