Posted to issues@spark.apache.org by "Atul Payapilly (JIRA)" <ji...@apache.org> on 2016/12/15 08:49:58 UTC
[jira] [Created] (SPARK-18879) Spark SQL support for Hive hooks regressed

Atul Payapilly created SPARK-18879:
--------------------------------------

             Summary: Spark SQL support for Hive hooks regressed
                 Key: SPARK-18879
                 URL: https://issues.apache.org/jira/browse/SPARK-18879
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.2, 2.0.0
            Reporter: Atul Payapilly


The stack trace below was produced on Spark 1.3.1 and is taken from this post: http://ihorbobak.com/index.php/2015/05/08/113/

hive.exec.pre.hooks Class not found:org.apache.hadoop.hive.ql.hooks.ATSHook
FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.hooks.ATSHook
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:59)
    at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1172)
    at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1156)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1206)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
    at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:318)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:290)
    at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
    at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
    at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:101)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:164)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

It looks like Spark used to rely on the Hive Driver for execution, and therefore supported Hive hooks such as those configured via hive.exec.pre.hooks. The current code path does not go through the Hive Driver, so support for Hive hooks has regressed. This is problematic: for example, there is no longer a way to tell which partitions were updated as part of a query.
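
For illustration only, here is a minimal sketch of the kind of reflective hook loading that the HookUtils.getHooks -> Class.forName frames in the trace suggest. It is not Hive's actual implementation: the Hook interface, LoggingHook, and loadHooks below are hypothetical names made up for this example. It only shows that a hook named in configuration is looked up by class name at query time, so the hook mechanism was being exercised on the 1.3.1 code path even though the class itself was missing from the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class HookLoaderSketch {

    /** Hypothetical stand-in for a hook interface; not Hive's real Hook type. */
    public interface Hook {
        void run(String query);
    }

    /** Trivial hook used only to demonstrate a successful lookup. */
    public static class LoggingHook implements Hook {
        @Override
        public void run(String query) {
            System.out.println("pre-hook saw: " + query);
        }
    }

    /**
     * Instantiates every class named in a comma-separated list.
     * Throws ClassNotFoundException when a name is not on the classpath,
     * which is the failure shown in the stack trace.
     */
    public static List<Hook> loadHooks(String csvClassNames)
            throws ReflectiveOperationException {
        List<Hook> hooks = new ArrayList<>();
        if (csvClassNames == null || csvClassNames.trim().isEmpty()) {
            return hooks; // no hooks configured
        }
        for (String name : csvClassNames.split(",")) {
            Class<?> cls = Class.forName(name.trim());
            hooks.add(cls.asSubclass(Hook.class)
                         .getDeclaredConstructor()
                         .newInstance());
        }
        return hooks;
    }

    public static void main(String[] args) throws Exception {
        // A class that is present loads and runs.
        for (Hook h : loadHooks(LoggingHook.class.getName())) {
            h.run("SELECT 1");
        }
        // A class that is absent fails the same way as in the trace
        // (assuming the Hive ATSHook class is not on this classpath).
        try {
            loadHooks("org.apache.hadoop.hive.ql.hooks.ATSHook");
        } catch (ClassNotFoundException e) {
            System.out.println("hook class missing: " + e.getMessage());
        }
    }
}
```

In the 2.0.x code path described above, no equivalent lookup appears to happen for these statements, so configured hooks are never invoked at all.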



