Posted to user@spark.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2016/05/21 22:11:38 UTC

Hive 2.0 on Spark 1.6.1 Engine

Hi,

I usually run Hive 2 on the Spark 1.3.1 engine (as opposed to using the
default MR or Tez engines). I tried to make Hive 2 work with Tez 0.8.2 as
well, but that did not get very far.
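
For reference, switching the engine only needs a handful of session-level
settings along these lines (the master URL and memory value below are
illustrative, not my actual configuration):

-- pick Spark as the execution engine for this Hive session
set hive.execution.engine=spark;
-- illustrative values; use your own master URL and sizing
set spark.master=yarn-client;
set spark.executor.memory=2g;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;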

Anyway, I will try to make it work.

Today I compiled Spark 1.6.1 from source, excluding the Hadoop libraries. I
had done the same before for the Spark 1.3.1 engine.

I created the spark-assembly-1.6.1-hadoop2.4.0.jar file and followed the
same process that works for Spark 1.3.1.
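
For reference, that process is roughly the following (the build flags and
paths here are indicative of what I do rather than exact commands):

# build a Spark 1.6.1 distribution without the Hadoop jars bundled
./make-distribution.sh --name hadoop2-without-hive --tgz -Pyarn,hadoop-provided,hadoop-2.4
# copy the resulting assembly where Hive can pick it up (path is illustrative)
cp assembly/target/scala-2.10/spark-assembly-1.6.1-hadoop2.4.0.jar $HIVE_HOME/lib/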

This is an example with Hive 2 on Spark 1.3.1:

Starting Spark Job = 0
Query Hive on Spark job[0] stages:
0
1
Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId:
SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount
[StageCost]
2016-05-21 22:53:45,512 Stage-0_0: 1(+1)/22     Stage-1_0: 0/1
2016-05-21 22:53:47,517 Stage-0_0: 2(+1)/22     Stage-1_0: 0/1


However, when I use the Spark 1.6.1 assembly file I get the following error:

hive> select count(1) from sales_staging;
Query ID = hduser_20160521224219_dc9aae02-92bd-4279-87e2-98a6458db783
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
        at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
        at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:991)
        at org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:419)
        at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateMapInput(SparkPlanGenerator.java:205)
        at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:145)
        at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:117)
        at org.apache.hadoop.hive.ql.exec.spark.LocalHiveSparkClient.execute(LocalHiveSparkClient.java:130)
        at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:64)
        at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:112)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:158)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:101)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1840)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1584)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1361)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:778)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:717)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:645)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Could not initialize class org.apache.spark.rdd.RDDOperationScope$
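
It may be relevant that the static initializer of
org.apache.spark.rdd.RDDOperationScope$ builds a Jackson ObjectMapper, so a
Jackson version clash between Hive 2's own jars and the copy bundled inside
the Spark 1.6.1 assembly could produce this kind of failure. A rough way to
compare the two sides, assuming the assembly was copied into $HIVE_HOME/lib
as in the usual setup (paths are illustrative):

# Jackson jars shipped with Hive 2.0
ls $HIVE_HOME/lib | grep -i jackson
# Jackson classes bundled inside the Spark 1.6.1 assembly
unzip -l $HIVE_HOME/lib/spark-assembly-1.6.1-hadoop2.4.0.jar | grep -i jackson | head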

I am not sure if anyone has tried this?

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com