Posted to user@spark.apache.org by Patrick McGloin <mc...@gmail.com> on 2014/09/29 17:41:14 UTC

Spark SQL + Hive + JobConf NoClassDefFoundError

Hi,

I have an error when submitting a Spark SQL application to our Spark
cluster:

14/09/29 16:02:11 WARN scheduler.TaskSetManager: Loss was due to java.lang.NoClassDefFoundError
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
        at org.apache.spark.sql.hive.SparkHiveHadoopWriter.setIDs(SparkHadoopWriter.scala:169)
        at org.apache.spark.sql.hive.SparkHiveHadoopWriter.setup(SparkHadoopWriter.scala:69)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(hiveOperators.scala:260)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(hiveOperators.scala:274)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(hiveOperators.scala:274)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

I assume this is because the executors do not have the hadoop-core.jar file
on their classpath.  I've tried adding it to the SparkContext using addJar,
but this didn't help.
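
For context, the addJar attempt looked roughly like this (a minimal sketch;
the hadoop-core path is just a placeholder, not our real location):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("AAC")
val sc = new SparkContext(conf)
// ship the Hadoop jar to the executors for this application (placeholder path)
sc.addJar("/path/to/hadoop-core.jar")
val hiveContext = new HiveContext(sc)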

I also see that the documentation says you must rebuild Spark if you want
to use Hive.

https://spark.apache.org/docs/1.0.2/sql-programming-guide.html#hive-tables
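
(If I read that page correctly, this means building the Spark assembly with
the Hive profile enabled, roughly:

mvn -Phive -DskipTests clean package

but, as explained below, rebuilding isn't an option for us.)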

Is this really true, or can we just package the jar files with the Spark
application we build?  Rebuilding Spark itself isn't possible for us, as it
is installed on a VM without internet access and we are using the Cloudera
distribution (Spark 1.0).

Is it possible to assemble the Hive dependencies into our Spark application
and submit this to the cluster?  I've tried to do this with spark-submit
(the Hadoop JobConf class is in AAC-assembly-1.0.jar), but the executor
doesn't find the class.  Here is the command:

sudo ./spark-submit --class aac.main.SparkDriver --master spark://localhost:7077 --jars AAC-assembly-1.0.jar aacApp_2.10-1.0.jar
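
For completeness, a variant we could try (untested; the hadoop-core path is a
placeholder and the jar would have to exist at that path on every worker) is
to put the Hadoop jar on the driver and executor classpaths directly instead
of shipping it with the application:

sudo ./spark-submit --class aac.main.SparkDriver --master spark://localhost:7077 \
  --driver-class-path /path/to/hadoop-core.jar \
  --jars AAC-assembly-1.0.jar aacApp_2.10-1.0.jar

together with a line like

spark.executor.extraClassPath  /path/to/hadoop-core.jar

in conf/spark-defaults.conf on the cluster.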

Any pointers would be appreciated!

Best regards,
Patrick

Re: Spark SQL + Hive + JobConf NoClassDefFoundError

Posted by Patrick McGloin <mc...@gmail.com>.
FYI, in case anybody else has this problem: we switched to Spark 1.1
(outside CDH) and the same Spark application worked the first time (once
recompiled against the Spark 1.1 libs, of course).  I assume this is because
Spark 1.1 is built with Hive support.
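
For reference, a sketch of the sort of sbt dependencies involved in the
recompile (assuming Scala 2.10 and an sbt build; exact versions should match
the cluster):

// build.sbt (sketch)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-hive" % "1.1.0" % "provided"
)

Marking them "provided" assumes the Hive classes are already in the cluster's
Spark 1.1 assembly, which is what our experience suggests.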
