Posted to users@zeppelin.apache.org by Udit Mehta <um...@groupon.com> on 2015/07/01 20:28:35 UTC
ClassNotFoundException: Spark on Zeppelin
Hi,
I am running Spark on Zeppelin and trying to create some temp tables to
run SQL queries on.
I have JSON data on HDFS which I am trying to load as a JSON RDD.
Here are my commands:
> val data = sc.sequenceFile("/user/ds=01-02-2015/hour=2/*", classOf[Null],
>   classOf[org.apache.hadoop.io.Text]).map { case (k, v) => v.toString() }
>
> import org.apache.spark.sql.SQLContext
> val sqlContext = new SQLContext(sc)
> val recordsJson = sqlContext.jsonRDD(data)
>
And here is the error I get, which clearly shows it is failing on the
jsonRDD step:
> data: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[11] at map at <console>:26
>
> import org.apache.spark.sql.SQLContext
> sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@313547c4
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 0.0 (TID 3, gdoop-worker31.snc1): java.lang.ClassNotFoundException:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:278)
>
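For reference, the class that cannot be found ($anonfun$1) is the anonymous
function the Spark REPL compiles for my .map step, which executors normally
fetch from the driver, so I suspect class distribution rather than the code
itself. Here is a self-contained sketch of what I am running (my guess at the
intended key type is NullWritable; I used classOf[Null] above):

```scala
import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.spark.sql.SQLContext

// Same pipeline as above, written out in full (Spark 1.3 API, run in the
// Zeppelin/Spark REPL where sc is predefined).
val data = sc.sequenceFile("/user/ds=01-02-2015/hour=2/*",
    classOf[NullWritable], classOf[Text])
  // This anonymous function compiles to the $anonfun$1 class the executors
  // fail to load; the logic itself is just extracting the JSON payload.
  .map { case (_, v) => v.toString }

val sqlContext = new SQLContext(sc)
val recordsJson = sqlContext.jsonRDD(data)
recordsJson.registerTempTable("records")
```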
I built Zeppelin using:
mvn clean package -DskipTests -Pspark-1.3 -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pyarn
mvn clean package -P build-distr -DskipTests
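One thing I still want to rule out (my own guess, not something from the
logs): a mismatch between the Spark version Zeppelin was built against and
the assembly the executors run with. These are the checks I would run, with
paths taken from my configs below:

```shell
# The -Pspark-1.3 build profile, the spark.home install, and the
# spark.yarn.jar assembly should all report the same Spark version.
/usr/local/lib/spark-1.3/bin/spark-submit --version

# Confirm the assembly referenced by spark.yarn.jar exists on HDFS:
hdfs dfs -ls /spark/spark-assembly-1.3.1-hadoop2.6.0.jar
```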
Lastly, here are my configs:
interpreter.json (spark section):
> "id": "2ARHCUUUZ",
> "name": "spark",
> "group": "spark",
> "properties": {
>   "spark.executor.memory": "512m",
>   "args": "",
>   "spark.yarn.jar": "hdfs://namenode-vip.snc1:8020/spark/spark-assembly-1.3.1-hadoop2.6.0.jar",
>   "spark.cores.max": "",
>   "zeppelin.spark.concurrentSQL": "false",
>   "zeppelin.spark.useHiveContext": "true",
>   "zeppelin.pyspark.python": "python",
>   "zeppelin.dep.localrepo": "local-repo",
>   "spark.home": "/usr/local/lib/spark-1.3",
>   "spark.yarn.am.extraJavaOptions": "-Dhdp.version\u003d2.2.0.0-2041",
>   "zeppelin.spark.maxResult": "1000",
>   "master": "yarn-client",
>   "spark.yarn.queue": "public",
>   "spark.yarn.access.namenodes": "hdfs://namenode1.snc1:8032,hdfs://namenode2.snc1:8032",
>   "spark.scheduler.mode": "FAIR",
>   "spark.dynamicAllocation.enabled": "false",
>   "spark.executor.extraLibraryPath": "/usr/lib/hadoop/lib/native/Linux-amd64-64",
>   "spark.executor.extraJavaOptions": "-Dhdp.version\u003d2.2.0.0-2041",
>   "spark.app.name": "Zeppelin",
>   "spark.driver.extraLibraryPath": "/usr/lib/hadoop/lib/native/Linux-amd64-64",
>   "spark.driver.extraJavaOptions": "-Dhdp.version\u003d2.2.0.0-2041"
> }
>
zeppelin-env.sh
> export HADOOP_CONF_DIR=/etc/hadoop/conf
> export SPARK_CLASSPATH=/usr/lib/hadoop/lib/*:/usr/lib/hadoop/lib/native/Linux-amd64-64
> export ZEPPELIN_PORT=10020
> export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.2.0.0-2041
> -Dspark.jars=/usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.2.0.0-2041.jar"
>
Would anyone be able to help with the problem?
Thanks in advance,
Udit