Posted to dev@zeppelin.apache.org by "Eugene Sapozhnikov (JIRA)" <ji...@apache.org> on 2015/08/21 21:11:46 UTC

[jira] [Created] (ZEPPELIN-253) EMR Spark deployment: Class com.hadoop.compression.lzo.LzoCodec not found

Eugene Sapozhnikov created ZEPPELIN-253:
-------------------------------------------

             Summary: EMR Spark deployment: Class com.hadoop.compression.lzo.LzoCodec not found
                 Key: ZEPPELIN-253
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-253
             Project: Zeppelin
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.6.0
         Environment: It's Amazon EMR cluster:

AMI version: 3.8.0
Hadoop distribution: Amazon 2.4.0
Applications: Hive 0.13.1, Pig 0.12.0, Spark 1.3.1
Zeppelin: current clone of git master (0.6.0-incubating-SNAPSHOT)

Contents of zeppelin-env.sh:
export MASTER=yarn-client
export HADOOP_CONF_DIR=/home/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.instances=2 -Dspark.executor.cores=2 -Dspark.executor.memory=1547M -Dspark.default.parallelism=4"
            Reporter: Eugene Sapozhnikov
            Priority: Blocker


Hi,
I am trying to set up Zeppelin with Spark on an EMR cluster, so far with no luck.

I followed the recommendations from https://gist.github.com/andershammar/224e1077021d0ea376dd, and everything appears to be set up correctly; I checked that .sh file line by line.

On the host itself, 'spark-shell' works fine and my test code executes without problems.
But when I open Zeppelin and try to run anything in Scala in a notebook, I get the error below.

Could you tell me what is wrong with how Zeppelin connects to the existing Spark cluster, or point me to documentation on it? So far the proper configuration is still unclear to me.

CODE AND OUTPUT:
val people = sc.textFile("s3://mybucket/storage-archive/run=2015-08-15*")
people.take(10)

people: org.apache.spark.rdd.RDD[String] = s3://mybucket/storage-archive/run=2015-08-15* MapPartitionsRDD[3] at textFile at <console>:23
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:186)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
...
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
    ... 59 more
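
My guess is that the hadoop-lzo jar (which provides com.hadoop.compression.lzo.LzoCodec and is referenced by EMR's io.compression.codecs setting) is not on the Spark driver/executor classpath when the interpreter is launched from Zeppelin, even though spark-shell picks it up. For reference, this is what I would try next in zeppelin-env.sh, using the standard Spark extraClassPath properties; the jar location below is only a guess and would need to be verified on the EMR master node:

# NOTE: the hadoop-lzo jar path below is an assumption; locate the actual jar on the master node first.
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.instances=2 -Dspark.executor.cores=2 -Dspark.executor.memory=1547M -Dspark.default.parallelism=4 -Dspark.driver.extraClassPath=/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar -Dspark.executor.extraClassPath=/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar"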



