Posted to user@spark.apache.org by Zhun Shen <sh...@gmail.com> on 2014/09/18 04:55:23 UTC

After moving Spark configuration from SPARK_CLASSPATH to spark-defaults.conf, HiveContext fails with "Class com.hadoop.compression.lzo.LzoCodec not found"

Hi there,

My production environment is AWS EMR with Hadoop 2.4.0 and Spark 1.0.2. After I moved the Spark configuration from SPARK_CLASSPATH to spark-defaults.conf, HiveContext started failing.
I also see this WARN message: WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/hadoop/.versions/spark-1.0.2-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/hadoop/spark/lib/datanucleus-rdbms-3.2.1.jar." But I do not know where the duplicate registration comes from.
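
A quick way to see whether both copies of the DataNucleus jars are really on disk (a shell sketch; both paths are taken verbatim from the warning above, and /home/hadoop/spark may simply be a symlink to the versioned install):

# shell sketch: list both copies of the DataNucleus jars named in the warning;
# if /home/hadoop/spark is a symlink to the versioned directory, the same jar
# is being registered twice through two different paths
ls -l /home/hadoop/spark/lib/datanucleus-*.jar \
      /home/hadoop/.versions/spark-1.0.2-bin-hadoop2/lib/datanucleus-*.jar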

contents of spark-env.sh (including SPARK_CLASSPATH):

export SPARK_MASTER_IP=10.187.25.107
export SCALA_HOME=/home/hadoop/.versions/scala-2.10.3
export SPARK_LOCAL_DIRS=/mnt/spark/
export SPARK_CLASSPATH="/usr/share/aws/emr/emr-fs/lib/*:/usr/share/aws/emr/lib/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar"
export SPARK_DAEMON_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"


contents of spark-defaults.conf:
 spark.master            spark://10.187.25.107:7077
 spark.eventLog.enabled  true
# spark.eventLog.dir      hdfs://namenode:8021/directory
 spark.serializer        org.apache.spark.serializer.KryoSerializer
 spark.local.dir         /mnt/spark/
 spark.executor.memory   10g
 spark.executor.extraLibraryPath "/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar"
# spark.executor.extraClassPath "/usr/share/aws/emr/emr-fs/lib/*:/usr/share/aws/emr/lib/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar"
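
For comparison, the mapping I would expect between the old SPARK_CLASSPATH and spark-defaults.conf is sketched below. These lines are my own assumptions, not a verified fix: spark.driver.extraClassPath and spark.executor.extraClassPath take classpath entries (jars or wildcard directories), while spark.executor.extraLibraryPath is meant for directories of native libraries, not for a jar; the native-library directory shown is a guess.

 # sketch: spark-defaults.conf entries roughly equivalent to the old SPARK_CLASSPATH
 spark.driver.extraClassPath     /usr/share/aws/emr/emr-fs/lib/*:/usr/share/aws/emr/lib/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar
 spark.executor.extraClassPath   /usr/share/aws/emr/emr-fs/lib/*:/usr/share/aws/emr/lib/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar
 # native LZO libraries go on the library path as a directory (this path is a guess)
 spark.executor.extraLibraryPath /home/hadoop/.versions/2.4.0/lib/native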


the error log:

14/09/18 02:28:45 INFO parse.ParseDriver: Parsing command: show tables
14/09/18 02:28:45 INFO parse.ParseDriver: Parse Completed
14/09/18 02:28:45 INFO analysis.Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/09/18 02:28:45 INFO analysis.Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
14/09/18 02:28:45 INFO analysis.Analyzer: Max iterations (2) reached for batch Check Analysis
14/09/18 02:28:45 INFO sql.SQLContext$$anon$1: Max iterations (2) reached for batch Add exchange
14/09/18 02:28:45 INFO sql.SQLContext$$anon$1: Max iterations (2) reached for batch Prepare Expressions
14/09/18 02:28:45 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=Driver.run>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=TimeToSubmit>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=compile>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=parse>
14/09/18 02:28:45 INFO parse.ParseDriver: Parsing command: show tables
14/09/18 02:28:45 INFO parse.ParseDriver: Parse Completed
14/09/18 02:28:45 INFO ql.Driver: </PERFLOG method=parse start=1411007325561 end=1411007325561 duration=0>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=semanticAnalyze>
14/09/18 02:28:45 INFO ql.Driver: Semantic Analysis Completed
14/09/18 02:28:45 INFO ql.Driver: </PERFLOG method=semanticAnalyze start=1411007325561 end=1411007325611 duration=50>
14/09/18 02:28:45 INFO exec.ListSinkOperator: Initializing Self 0 OP
14/09/18 02:28:45 INFO exec.ListSinkOperator: Operator 0 OP initialized
14/09/18 02:28:45 INFO exec.ListSinkOperator: Initialization Done 0 OP
14/09/18 02:28:45 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
14/09/18 02:28:45 INFO ql.Driver: </PERFLOG method=compile start=1411007325538 end=1411007325677 duration=139>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=Driver.execute>
14/09/18 02:28:45 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/09/18 02:28:45 INFO ql.Driver: Starting command: show tables
14/09/18 02:28:45 INFO ql.Driver: </PERFLOG method=TimeToSubmit start=1411007325538 end=1411007325692 duration=154>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=runTasks>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=task.DDL.Stage-0>
14/09/18 02:28:45 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
14/09/18 02:28:45 INFO metastore.ObjectStore: ObjectStore, initialize called
14/09/18 02:28:45 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/hadoop/.versions/spark-1.0.2-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/hadoop/spark/lib/datanucleus-rdbms-3.2.1.jar."
14/09/18 02:28:45 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/hadoop/.versions/spark-1.0.2-bin-hadoop2/lib/datanucleus-core-3.2.2.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/hadoop/spark/lib/datanucleus-core-3.2.2.jar."
14/09/18 02:28:45 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/hadoop/spark/lib/datanucleus-api-jdo-3.2.1.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/hadoop/.versions/spark-1.0.2-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar."
14/09/18 02:28:46 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
14/09/18 02:28:46 WARN bonecp.BoneCPConfig: Max Connections < 1. Setting to 20
14/09/18 02:28:46 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
14/09/18 02:28:46 INFO metastore.ObjectStore: Initialized ObjectStore
14/09/18 02:28:47 WARN bonecp.BoneCPConfig: Max Connections < 1. Setting to 20
14/09/18 02:28:47 INFO metastore.HiveMetaStore: 0: get_database: default
14/09/18 02:28:47 INFO HiveMetaStore.audit: ugi=hadoop  ip=unknown-ip-addr      cmd=get_database: default
14/09/18 02:28:47 INFO metastore.HiveMetaStore: 0: get_tables: db=default pat=.*
14/09/18 02:28:47 INFO HiveMetaStore.audit: ugi=hadoop  ip=unknown-ip-addr      cmd=get_tables: db=default pat=.*
14/09/18 02:28:47 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
14/09/18 02:28:47 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=task.DDL.Stage-0 start=1411007325692 end=1411007328020 duration=2328>
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=runTasks start=1411007325692 end=1411007328020 duration=2328>
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=Driver.execute start=1411007325677 end=1411007328020 duration=2343>
14/09/18 02:28:48 INFO ql.Driver: OK
14/09/18 02:28:48 INFO ql.Driver: <PERFLOG method=releaseLocks>
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=releaseLocks start=1411007328024 end=1411007328024 duration=0>
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=Driver.run start=1411007325538 end=1411007328024 duration=2486>
14/09/18 02:28:48 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/09/18 02:28:48 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/09/18 02:28:48 ERROR hive.HiveContext:
======================
HIVE FAILURE OUTPUT
======================
OK

======================
END HIVE FAILURE OUTPUT
======================

java.io.IOException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:551)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1471)
        at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:196)
        at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:163)
        at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
        at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
        at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:38)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:250)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:250)
        at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
        at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:104)
        at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:75)
        at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:78)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
        at $iwC$$iwC$$iwC.<init>(<console>:27)
        at $iwC$$iwC.<init>(<console>:29)
        at $iwC.<init>(<console>:31)
        at <init>(<console>:33)
        at .<init>(<console>:37)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
        at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getInputFormatFromCache(FetchOperator.java:223)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:379)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:515)
        ... 56 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getInputFormatFromCache(FetchOperator.java:219)
        ... 58 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 61 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
        at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
        at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
        ... 66 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
        ... 68 more
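
If I read the trace correctly, TextInputFormat builds its codec list from the Hadoop configuration (CompressionCodecFactory.getCodecClasses reads io.compression.codecs, which on EMR presumably lists com.hadoop.compression.lzo.LzoCodec), so the class has to be visible on the driver classpath. Two checks I plan to try (a sketch; the jar path is the one from SPARK_CLASSPATH above, and I am not certain --driver-class-path fully replaces what SPARK_CLASSPATH did):

# confirm the codec class is actually inside the hadoop-lzo jar
unzip -l /home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar | grep -i lzocodec

# start the shell with the jar on the driver classpath explicitly
spark-shell --driver-class-path /home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar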


--
Zhun Shen
Data Mining at LightInTheBox.com
Email: shenzhunallen@gmail.com | shenzhun@yahoo.com
Phone: 186 0627 7769
GitHub: https://github.com/shenzhun
LinkedIn: http://www.linkedin.com/in/shenzhun