Posted to user@spark.apache.org by Zhun Shen <sh...@gmail.com> on 2014/09/18 04:55:23 UTC
Subject: Move Spark configuration from SPARK_CLASSPATH to spark-default.conf, HiveContext went wrong with "Class com.hadoop.compression.lzo.LzoCodec not found"
Hi there,
My production environment is AWS EMR with Hadoop 2.4.0 and Spark 1.0.2. I moved the Spark configuration from SPARK_CLASSPATH to spark-default.conf, and after that HiveContext stopped working.
I also see this WARN message: 'WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/hadoop/.versions/spark-1.0.2-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/hadoop/spark/lib/datanucleus-rdbms-3.2.1.jar."', but I cannot tell where this duplicate registration comes from.
contents of SPARK_CLASSPATH (spark-env.sh):
export SPARK_MASTER_IP=10.187.25.107
export SCALA_HOME=/home/hadoop/.versions/scala-2.10.3
export SPARK_LOCAL_DIRS=/mnt/spark/
export SPARK_CLASSPATH="/usr/share/aws/emr/emr-fs/lib/*:/usr/share/aws/emr/lib/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar"
export SPARK_DAEMON_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
contents of spark-default.conf:
spark.master spark://10.187.25.107:7077
spark.eventLog.enabled true
# spark.eventLog.dir hdfs://namenode:8021/directory
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.local.dir /mnt/spark/
spark.executor.memory 10g
spark.executor.extraLibraryPath "/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar"
# spark.executor.extraClassPath "/usr/share/aws/emr/emr-fs/lib/*:/usr/share/aws/emr/lib/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar"
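If I understand the docs correctly, spark.executor.extraLibraryPath is meant for native library directories (java.library.path), so the jar itself presumably needs to go on the class path instead. What I would expect to need in spark-default.conf is something like the following (the driver setting is my guess; I have not verified that this fixes the error):

```
spark.executor.extraClassPath /home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar
spark.driver.extraClassPath   /home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar
```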
the error log:
14/09/18 02:28:45 INFO parse.ParseDriver: Parsing command: show tables
14/09/18 02:28:45 INFO parse.ParseDriver: Parse Completed
14/09/18 02:28:45 INFO analysis.Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/09/18 02:28:45 INFO analysis.Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
14/09/18 02:28:45 INFO analysis.Analyzer: Max iterations (2) reached for batch Check Analysis
14/09/18 02:28:45 INFO sql.SQLContext$$anon$1: Max iterations (2) reached for batch Add exchange
14/09/18 02:28:45 INFO sql.SQLContext$$anon$1: Max iterations (2) reached for batch Prepare Expressions
14/09/18 02:28:45 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=Driver.run>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=TimeToSubmit>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=compile>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=parse>
14/09/18 02:28:45 INFO parse.ParseDriver: Parsing command: show tables
14/09/18 02:28:45 INFO parse.ParseDriver: Parse Completed
14/09/18 02:28:45 INFO ql.Driver: </PERFLOG method=parse start=1411007325561 end=1411007325561 duration=0>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=semanticAnalyze>
14/09/18 02:28:45 INFO ql.Driver: Semantic Analysis Completed
14/09/18 02:28:45 INFO ql.Driver: </PERFLOG method=semanticAnalyze start=1411007325561 end=1411007325611 duration=50>
14/09/18 02:28:45 INFO exec.ListSinkOperator: Initializing Self 0 OP
14/09/18 02:28:45 INFO exec.ListSinkOperator: Operator 0 OP initialized
14/09/18 02:28:45 INFO exec.ListSinkOperator: Initialization Done 0 OP
14/09/18 02:28:45 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
14/09/18 02:28:45 INFO ql.Driver: </PERFLOG method=compile start=1411007325538 end=1411007325677 duration=139>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=Driver.execute>
14/09/18 02:28:45 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/09/18 02:28:45 INFO ql.Driver: Starting command: show tables
14/09/18 02:28:45 INFO ql.Driver: </PERFLOG method=TimeToSubmit start=1411007325538 end=1411007325692 duration=154>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=runTasks>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=task.DDL.Stage-0>
14/09/18 02:28:45 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
14/09/18 02:28:45 INFO metastore.ObjectStore: ObjectStore, initialize called
14/09/18 02:28:45 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/hadoop/.versions/spark-1.0.2-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/hadoop/spark/lib/datanucleus-rdbms-3.2.1.jar."
14/09/18 02:28:45 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/hadoop/.versions/spark-1.0.2-bin-hadoop2/lib/datanucleus-core-3.2.2.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/hadoop/spark/lib/datanucleus-core-3.2.2.jar."
14/09/18 02:28:45 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/hadoop/spark/lib/datanucleus-api-jdo-3.2.1.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/hadoop/.versions/spark-1.0.2-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar."
14/09/18 02:28:46 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
14/09/18 02:28:46 WARN bonecp.BoneCPConfig: Max Connections < 1. Setting to 20
14/09/18 02:28:46 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
14/09/18 02:28:46 INFO metastore.ObjectStore: Initialized ObjectStore
14/09/18 02:28:47 WARN bonecp.BoneCPConfig: Max Connections < 1. Setting to 20
14/09/18 02:28:47 INFO metastore.HiveMetaStore: 0: get_database: default
14/09/18 02:28:47 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_database: default
14/09/18 02:28:47 INFO metastore.HiveMetaStore: 0: get_tables: db=default pat=.*
14/09/18 02:28:47 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_tables: db=default pat=.*
14/09/18 02:28:47 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
14/09/18 02:28:47 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=task.DDL.Stage-0 start=1411007325692 end=1411007328020 duration=2328>
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=runTasks start=1411007325692 end=1411007328020 duration=2328>
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=Driver.execute start=1411007325677 end=1411007328020 duration=2343>
14/09/18 02:28:48 INFO ql.Driver: OK
14/09/18 02:28:48 INFO ql.Driver: <PERFLOG method=releaseLocks>
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=releaseLocks start=1411007328024 end=1411007328024 duration=0>
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=Driver.run start=1411007325538 end=1411007328024 duration=2486>
14/09/18 02:28:48 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/09/18 02:28:48 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/09/18 02:28:48 ERROR hive.HiveContext:
======================
HIVE FAILURE OUTPUT
======================
OK
======================
END HIVE FAILURE OUTPUT
======================
java.io.IOException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:551)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1471)
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:196)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:163)
at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:38)
at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:250)
at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:250)
at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:104)
at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:75)
at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:78)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
at $iwC$$iwC$$iwC.<init>(<console>:27)
at $iwC$$iwC.<init>(<console>:29)
at $iwC.<init>(<console>:31)
at <init>(<console>:33)
at .<init>(<console>:37)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
at org.apache.hadoop.hive.ql.exec.FetchOperator.getInputFormatFromCache(FetchOperator.java:223)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:379)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:515)
... 56 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getInputFormatFromCache(FetchOperator.java:219)
... 58 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 61 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
... 66 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
... 68 more
--
Zhun Shen
Data Mining at LightnInTheBox.com
Email: shenzhunallen@gmail.com | shenzhun@yahoo.com
Phone: 186 0627 7769
GitHub: https://github.com/shenzhun
LinkedIn: http://www.linkedin.com/in/shenzhun