Posted to user@spark.apache.org by manasdebashiskar <ma...@gmail.com> on 2014/12/12 20:48:45 UTC
Spark 1.2 + Avro does not work in HDP2.2
Hi Experts,
I have recently installed HDP2.2 (which depends on Hadoop 2.6).
My Spark 1.2 is built with the hadoop-2.3 profile:
mvn -Pyarn -Dhadoop.version=2.6.0 -Dyarn.version=2.6.0 -Phadoop-2.3 -Phive -DskipTests clean package
My program has the following dependencies:
val avro = "org.apache.avro" % "avro-mapred" % "1.7.7"
val spark = "org.apache.spark" % "spark-core_2.10" % "1.2.0" % "provided"
My program that reads Avro files fails with the following error. What am I doing wrong?
Thanks
Manas
java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:107)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
-----
Manas Kar
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-2-Avro-does-not-work-in-HDP2-2-tp20667.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Spark 1.2 + Avro does not work in HDP2.2
Posted by Sean Owen <so...@cloudera.com>.
Given that the error is
java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
...this usually means there is a Hadoop version problem.
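A quick way to see which flavour of the mapreduce API is actually on the classpath is a reflection check (an illustrative sketch, not from the original thread; the class name is the one in the error above):

```scala
// Diagnostic sketch: in Hadoop 1, org.apache.hadoop.mapreduce.TaskAttemptContext
// was a concrete class; in Hadoop 2 it became an interface. avro-mapred compiled
// against the wrong one fails with IncompatibleClassChangeError at runtime.
def hadoopApiFlavour(): String =
  try {
    val ctx = Class.forName("org.apache.hadoop.mapreduce.TaskAttemptContext")
    if (ctx.isInterface) "hadoop2 (interface)" else "hadoop1 (class)"
  } catch {
    case _: ClassNotFoundException => "no Hadoop classes on classpath"
  }

println(hadoopApiFlavour())
```

Running this in the same JVM as the failing job (e.g. from spark-shell) shows which API the classpath actually resolves to.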
But in particular, it's
https://issues.apache.org/jira/browse/SPARK-3039, which affects
assemblies built with the Hive code.
There's a workaround there, but I'm not sure what the resolution is.
You may have luck building for hadoop.version=2.5.0-cdh5.2.1 if your
cluster is on CDH. The packaging probably harmonizes this correctly even
with the Hive profile, but I have not tested this myself.
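On the application side, the classifier-based workaround that comes up in this thread would look roughly like this in sbt (a sketch; the versions are the ones mentioned in the messages, and the exact coordinates should be checked against your build):

```scala
// build.sbt sketch: force the hadoop2 build of avro-mapred, whose record
// readers are compiled against the Hadoop 2 (interface-based) mapreduce API.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.2.0" % "provided",
  // the "hadoop2" classifier selects the Hadoop 2-compatible artifact
  "org.apache.avro" % "avro-mapred" % "1.7.7" classifier "hadoop2"
)
```

Even with this in place, the error can persist if the Spark assembly on the cluster still bundles a hadoop1-flavoured avro-mapred (the SPARK-3039 issue above): the assembly and the application classpath have to agree.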
On Tue, Dec 16, 2014 at 4:19 PM, manasdebashiskar
<ma...@gmail.com> wrote:
> Hi All,
> I saw some help online about forcing avro-mapred to hadoop2 using
> classifiers.
>
> Now my configuration is this:
> val avro = "org.apache.avro" % "avro-mapred" % V.avro classifier
> "hadoop2"
>
> However, I still get java.lang.IncompatibleClassChangeError. I think I am
> not building Spark correctly. Clearly the following step is missing
> something Avro-related:
>
> mvn -Pyarn -Dhadoop.version=2.6.0 -Dyarn.version=2.6.0 -Phadoop-2.3
> -Phive -DskipTests clean package
>
>
> Can someone please help me build Spark 1.2 for either CDH5.2 or HDP2.2 +
> Hive + Avro?
>
> Thanks