Posted to user@spark.apache.org by manasdebashiskar <ma...@gmail.com> on 2014/12/12 20:48:45 UTC

Spark 1.2 + Avro does not work in HDP2.2

Hi Experts,
 I have recently installed HDP 2.2 (which depends on Hadoop 2.6).
 My Spark 1.2 is built with the hadoop-2.3 profile:

mvn -Pyarn -Dhadoop.version=2.6.0 -Dyarn.version=2.6.0 -Phadoop-2.3 -Phive -DskipTests clean package

 My program has the following dependencies:

val avro  = "org.apache.avro"  % "avro-mapred"     % "1.7.7"
val spark = "org.apache.spark" % "spark-core_2.10" % "1.2.0" % "provided"

My program that reads Avro files fails with the error below. What am I
doing wrong?
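
For reference, the read boils down to the following (a minimal sketch,
not my actual job; the input path and object name are placeholders):

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.spark.{SparkConf, SparkContext}

object AvroReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("avro-read"))
    // newAPIHadoopFile builds a NewHadoopRDD; the trace below shows the
    // failure inside AvroKeyInputFormat.createRecordReader on that path.
    val records = sc.newAPIHadoopFile(
      "hdfs:///path/to/input.avro",                // placeholder path
      classOf[AvroKeyInputFormat[GenericRecord]],
      classOf[AvroKey[GenericRecord]],
      classOf[NullWritable])
    // materialize a few records to trigger the read
    records.map(_._1.datum().toString).take(5).foreach(println)
    sc.stop()
  }
}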

Thanks
    Manas

java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
	at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:107)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
	at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)



-----
Manas Kar
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-2-Avro-does-not-work-in-HDP2-2-tp20667.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Spark 1.2 + Avro does not work in HDP2.2

Posted by Sean Owen <so...@cloudera.com>.
Given that the error is

java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

...this usually means there is a Hadoop version problem.
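
One way to confirm which flavor is actually on the classpath at runtime
(a small diagnostic sketch for spark-shell, my suggestion rather than
anything from the JIRA):

// true on Hadoop 2.x, where TaskAttemptContext became an interface;
// avro-mapred built against Hadoop 1 expects it to be a class
val tac = Class.forName("org.apache.hadoop.mapreduce.TaskAttemptContext")
println("TaskAttemptContext is an interface: " + tac.isInterface)
// shows which jar AvroKeyInputFormat came from, i.e. whether the Spark
// assembly's copy is shadowing the one declared in your build
val akif = Class.forName("org.apache.avro.mapreduce.AvroKeyInputFormat")
println("AvroKeyInputFormat loaded from: " +
  akif.getProtectionDomain.getCodeSource.getLocation)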

But in particular it's
https://issues.apache.org/jira/browse/SPARK-3039, which affects the
assembly when it is built with the Hive code.

There's a workaround there, but I'm not sure what the resolution is.

You may have luck building for hadoop.version=2.5.0-cdh5.2.1, if your
cluster is CDH. The packaging probably harmonizes this correctly even
with the Hive profile, but I have not tested this myself.
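
For example, something like this (untested, and an assumption on my
part: -Phadoop-2.4 is the profile the Spark 1.2 build docs pair with
Hadoop 2.4/2.5, and 2.5.0-cdh5.2.1 is CDH 5.2.1's Hadoop version string):

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.2.1 -Phive -DskipTests clean package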

On Tue, Dec 16, 2014 at 4:19 PM, manasdebashiskar
<ma...@gmail.com> wrote:
> Hi All,
>  I saw some help online about forcing avro-mapred to the hadoop2 build
> using classifiers.
>
>  Now my configuration is:
>  val avro = "org.apache.avro" % "avro-mapred" % V.avro classifier "hadoop2"
>
> However, I still get java.lang.IncompatibleClassChangeError. I think I am
> not building Spark correctly. Clearly the following step is missing
> something Avro-related:
>
> mvn -Pyarn -Dhadoop.version=2.6.0 -Dyarn.version=2.6.0 -Phadoop-2.3 -Phive -DskipTests clean package
>
> Can someone please help me build Spark 1.2 for either CDH 5.2 or HDP 2.2 +
> Hive + Avro?
>
> Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Spark 1.2 + Avro does not work in HDP2.2

Posted by manasdebashiskar <ma...@gmail.com>.
Hi All,
 I saw some help online about forcing avro-mapred to the hadoop2 build
using classifiers.

 Now my configuration is:

val avro = "org.apache.avro" % "avro-mapred" % V.avro classifier "hadoop2"

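Spelled out as a full sbt dependency block, that is essentially the
following (a sketch; V.avro is my version variable, e.g. "1.7.7"):

libraryDependencies ++= Seq(
  // the hadoop2 classifier selects the avro-mapred artifact compiled
  // against the new org.apache.hadoop.mapreduce API
  "org.apache.avro"  % "avro-mapred"     % "1.7.7" classifier "hadoop2",
  "org.apache.spark" % "spark-core_2.10" % "1.2.0" % "provided"
)
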
However, I still get java.lang.IncompatibleClassChangeError. I think I am
not building Spark correctly. Clearly the following step is missing
something Avro-related:

mvn -Pyarn -Dhadoop.version=2.6.0 -Dyarn.version=2.6.0 -Phadoop-2.3 -Phive -DskipTests clean package


Can someone please help me build Spark 1.2 for either CDH 5.2 or HDP 2.2 +
Hive + Avro?

Thanks

-----
Manas Kar
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-2-Avro-does-not-work-in-HDP2-2-tp20667p20721.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org