Posted to dev@spark.apache.org by "M. Dale" <me...@yahoo.com.INVALID> on 2015/02/02 23:18:08 UTC
Additional fix for Avro IncompatibleClassChangeError (SPARK-3039)
SPARK-3039 ("Spark assembly for new hadoop API (hadoop 2) contains
avro-mapred for hadoop 1 API") was marked as resolved in the Spark 1.2.0
release. However, when I download the pre-built Spark distribution for
Hadoop 2.4 and later (spark-1.2.0-bin-hadoop2.4.tgz) and run it against
Avro code compiled against Hadoop 2.4 and the new Hadoop API, I still get:
java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
    at org.apache.avro.mapreduce.AvroRecordReaderBase.initialize(AvroRecordReaderBase.java:87)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:135)
TaskAttemptContext was a class in the Hadoop 1.x series but became an
interface in Hadoop 2.x. Therefore Avro publishes both an
avro-mapred-1.7.6.jar (compiled against the Hadoop 1 API) and an
avro-mapred-1.7.6-hadoop2.jar; for Hadoop 2.x, the
avro-mapred-1.7.6-hadoop2.jar is needed.
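As an aside, the class-versus-interface mismatch behind this error is easy to check with plain reflection. The sketch below is my own diagnostic, not Spark or Avro code; it uses JDK types as stand-ins so it runs without Hadoop jars on the classpath. Against a real Spark assembly you would pass org.apache.hadoop.mapreduce.TaskAttemptContext instead: "interface" means the hadoop2 flavor is on the classpath, "class" means hadoop1.

```java
import java.lang.reflect.Modifier;

public class CheckKind {
    // Reports whether a named type is an interface (hadoop2-style
    // TaskAttemptContext) or a concrete class (hadoop1-style).
    static String kind(String name) throws ClassNotFoundException {
        Class<?> c = Class.forName(name);
        return Modifier.isInterface(c.getModifiers()) ? "interface" : "class";
    }

    public static void main(String[] args) throws Exception {
        // JDK stand-ins so the example is self-contained; substitute
        // "org.apache.hadoop.mapreduce.TaskAttemptContext" on a Spark classpath.
        System.out.println("java.util.Map: " + kind("java.util.Map"));         // interface
        System.out.println("java.util.HashMap: " + kind("java.util.HashMap")); // class
    }
}
```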
So it seemed that spark-assembly-1.2.0-hadoop2.4.0.jar still did not contain
the org.apache.avro.mapreduce.AvroRecordReaderBase from
avro-mapred-1.7.6-hadoop2.jar.
I then downloaded the source code and compiled with:
mvn -Pyarn -Phadoop-2.4 -Phive-0.13.1 -DskipTests clean package
The hadoop-2.4 profile sets
<avro.mapred.classifier>hadoop2</avro.mapred.classifier>, which, via
dependency management, should pull in the right hadoop2 artifact:
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
  <classifier>${avro.mapred.classifier}</classifier>
  <exclusions>
  ...
However, I got the same IncompatibleClassChangeError after replacing the
assembly jar. I had cleaned my local ~/.m2/repository before the build and
found that both avro-mapred 1.7.5 (no classifier, i.e. hadoop1) and 1.7.6
(hadoop2 classifier) had been downloaded. That seemed a likely culprit.
After installing the newly built jar files into my local repo (I had to
hand-copy the poms/jars for the repl/yarn subprojects) and then running:

mvn -Pyarn -Phadoop-2.4 -Phive-0.13.1 -DskipTests dependency:tree -Dincludes=org.apache.avro:avro-mapred
[INFO] Building Spark Project Hive 1.2.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-dependency-plugin:2.4:tree (default-cli) @ spark-hive_2.10 ---
[INFO] org.apache.spark:spark-hive_2.10:jar:1.2.0
[INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
[INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.5:compile
[INFO] \- org.apache.avro:avro-mapred:jar:hadoop2:1.7.6:compile
The output showed that hive-exec transitively brought in
avro-mapred-1.7.5.jar (hadoop1).
Fix for Spark 1.2.x, in spark-1.2.0/sql/hive/pom.xml:
<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>${hive.version}</version>
  <exclusions>
    <exclusion>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.esotericsoftware.kryo</groupId>
      <artifactId>kryo</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
    </exclusion>
  </exclusions>
</dependency>
Only the last exclusion, for avro-mapred, is new (comparison at
https://github.com/medale/spark/compare/apache:v1.2.1-rc2...medale:avro-hadoop2-v1.2.1-rc2).
With that fix in place I was able to build and run my Avro code successfully.
Fix for current master: https://github.com/apache/spark/pull/4315
Any feedback much appreciated,
Markus
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org