Posted to dev@spark.apache.org by "M. Dale" <me...@yahoo.com.INVALID> on 2015/02/02 23:18:08 UTC

Additional fix for Avro IncompatibleClassChangeError (SPARK-3039)

SPARK-3039 "Spark assembly for new hadoop API (hadoop 2) contains
avro-mapred for hadoop 1 API"
was marked resolved with the Spark 1.2.0 release. However, when I download
the pre-built Spark distro for Hadoop 2.4 and later
(spark-1.2.0-bin-hadoop2.4.tgz) and run it against Avro code compiled
against Hadoop 2.4/the new Hadoop API, I still get:

java.lang.IncompatibleClassChangeError: Found interface 
org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
     at 
org.apache.avro.mapreduce.AvroRecordReaderBase.initialize(AvroRecordReaderBase.java:87)
     at 
org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:135)

TaskAttemptContext was a class in the Hadoop 1.x series but became an
interface in Hadoop 2.x. Avro therefore ships two artifacts:
avro-mapred-1.7.6.jar (no classifier, compiled against Hadoop 1) and
avro-mapred-1.7.6-hadoop2.jar. For Hadoop 2.x,
avro-mapred-1.7.6-hadoop2.jar is needed.
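The failure mode can be illustrated without Hadoop at all: the JVM records in
the bytecode whether a referenced type is a class or an interface
(invokevirtual/new vs. invokeinterface), so code compiled against the Hadoop 1
TaskAttemptContext class fails at resolution time when Hadoop 2's interface of
the same name is on the classpath. A minimal sketch with made-up stand-in
names (Ctx and CtxImpl are hypothetical, not Hadoop types):

```java
// Hadoop 1.x style: TaskAttemptContext was a concrete class.
// Hadoop 2.x style: the same fully-qualified name became an interface.
// Here Ctx stands in for the Hadoop 2 interface, CtxImpl for an implementation.
interface Ctx {
    String taskName();
}

class CtxImpl implements Ctx {
    public String taskName() { return "attempt_0001"; }
}

public class ClassVsInterface {
    public static void main(String[] args) {
        // Bytecode compiled against an interface uses invokeinterface; bytecode
        // compiled against a class uses invokevirtual/new. If the kind of the
        // type changes between compile time and run time, the JVM throws
        // IncompatibleClassChangeError when the reference is first resolved.
        System.out.println(Ctx.class.isInterface());      // true
        System.out.println(new CtxImpl().taskName());     // attempt_0001
    }
}
```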

So it seemed that spark-assembly-1.2.0-hadoop2.4.0.jar still did not contain
the org.apache.avro.mapreduce.AvroRecordReaderBase from 
avro-mapred-1.7.6-hadoop2.jar.
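When debugging this kind of mix-up it can help to ask the running JVM where a
class was actually loaded from, rather than guessing from jar names. A hedged
sketch (WhichJar is my own helper, not a Spark utility; on a Spark classpath
you would pass "org.apache.avro.mapreduce.AvroRecordReaderBase", but a JDK
class is used below so the snippet is self-contained):

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the jar or directory a class was loaded from. Classes loaded by
    // the bootstrap loader (core JDK classes) have no CodeSource, so we report
    // that case explicitly.
    public static String origin(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        return src == null ? "JDK runtime" : src.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        // With Spark on the classpath, pass the suspect class name instead:
        // origin(Class.forName("org.apache.avro.mapreduce.AvroRecordReaderBase"))
        System.out.println(origin(String.class)); // JDK runtime
    }
}
```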

I then downloaded the source code and compiled with:
mvn -Pyarn -Phadoop-2.4 -Phive-0.13.1 -DskipTests clean package

The hadoop-2.4 profile sets
<avro.mapred.classifier>hadoop2</avro.mapred.classifier>, which through
dependency management should pull in the right hadoop2 version:

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
  <classifier>${avro.mapred.classifier}</classifier>
  <exclusions>
    ...
  </exclusions>
</dependency>

However, I hit the same IncompatibleClassChangeError after replacing the
assembly jar.

I had cleaned my local ~/.m2/repository before the build and found that for
avro-mapred both 1.7.5 (no classifier, i.e. hadoop1) and 1.7.6 (hadoop2) had
been downloaded. That seemed a likely culprit.

After installing the created jar files into my local repo (had to handcopy
poms/jars for repl/yarn subprojects) and then running:

mvn -Pyarn -Phadoop-2.4 -Phive-0.13.1 -DskipTests dependency:tree 
-Dincludes=org.apache.avro:avro-mapred

Building Spark Project Hive 1.2.0
[INFO] 
------------------------------------------------------------------------
[INFO]
[INFO] --- maven-dependency-plugin:2.4:tree (default-cli) @ 
spark-hive_2.10 ---
[INFO] org.apache.spark:spark-hive_2.10:jar:1.2.0
[INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
[INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.5:compile
[INFO] \- org.apache.avro:avro-mapred:jar:hadoop2:1.7.6:compile
[INFO]

This showed that hive-exec pulled in avro-mapred-1.7.5.jar (hadoop1). Fix for
Spark 1.2.x:

spark-1.2.0/sql/hive/pom.xml

     <dependency>
       <groupId>org.spark-project.hive</groupId>
       <artifactId>hive-exec</artifactId>
       <version>${hive.version}</version>
       <exclusions>
         <exclusion>
           <groupId>commons-logging</groupId>
           <artifactId>commons-logging</artifactId>
         </exclusion>
         <exclusion>
           <groupId>com.esotericsoftware.kryo</groupId>
           <artifactId>kryo</artifactId>
         </exclusion>
         <exclusion>
           <groupId>org.apache.avro</groupId>
           <artifactId>avro-mapred</artifactId>
         </exclusion>
       </exclusions>
     </dependency>

  Just add the last exclusion, the one for avro-mapred (comparison at
https://github.com/medale/spark/compare/apache:v1.2.1-rc2...medale:avro-hadoop2-v1.2.1-rc2).
  With that fix I was able to build the assembly and run my Avro code against it.

  Fix for current master: https://github.com/apache/spark/pull/4315

  Any feedback much appreciated,
  Markus

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org