Posted to issues@spark.apache.org by "Michael Heuer (JIRA)" <ji...@apache.org> on 2019/07/16 16:20:00 UTC
[jira] [Comment Edited] (SPARK-27781) Tried to access method org.apache.avro.specific.SpecificData.<init>()V

    [ https://issues.apache.org/jira/browse/SPARK-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886237#comment-16886237 ] 

Michael Heuer edited comment on SPARK-27781 at 7/16/19 4:19 PM:
----------------------------------------------------------------

I believe I saw a fix for this specific issue, where the avro jars are now added to the Spark binary distribution without Hadoop.  Will look for the pull request.

I cannot let Spark off the hook as easily as you suggest though – Spark is the project that brings these dependencies together, as compile time dependencies and on the runtime classpath.  Spark needs to ensure those dependencies are compatible with each other.



> Tried to access method org.apache.avro.specific.SpecificData.<init>()V
> ----------------------------------------------------------------------
>
>                 Key: SPARK-27781
>                 URL: https://issues.apache.org/jira/browse/SPARK-27781
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: Michael Heuer
>            Priority: Major
>         Attachments: reproduce.sh
>
>
> It appears that there is a conflict between Avro dependency versions at runtime when using Spark 2.4.3 with Scala 2.12 (the spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution) and Hadoop 2.7.7.
>  
> Specifically, the Spark 2.4.3 binary distribution for Hadoop 2.7.x includes avro-1.8.2.jar:
> {{$ find spark-2.4.3-bin-hadoop2.7 -name "*.jar" | grep avro}}
> {{jars/avro-1.8.2.jar}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
> {{jars/avro-ipc-1.8.2.jar}}
>  
> Whereas the Spark 2.4.3 binary distribution for Scala 2.12 without Hadoop does not include avro-1.8.2.jar:
> {{$ find spark-2.4.3-bin-without-hadoop-scala-2.12 -name "*.jar" | grep avro}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
>  
> Adding Hadoop 2.7.7 to the classpath brings in avro-1.7.4.jar, which conflicts at runtime:
> {{$ find hadoop-2.7.7 -name "*.jar" | grep avro}}
> {{share/hadoop/mapreduce/lib/avro-1.7.4.jar}}
> {{share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar}}
> {{share/hadoop/tools/lib/avro-1.7.4.jar}}
> {{share/hadoop/common/lib/avro-1.7.4.jar}}
> {{hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar}}
>  
> Issue filed downstream in
> [https://github.com/bigdatagenomics/adam/issues/2151]
>  
> Attached a smaller reproducing test case.
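
The mismatch described above can also be checked mechanically. The following is a minimal sketch, not part of the original report or of the attached reproduce.sh: it walks one or more directories and groups Avro-family jars by the version embedded in the file name. The function name jar_versions, the family parameter, and the file-name pattern are assumptions made for illustration; jars that do not follow the <artifact>-<version>[-<classifier>].jar naming are skipped.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: "<artifact>-<version>[-<classifier>].jar",
# e.g. "avro-1.8.2.jar" or "avro-mapred-1.8.2-hadoop2.jar".
JAR_NAME = re.compile(
    r"^(?P<artifact>.+?)-(?P<version>\d+(?:\.\d+)*)(?:-(?P<classifier>.+))?\.jar$"
)


def jar_versions(roots, family="avro"):
    """Map each version of the `family` jars found under `roots` to their paths.

    A jar belongs to the family if its artifact name equals `family` or
    starts with `family` followed by a hyphen (avro, avro-mapred, avro-ipc).
    Only file names are inspected; jar contents are not opened.
    """
    versions = defaultdict(list)
    for root in roots:
        for jar in sorted(Path(root).rglob("*.jar")):
            match = JAR_NAME.match(jar.name)
            if match is None:
                continue  # file name does not follow the assumed pattern
            artifact = match["artifact"]
            if artifact == family or artifact.startswith(family + "-"):
                versions[match["version"]].append(str(jar))
    return dict(versions)
```

Called with the two unpacked distribution directories from the report, for example jar_versions(["spark-2.4.3-bin-without-hadoop-scala-2.12", "hadoop-2.7.7"]), a result with more than one key would indicate mixed Avro versions among the scanned jars. Because it only looks at file names, it cannot say which jar the JVM would actually load at runtime.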


