Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/09/02 12:46:46 UTC

[jira] [Resolved] (SPARK-10374) Spark-core 1.5.0-RC2 can create version conflicts with apps depending on protobuf-2.4

     [ https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-10374.
-------------------------------
    Resolution: Not A Problem

Thanks all; everyone else had much more useful things to say here. It came down, more or less, to bringing in mismatched dependency versions from Maven. I think this JIRA itself is a good bit of documentation for the issue.

I also tend to believe that supporting Hadoop 1 and 2.0/2.1 is becoming difficult and occasionally causes problems, like the one fixed by a recent change to use reflection when accessing some Hadoop 1 APIs, which meant 1.4 was slightly broken with 1.x. Hadoop 2.0.0 gets even less attention. Until support for these versions formally goes away, it may take some footwork to get recent releases to fully build and work with 2.0.0. Anything more than small patches to keep them working may not be worth it.

So that's a long way of saying: yes, I don't think this ends in a particular change, but it serves as a good reminder about the Akka dependency issue.
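
For anyone who lands here with the same conflict, the usual way out is to relocate protobuf inside the application jar, much as Spark 1.4 shipped the shaded 2.5.0-spark protobuf for Akka. A minimal sketch, assuming a Gradle build with the Shadow plugin applied (the shaded package prefix is just a placeholder):

{code}
// build.gradle (sketch only; assumes the Gradle Shadow plugin is applied,
// and "myapp.shaded" is a placeholder package prefix)
shadowJar {
    // Relocate the protobuf 2.4 classes that the CDH4 Hadoop client needs,
    // so they cannot collide with the protobuf 2.5 that spark-core pulls in
    // through Akka.
    relocate 'com.google.protobuf', 'myapp.shaded.com.google.protobuf'
}
{code}

The Maven Shade plugin's relocation support achieves the same thing for Maven builds.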

> Spark-core 1.5.0-RC2 can create version conflicts with apps depending on protobuf-2.4
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-10374
>                 URL: https://issues.apache.org/jira/browse/SPARK-10374
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.5.0
>            Reporter: Matt Cheah
>
> My Hadoop cluster is running 2.0.0-CDH4.7.0, and I have an application that depends on the Spark 1.5.0 libraries and the Hadoop 2.0.0 libraries via Gradle. When I run the driver application, I hit the following error:
> {code}
> <redacted other messages>… java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.
>         at com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:30108)
>         at com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:149)
> {code}
> This application used to work when pulling in Spark 1.4.1 dependencies, and thus this is a regression.
> I used Gradle’s dependencyInsight task to dig a bit deeper. Against our Spark 1.4.1-backed project, it shows that dependency resolution pulls in Protobuf 2.4.0a from the Hadoop CDH4 modules and Protobuf 2.5.0-spark from the Spark modules. It appears that Spark used to shade its protobuf dependency, so Spark’s and Hadoop’s protobuf dependencies wouldn’t collide. However, when I ran dependencyInsight again against Spark 1.5, it looks like protobuf is no longer shaded in the Spark modules.
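> The reports below can be reproduced with something along these lines (assuming the Java plugin's old compile configuration):
> {code}
> ./gradlew dependencyInsight --dependency protobuf-java --configuration compile
> {code}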
> 1.4.1 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.4.0a
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |    \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> |         +--- compile
> |         \--- org.apache.spark:spark-core_2.10:1.4.1
> |              +--- compile
> |              +--- org.apache.spark:spark-sql_2.10:1.4.1
> |              |    \--- compile
> |              \--- org.apache.spark:spark-catalyst_2.10:1.4.1
> |                   \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>      \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
> org.spark-project.protobuf:protobuf-java:2.5.0-spark
> \--- org.spark-project.akka:akka-remote_2.10:2.3.4-spark
>      \--- org.apache.spark:spark-core_2.10:1.4.1
>           +--- compile
>           +--- org.apache.spark:spark-sql_2.10:1.4.1
>           |    \--- compile
>           \--- org.apache.spark:spark-catalyst_2.10:1.4.1
>                \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> {code}
> 1.5.0-rc2 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.5.0 (conflict resolution)
> \--- com.typesafe.akka:akka-remote_2.10:2.3.11
>      \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
>           +--- compile
>           +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
>           |    \--- compile
>           \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
>                \--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
> com.google.protobuf:protobuf-java:2.4.0a -> 2.5.0
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |    \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> |         +--- compile
> |         \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
> |              +--- compile
> |              +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
> |              |    \--- compile
> |              \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
> |                   \--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>      \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
> {code}
> Clearly we can't force the version one way or the other. If I force protobuf to use 2.5.0, then invoking Hadoop code from my application breaks, because the Hadoop 2.0.0 jars are compiled against protobuf-2.4. On the other hand, forcing protobuf to version 2.4 breaks spark-core code that is compiled against protobuf-2.5. Note that protobuf-2.4 and protobuf-2.5 are not binary compatible; for example, classes generated by protoc 2.4 do not override GeneratedMessage.getUnknownFields(), while the protobuf-java 2.5 runtime expects generated subclasses to override it, hence the UnsupportedOperationException above.
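> For reference, the forcing described above looks roughly like this in Gradle (a sketch only; as noted, neither line actually fixes the problem):
> {code}
> configurations.all {
>     resolutionStrategy {
>         // Forcing 2.5.0 satisfies spark-core but breaks the CDH4 Hadoop client,
>         // which is compiled against protobuf-2.4:
>         force 'com.google.protobuf:protobuf-java:2.5.0'
>         // Forcing 2.4.0a instead breaks spark-core, which needs protobuf-2.5:
>         // force 'com.google.protobuf:protobuf-java:2.4.0a'
>     }
> }
> {code}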


