Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 21:34:51 UTC

[GitHub] [beam] damccorm opened a new issue, #21092: java.io.InvalidClassException with Spark 3.1.2

damccorm opened a new issue, #21092:
URL: https://github.com/apache/beam/issues/21092

   This was reported on the mailing list.
    
   ----
    
   Using Spark downloaded from the link below,
    
   [https://www.apache.org/dyn/closer.lua/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz](https://www.apache.org/dyn/closer.lua/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz)
    
   I get the below error when submitting a pipeline.
   The full error is at [https://gist.github.com/yuwtennis/7b0c1dc0dcf98297af1e3179852ca693](https://gist.github.com/yuwtennis/7b0c1dc0dcf98297af1e3179852ca693).
    
   ---------------------------------------------------------
   21/08/16 01:10:26 WARN TransportChannelHandler: Exception in connection from /192.168.11.2:35601
   java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; local class incompatible: stream classdesc serialVersionUID = 3456489343829468865, local class serialVersionUID = 1028182004549731694
   at java.base/java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:689)
   ...
   ---------------------------------------------------------
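
   A minimal diagnostic sketch, assuming Scala 2.12 on both classpaths: run this class on the job server's JVM and on the cluster's JVM; if the two printed serialVersionUID values differ, deserialization will fail exactly as in the log above.

   ```scala
   // Prints the scala-library version and the serialVersionUID this JVM
   // computes for scala.collection.mutable.WrappedArray$ofRef (Scala 2.12).
   object SerialVersionCheck {
     def main(args: Array[String]): Unit = {
       println(s"scala-library: ${scala.util.Properties.versionNumberString}")
       val suid = java.io.ObjectStreamClass
         .lookup(classOf[scala.collection.mutable.WrappedArray.ofRef[AnyRef]])
         .getSerialVersionUID
       println(s"WrappedArray$$ofRef serialVersionUID: $suid")
     }
   }
   ```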
    
   SDK Harness and Job service are deployed as below.
    
   1. Job service
    
   sudo docker run --net=host apache/beam_spark3_job_server:2.31.0 --spark-master-url=spark://localhost:7077 --clean-artifacts-per-job true
    
   * apache/beam_spark_job_server:2.31.0 for Spark 2.4.8
    
   2. SDK Harness
    
   sudo docker run --net=host apache/beam_python3.8_sdk:2.31.0 --worker_pool
    
   3. SDK client code
    
   [https://gist.github.com/yuwtennis/2e4c13c79f71e8f713e947955115b3e2](https://gist.github.com/yuwtennis/2e4c13c79f71e8f713e947955115b3e2)
    Spark 2.4.8, downloaded from the link below, succeeded without any errors using the above components.
    
   [https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz](https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz)
   
    
   
   Imported from Jira [BEAM-12762](https://issues.apache.org/jira/browse/BEAM-12762). Original Jira may contain additional context.
   Reported by: ibzib.
   This issue has child subcomponents which were not migrated over. See the original Jira for more information.




[GitHub] [beam] mosche commented on issue #21092: java.io.InvalidClassException with Spark 3.1.2

mosche commented on issue #21092:
URL: https://github.com/apache/beam/issues/21092#issuecomment-1239426855

   @aymanfarhat Just FYI, this is a known Scala issue. You can find more details on the problem [here](https://github.com/scala/bug/issues/5046#issuecomment-928108088) including hints on how to resolve it.
   The original Beam Jira also contains a bit more context, see https://issues.apache.org/jira/browse/BEAM-12762.
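
   A minimal sketch of the failure mode (an assumed two-JVM setup, not the exact Spark code path): write a `WrappedArray` with one scala-library patch version and read it with another; Java serialization embeds the serialVersionUID, so the reading side throws `java.io.InvalidClassException` when the versions disagree.

   ```scala
   import java.io.{ObjectInputStream, ObjectOutputStream}
   import java.nio.file.{Files, Paths}
   import scala.collection.mutable.WrappedArray

   object WrappedArrayRepro {
     def main(args: Array[String]): Unit = args(0) match {
       case "write" =>
         // WrappedArray.make wraps the array in WrappedArray$ofRef,
         // the class named in the exception above.
         val payload: Seq[String] = WrappedArray.make(Array("a", "b", "c"))
         val out = new ObjectOutputStream(Files.newOutputStream(Paths.get("payload.bin")))
         out.writeObject(payload)
         out.close()
       case "read" =>
         // Run this on a JVM with a different scala-library patch version
         // (e.g. 2.12.10 vs 2.12.15) to reproduce the InvalidClassException.
         val in = new ObjectInputStream(Files.newInputStream(Paths.get("payload.bin")))
         println(in.readObject())
         in.close()
     }
   }
   ```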
   




[GitHub] [beam] nitinlkoin1984 commented on issue #21092: java.io.InvalidClassException with Spark 3.1.2

nitinlkoin1984 commented on issue #21092:
URL: https://github.com/apache/beam/issues/21092#issuecomment-1253911292

   > @aymanfarhat Just FYI, this is a known Scala issue. You can find more details on the problem [here](https://github.com/scala/bug/issues/5046#issuecomment-928108088) including hints on how to resolve it. The original Beam Jira also contains a bit more context, see https://issues.apache.org/jira/browse/BEAM-12762.
   
   From the link you provided, it appears that Beam targets Scala 2.12, and when we build Beam it is built with Scala 2.12.14, which is incompatible with Spark v3.1.2. How do we set the Scala version for Beam's job server?




[GitHub] [beam] mosche commented on issue #21092: java.io.InvalidClassException with Spark 3.1.2

mosche commented on issue #21092:
URL: https://github.com/apache/beam/issues/21092#issuecomment-1262423620

   @nitinlkoin1984 I finally found some time to look deeper into this. Sorry for the hassle; finding the job server in this state is a bit disappointing.
   
   > Also, what is the working and tested Spark job server version, and which Spark version is it compatible with?
   
   Unfortunately this is a weakness of the existing test infrastructure: it uses Spark in local mode, and in that setup such a classpath issue won't be discovered.
   
   Anyways, I've done some testing:
   
   - You can fairly easily build yourself a custom version of the `beam_spark3_job_server` image on the latest Beam master for Spark `3.2.2` (or later). These versions of Spark use Scala 2.12.15 and don't suffer from the Scala bug causing this. Here are the detailed steps:
      1. Pull the source code of Beam from https://github.com/apache/beam and checkout tag `v.2.14.0`
      2. Update the Spark version to `3.2.2` in these two places:
        https://github.com/apache/beam/blob/f37795e326a75310828518464189440b14863834/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L496
        https://github.com/apache/beam/blob/f37795e326a75310828518464189440b14863834/runners/spark/3/build.gradle#L23
      3. In the project directory, run the gradle command to build the docker container:
        ```
        ./gradlew :runners:spark:3:job-server:container:docker
        ```
         This will build `apache/beam_spark3_job_server:latest` and make it available for local use.
   
   - Downgrading Scala in the job-server image to 2.12.10 is also possible, but not as obvious. The Scala version is bumped by a transitive dependency.  
      1. Append the following lines to  [runners/spark/3/job-server/build.gradle](
       https://github.com/apache/beam/blob/17453e71a81ba774ab451ad141fc8c21ea8770c9/runners/spark/3/job-server/build.gradle)
         ```
         configurations.runtimeClasspath {
           resolutionStrategy {
             force "org.scala-lang:scala-library:2.12.10"
           }
         }
         ```
      2. In the project directory, run the gradle command to build the docker container:
         ```
         ./gradlew :runners:spark:3:job-server:clean
         ./gradlew :runners:spark:3:job-server:container:docker
         ```
         This will build `apache/beam_spark3_job_server:latest` and make it available for local use.
   
   Alternatively you could build yourself a custom Spark 3.1.2 image that contains Scala 2.12.15 (instead of 2.12.10) on the classpath. But I don't think that's generally a feasible option.
   
   Let me know if any of these options help!
   
   @aromanenko-dev The 2nd option would fix the job-server without having to bump Spark. On the other hand, bumping to Spark 3.2.2 seems to be the more robust, long-term solution. But the biggest concern there is the Avro dependency upgrade (1.10). What do you think? Anyone else who could chime in?




[GitHub] [beam] nitinlkoin1984 commented on issue #21092: java.io.InvalidClassException with Spark 3.1.2

nitinlkoin1984 commented on issue #21092:
URL: https://github.com/apache/beam/issues/21092#issuecomment-1253870457

   This may be a known issue with Spark 3.1.2. I think the issue goes away from Spark 3.1.3 onwards. Can Beam be migrated to v3.1.3? I am trying to use Beam's Python SDK with Spark, but this issue is preventing me from using it.




[GitHub] [beam] nitinlkoin1984 commented on issue #21092: java.io.InvalidClassException with Spark 3.1.2

nitinlkoin1984 commented on issue #21092:
URL: https://github.com/apache/beam/issues/21092#issuecomment-1259984716

   > @mosche AFAIK, there are no specific Scala dependencies for the Spark portable runner and job server. They sit in the same package branch and have the same Spark/Scala dependencies as the native runner. I'm sure @ibzib knows much better than me.
   > 
   > Also, I'm wondering why the Scala version was specifically set to 2.12.15 [here](https://github.com/apache/beam/blob/19f5e62c61dd9b201bc471fe79d109fa8406fff9/runners/spark/spark_runner.gradle#L164), since the defaults for Spark 3 are:
   > 
   > ```
   >   spark_version = '3.1.2'
   >   spark_scala_version = '2.12'
   > ```
   > 
   > Indeed, the right solution would be to align the Spark/Scala versions. The question here is whether we need to support different versions or not.
   
   I tried setting Scala to 2.12.10 in spark_runner.gradle; it did not work. Do I have to set the Scala version somewhere else?




[GitHub] [beam] aromanenko-dev closed issue #21092: java.io.InvalidClassException with Spark 3.1.2

aromanenko-dev closed issue #21092: java.io.InvalidClassException with Spark 3.1.2
URL: https://github.com/apache/beam/issues/21092




[GitHub] [beam] mosche commented on issue #21092: java.io.InvalidClassException with Spark 3.1.2

mosche commented on issue #21092:
URL: https://github.com/apache/beam/issues/21092#issuecomment-1255281110

   @aromanenko-dev I'm not really familiar with the portable runner / the job server; would you have additional insights here?
   
   As far as I can see, both beam-spark and beam-spark-job-server are using Scala 2.12.15. 
   Spark 3.1.2 / 3.1.3 depend on Scala 2.12.10, which has the mentioned binary incompatibility, so I suspect the problem is in fact the cluster running a broken Scala version.
   
   An option might be to bump Beam to use Spark 3.2.2 or later (with Scala 2.12.15).
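
   To verify that suspicion, here is a small sketch (an assumed setup; submit it with the cluster's own spark-submit) that prints the scala-library version the driver and the executors actually run. If the versions already mismatch, the job may instead die with the same InvalidClassException, which confirms it just as well:

   ```scala
   import org.apache.spark.{SparkConf, SparkContext}

   object ClusterScalaVersions {
     def main(args: Array[String]): Unit = {
       val sc = new SparkContext(new SparkConf().setAppName("scala-version-check"))
       // Version on the driver's classpath.
       println(s"driver scala-library: ${scala.util.Properties.versionNumberString}")
       // Versions seen on the executors' classpaths.
       val executorVersions = sc.parallelize(1 to 4, 4)
         .map(_ => scala.util.Properties.versionNumberString)
         .distinct()
         .collect()
       println(s"executor scala-library: ${executorVersions.mkString(", ")}")
       sc.stop()
     }
   }
   ```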
   




[GitHub] [beam] aromanenko-dev commented on issue #21092: java.io.InvalidClassException with Spark 3.1.2

aromanenko-dev commented on issue #21092:
URL: https://github.com/apache/beam/issues/21092#issuecomment-1256255125

   @mosche AFAIK, there are no specific Scala dependencies for the Spark portable runner and job server. They sit in the same package branch and have the same Spark/Scala dependencies as the native runner. I'm sure @ibzib knows much better than me.
   
   Also, I'm wondering why the Scala version was specifically set to 2.12.15 [here](https://github.com/apache/beam/blob/19f5e62c61dd9b201bc471fe79d109fa8406fff9/runners/spark/spark_runner.gradle#L164), since the defaults for Spark 3 are:
   ```
     spark_version = '3.1.2'
     spark_scala_version = '2.12'
   ```
   
   Indeed, the right solution would be to align the Spark/Scala versions. The question here is whether we need to support different versions or not.
   
   




[GitHub] [beam] nitinlkoin1984 commented on issue #21092: java.io.InvalidClassException with Spark 3.1.2

nitinlkoin1984 commented on issue #21092:
URL: https://github.com/apache/beam/issues/21092#issuecomment-1259985784

   Also, what is the working and tested Spark job server version, and which Spark version is it compatible with? If I cannot use Beam with Spark 3.1.2, then I will have to downgrade the Spark version.




[GitHub] [beam] aymanfarhat commented on issue #21092: java.io.InvalidClassException with Spark 3.1.2

aymanfarhat commented on issue #21092:
URL: https://github.com/apache/beam/issues/21092#issuecomment-1236388087

   I can confirm that I'm facing the same issue using a job service based on **apache/beam_spark3_job_server:2.41.0**, trying to run a Beam pipeline on a [bitnami/spark:3.1](https://github.com/bitnami/containers/blob/main/bitnami/spark/3.1/debian-11/Dockerfile)-based Spark cluster.
   
   stderr log from the task:
   
   ```
   java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; local class incompatible: stream classdesc serialVersionUID = 3456489343829468865, local class serialVersionUID = 1028182004549731694
   	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
   	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005)
   	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
   	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
   	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
   	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
   	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
   	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
   	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
   	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
   	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
   	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
   	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:109)
   	at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$deserialize$2(NettyRpcEnv.scala:299)
   	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
   	at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:352)
   	at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$deserialize$1(NettyRpcEnv.scala:298)
   	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
   	at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:298)
   	at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$askAbortable$7(NettyRpcEnv.scala:246)
   	at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$askAbortable$7$adapted(NettyRpcEnv.scala:246)
   	at org.apache.spark.rpc.netty.RpcOutboxMessage.onSuccess(Outbox.scala:90)
   	at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:195)
   	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142)
   	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
   	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
   	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
   	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
   	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
   	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
   	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
   	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
   	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
   	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
   	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
   	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
   	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
   	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
   	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
   	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
   	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
   	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
   	at java.lang.Thread.run(Thread.java:750)
   ```
   
   Based on my understanding, this issue is usually linked to a mismatch between the Spark version deployed on the cluster and the Spark version of the client submitting the job to the cluster. In this case, I'm guessing the client is the job service.
   
   I can also confirm, after running a quick test, that downloading the same Spark distribution locally (matching the cluster version) and submitting a job from that local client to the cluster works fine, via something like:
   
   ```
   ./bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master spark://localhost:7077 \
        ./examples/jars/spark-examples_2.12-3.1.3.jar 10000
   ```
   
   I don't know what goes on inside the Beam Spark job server, but this leads me to think that this is probably an incompatibility between the Spark client version on the Beam job server and what the target cluster is running.
   
   Any idea if there is an easy workaround for this? Is there any way to know the exact version of Spark that `apache/beam_spark_job_server:2.31.0` would support? I could adapt the Spark cluster version to that if needed.
   
   Would greatly appreciate any feedback on this. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org