Posted to issues@beam.apache.org by "Yu Watanabe (Jira)" <ji...@apache.org> on 2021/11/30 12:53:00 UTC

[jira] [Commented] (BEAM-12762) java.io.InvalidClassException with Spark 3.1.2

    [ https://issues.apache.org/jira/browse/BEAM-12762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451106#comment-17451106 ] 

Yu Watanabe commented on BEAM-12762:
------------------------------------

I have confirmed that this still happens with the versions below.

[spark-3.2.0-bin-hadoop3.2.tgz|https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz]

[job-server-2.33.0|https://mvnrepository.com/artifact/org.apache.beam/beam-runners-spark-job-server/2.33.0]

[python sdk 2.33.0|https://hub.docker.com/layers/apache/beam_python3.8_sdk/2.33.0/images/sha256-75d1434840e60d8ca69259e203dc5403e16606b8a2a66ab6edd986091c7fccf5?context=explore]

[logs in gist|https://gist.github.com/yuwtennis/e103795375b7183c426427174da66c69]

 

Reproduced using Docker:

 
{code:bash}
# 1. Start the Spark job server, pointing at the local Spark master
sudo docker run --net=host apache/beam_spark3_job_server:2.33.0 --spark-master-url=spark://localhost:7077 --clean-artifacts-per-job true
# 2. Start the Python SDK harness in worker-pool mode
sudo docker run --net=host apache/beam_python3.8_sdk:2.33.0 --worker_pool
# 3. Submit the pipeline (17db23d8c652 is a locally built image containing the client code)
sudo docker run --net host --rm 17db23d8c652 python __main__.py{code}
 

 

The error appears in the job server's stdout right after this line:
{code:java}
21/11/30 11:36:26 INFO org.apache.beam.runners.spark.SparkPipelineRunner: Running job BeamApp-root-1130113625-95d5091f_f3610c2b-56a0-4bcf-b1be-9643512c619c on Spark master spark://localhost:7077 {code}
 

That log message corresponds to this line in SparkPipelineRunner.java:

https://github.com/apache/beam/blob/release-2.33.0/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java#L133
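
For context: java.io.InvalidClassException with a "local class incompatible" message is Java serialization's standard failure when the serialVersionUID recorded in the incoming byte stream differs from the one in the locally loaded class. Here the class is scala.collection.mutable.WrappedArray$ofRef, which typically means the driver and the job server are loading different scala-library versions. A self-contained sketch of the mechanism (the Payload class and the byte tampering are purely illustrative, not Beam or Spark code):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class UidMismatchDemo {
    // A simple serializable class with an explicit serialVersionUID.
    static class Payload implements Serializable {
        private static final long serialVersionUID = 42L;
        int value = 7;
    }

    public static void main(String[] args) throws Exception {
        // Serialize an instance.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new Payload());
        }
        byte[] bytes = bos.toByteArray();

        // In the stream, the serialVersionUID is the 8 bytes immediately
        // after the UTF-encoded class name in the class descriptor.
        // Flip one bit of it to simulate a class compiled by a different
        // library version on the sending side.
        byte[] nameBytes = Payload.class.getName().getBytes(StandardCharsets.UTF_8);
        int idx = indexOf(bytes, nameBytes);
        bytes[idx + nameBytes.length + 7] ^= 0x01;

        // Deserializing now fails with the same error shape as above.
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            ois.readObject();
        } catch (InvalidClassException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }

    // Find the first occurrence of needle in haystack.
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }
}
```

The exception is raised from ObjectStreamClass.initNonProxy, the same frame that appears in the stack trace quoted below.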

> java.io.InvalidClassException with Spark 3.1.2
> ----------------------------------------------
>
>                 Key: BEAM-12762
>                 URL: https://issues.apache.org/jira/browse/BEAM-12762
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Kyle Weaver
>            Priority: P3
>
> This was reported on the mailing list.
>  
> ----
>  
> Using Spark downloaded from the link below,
>  
> [https://www.apache.org/dyn/closer.lua/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz]
>  
> I get the error below when submitting a pipeline.
> The full error is at [https://gist.github.com/yuwtennis/7b0c1dc0dcf98297af1e3179852ca693].
>  
> ------------------------------------------------------------------------------------------------------------------
> 21/08/16 01:10:26 WARN TransportChannelHandler: Exception in connection from /192.168.11.2:35601
> java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; local class incompatible: stream classdesc serialVersionUID = 3456489343829468865, local class serialVersionUID = 1028182004549731694
> at java.base/java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:689)
> ...
> ------------------------------------------------------------------------------------------------------------------
>  
> The SDK harness and job service are deployed as follows.
>  
> 1. Job service
>  
> sudo docker run --net=host apache/beam_spark3_job_server:2.31.0 --spark-master-url=spark://localhost:7077 --clean-artifacts-per-job true
>  
> 2. SDK Harness
>  
> sudo docker run --net=host apache/beam_python3.8_sdk:2.31.0 --worker_pool
>  
> * For Spark 2.4.8, apache/beam_spark_job_server:2.31.0 is used instead.
>  
> 3. SDK client code
>  
> [https://gist.github.com/yuwtennis/2e4c13c79f71e8f713e947955115b3e2]
>  With the Spark 2.4.8 distribution linked below, the same components run without any errors.
>  
> [https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)