Posted to user@beam.apache.org by Yu Watanabe <yu...@gmail.com> on 2021/08/15 16:36:11 UTC

java.io.InvalidClassException with Spark 3.1.2

Hello.

I would like to ask a question about the Spark runner.

Using Spark downloaded from the link below,

https://www.apache.org/dyn/closer.lua/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz

I get the error below when submitting a pipeline.
The full error is at
https://gist.github.com/yuwtennis/7b0c1dc0dcf98297af1e3179852ca693.

------------------------------------------------------------------------------------------------------------------
21/08/16 01:10:26 WARN TransportChannelHandler: Exception in connection
from /192.168.11.2:35601
java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef;
local class incompatible: stream classdesc serialVersionUID =
3456489343829468865, local class serialVersionUID = 1028182004549731694
at
java.base/java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:689)
...
------------------------------------------------------------------------------------------------------------------

The SDK Harness and Job service are deployed as follows.

1. Job service

sudo docker run --net=host apache/beam_spark3_job_server:2.31.0 \
  --spark-master-url=spark://localhost:7077 --clean-artifacts-per-job true

* apache/beam_spark_job_server:2.31.0 for Spark 2.4.8

2. SDK Harness

sudo docker run --net=host apache/beam_python3.8_sdk:2.31.0 --worker_pool

3. SDK client code

https://gist.github.com/yuwtennis/2e4c13c79f71e8f713e947955115b3e2
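
The submission itself looks roughly like the following (the endpoints are
assumptions based on the job server defaults; the exact options are in the
gist above):

```shell
# Sketch of submitting the Python pipeline to the portable Spark runner.
# 8099 is the default job server port; 50000 is the default external
# worker pool port of the SDK harness container.
python pipeline.py \
  --runner=PortableRunner \
  --job_endpoint=localhost:8099 \
  --environment_type=EXTERNAL \
  --environment_config=localhost:50000
```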

Spark 2.4.8 succeeded without any errors using the components above.

https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz

Is there any setting I need to be aware of for Spark 3.1.2?

Thanks,
Yu Watanabe

-- 
Yu Watanabe

linkedin: www.linkedin.com/in/yuwatanabe1/
twitter:   twitter.com/yuwtennis

Re: java.io.InvalidClassException with Spark 3.1.2

Posted by Yu Watanabe <yu...@gmail.com>.
Sure. I starred your repository.

On Sat, Aug 21, 2021 at 11:27 AM cw <se...@yahoo.com> wrote:

> Hello Yu,
>    I've done a lot of testing; it only works with Spark 2, not 3. If you
> need a working example on Kubernetes, see
> https://github.com/cometta/python-apache-beam-spark . Feel free to
> improve the code if you would like to contribute, and please star it if
> it is useful for you. Thank you.
>
> On Monday, August 16, 2021, 12:37:46 AM GMT+8, Yu Watanabe <
> yu.w.tennis@gmail.com> wrote:
>
> [quoted original message trimmed]



Re: java.io.InvalidClassException with Spark 3.1.2

Posted by cw <se...@yahoo.com>.
Hello Yu,
I've done a lot of testing; it only works with Spark 2, not 3. If you need a working example on Kubernetes, see https://github.com/cometta/python-apache-beam-spark . Feel free to improve the code if you would like to contribute, and please star it if it is useful for you. Thank you.

On Monday, August 16, 2021, 12:37:46 AM GMT+8, Yu Watanabe <yu...@gmail.com> wrote:

> [quoted original message trimmed]

Re: java.io.InvalidClassException with Spark 3.1.2

Posted by Yu Watanabe <yu...@gmail.com>.
Kyle,

Thank you.

On Tue, Aug 17, 2021 at 5:55 AM Kyle Weaver <kc...@google.com> wrote:

> I was able to reproduce the error. I'm not sure why this would happen,
> since as far as I can tell the Beam 2.31.0 Spark runner should be using
> Spark 3.1.2 and Scala 2.12 [1]. I filed a JIRA issue for it. [2]
>
> [1]
> https://github.com/apache/beam/pull/14897/commits/b6fca2bb79d9e7a69044b477460445456720ec58
> [2] https://issues.apache.org/jira/browse/BEAM-12762
>
>
> On Sun, Aug 15, 2021 at 9:37 AM Yu Watanabe <yu...@gmail.com> wrote:
>
>> [quoted original message trimmed]


Re: java.io.InvalidClassException with Spark 3.1.2

Posted by Kyle Weaver <kc...@google.com>.
I was able to reproduce the error. I'm not sure why this would happen,
since as far as I can tell the Beam 2.31.0 Spark runner should be using
Spark 3.1.2 and Scala 2.12 [1]. I filed a JIRA issue for it. [2]

[1]
https://github.com/apache/beam/pull/14897/commits/b6fca2bb79d9e7a69044b477460445456720ec58
[2] https://issues.apache.org/jira/browse/BEAM-12762
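
One quick way to check for a Scala version mismatch is to compare the
scala-library jar shipped on each side (the search inside the container is a
sketch; the jar layout of the image is an assumption):

```shell
# Scala version bundled with the Spark 3.1.2 distribution
# (the version is part of the jar file name, e.g. scala-library-2.12.10.jar)
ls spark-3.1.2-bin-hadoop3.2/jars/ | grep scala-library

# Scala libraries baked into the Beam job server image
# (searching from / since the exact install path is a guess)
sudo docker run --rm --entrypoint find apache/beam_spark3_job_server:2.31.0 \
  / -name 'scala-library*' 2>/dev/null
```

If the two versions differ (e.g. 2.11 vs 2.12), Java serialization of Scala
collections will fail with exactly this kind of serialVersionUID conflict.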


On Sun, Aug 15, 2021 at 9:37 AM Yu Watanabe <yu...@gmail.com> wrote:

> [quoted original message trimmed]