Posted to dev@spark.apache.org by Peter Rudenko <pe...@gmail.com> on 2020/02/26 19:49:53 UTC

Spark-3.0 - performance degradation

I am seeing a performance degradation for RDD shuffle jobs in Spark-3.0.
Environment:
Spark-3.0: built from commit ba4212660305c6555ae16b10c6bbaf6114c4d830
Spark-2.4.2: release (used only to get scala-2.12; results are the same for
spark-2.4.5)
Spark-terasort:
https://github.com/ehiggs/spark-terasort/tree/a240386988a71eeaff1fe25cfd73e527c69fb7b2
Dataset of size 1800Gb, 20 executors, 25 cores per executor:

3.0 results:
[image: image.png]
2.4.2:
[image: image.png]
Event timeline for 3.0 looks very weird:

[image: image.png]
Compared to 2.4:
[image: image.png]
Everything runs with default settings. I ran several different workloads of
different sizes and with different numbers of executors, but the result is
the same. It looks like a scheduling issue in 3.0.

Is anyone else facing the same issue?

Thanks,
Peter Rudenko
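
The difference between the two event timelines described above could be put into numbers from the Spark event logs. The following is only a minimal sketch, assuming event logging was enabled (spark.eventLog.enabled) and that the log is an uncompressed file with one JSON event per line. It reads the SparkListenerTaskEnd events and reports, per stage, how many tasks were running on average. The function name stage_concurrency is made up for this illustration.

```python
import json
from collections import defaultdict


def stage_concurrency(lines):
    """Summarise task overlap per stage from Spark event-log JSON lines.

    Assumes each non-empty line is one JSON event and that task-end events
    carry "Stage ID" and "Task Info" with "Launch Time" / "Finish Time"
    in milliseconds.
    """
    stats = defaultdict(
        lambda: {"tasks": 0, "first": None, "last": None, "busy_ms": 0}
    )
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue  # only finished tasks carry both timestamps
        info = event["Task Info"]
        start, end = info["Launch Time"], info["Finish Time"]
        s = stats[event["Stage ID"]]
        s["tasks"] += 1
        s["busy_ms"] += end - start
        s["first"] = start if s["first"] is None else min(s["first"], start)
        s["last"] = end if s["last"] is None else max(s["last"], end)

    result = {}
    for stage_id, s in stats.items():
        wall_ms = s["last"] - s["first"]
        result[stage_id] = {
            "tasks": s["tasks"],
            "wall_ms": wall_ms,
            # Total task time divided by the stage's wall-clock span.
            "avg_running_tasks": (
                s["busy_ms"] / wall_ms if wall_ms > 0 else float(s["tasks"])
            ),
        }
    return result
```

With 20 executors of 25 cores each there are 500 task slots, so a stage whose avg_running_tasks stays far below 500 in the 3.0 log but not in the 2.4 log would be consistent with the suspected scheduling gaps. It would not prove them, since skewed or short tasks lower the figure as well.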

Re: Spark-3.0 - performance degradation

Posted by 大啊 <be...@163.com>.
Can you provide configuration information?
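
One way to answer a request like this is to dump the effective properties of both runs (for example from the Environment tab of the web UI, or from SparkConf.toDebugString) and compare them. The sketch below assumes the properties are saved in spark-defaults.conf style, one whitespace-separated key and value per line with "#" comments. The helper names parse_spark_conf and diff_conf are hypothetical, and the values in the usage example are made up, not the poster's settings.

```python
def parse_spark_conf(text):
    """Parse spark-defaults.conf-style text into a dict.

    Assumes "key<whitespace>value" lines; blank lines and lines
    starting with "#" are skipped.
    """
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def diff_conf(old, new):
    """Return {key: (old_value, new_value)} for keys whose values differ.

    A value of None means the key is not set on that side.
    """
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__":
    # Illustrative values only.
    conf_24 = parse_spark_conf("spark.executor.cores 25\nspark.locality.wait 3s\n")
    conf_30 = parse_spark_conf("spark.executor.cores 25\nspark.locality.wait 0s\n")
    for key, (old, new) in diff_conf(conf_24, conf_30).items():
        print(f"{key}: 2.4={old} 3.0={new}")
```

Only explicitly set properties appear in such a dump, so a changed default between versions would not show up in the diff.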



Re: Spark-3.0 - performance degradation

Posted by beliefer <be...@163.com>.
Can you provide configuration information?



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Spark-3.0 - performance degradation

Posted by beliefer <be...@163.com>.
I tested it and could not reproduce the issue.
I built Spark-3.1.0 and Spark-2.3.1.
After many test runs, I found little difference between them;
each one wins some runs and loses others.
Judging from the event timeline, Spark-3.1.0 looks more accurate.





Re: Spark-3.0 - performance degradation

Posted by beliefer <be...@163.com>.
Can you show the running configuration information?




