Posted to user@spark.apache.org by gisleyt <gi...@cxense.com> on 2015/07/15 17:48:04 UTC

Tasks unevenly distributed in Spark 1.4.0

Hello all,

I upgraded from Spark 1.3.1 to 1.4.0, but I'm experiencing a massive drop in
performance for the application I'm running. I've (somewhat) reproduced this
behaviour in the attached file.

My current Spark setup may not be ideal for exactly this reproduction, but in
this test Spark 1.4.0 takes 12 minutes to complete, while 1.3.1 finishes in
8 minutes. I've found that when you combine subtraction and sampling of
JavaRDDs (see the attached reproduction test), tasks no longer seem to be
distributed evenly among the workers once you perform additional operations
on the data. I derive this from the admin UI, where I can clearly see that in
1.4.0 the tasks are distributed differently: one task processes almost all of
the data, while the other tasks are tiny.

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n23858/1.jpg> 
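For reference, skew of this shape can arise whenever the keys that survive a
subtract all hash to the same partition. The sketch below is plain Java (no
Spark dependency) and only mimics what I understand Spark's HashPartitioner
to do, i.e. a non-negative mod of the key's hashCode; the class and key
pattern are made up for illustration:

```java
import java.util.Arrays;

public class PartitionSkewSketch {
    // Mimics Spark's HashPartitioner: non-negative mod of the key's hashCode.
    public static int partition(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        int[] counts = new int[numPartitions];
        // Integer keys sharing a residue mod 4 all land in the same partition,
        // so one "task" ends up with all of the data.
        for (int i = 0; i < 1000; i++) {
            counts[partition(i * numPartitions, numPartitions)]++;
        }
        System.out.println(Arrays.toString(counts)); // [1000, 0, 0, 0]
    }
}
```

If that is what's happening here, a repartition() after the subtract should
spread the records back out, at the cost of an extra shuffle.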

Do any of you know of changes in 1.4.0 that could explain this behaviour?
When I submit the same application to Spark 1.3.1, the tasks are distributed
uniformly and the application is therefore much quicker.

Thanks,
Gisle

ReproduceHang.java
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n23858/ReproduceHang.java>  



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Tasks-unevenly-distributed-in-Spark-1-4-0-tp23858.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org