Posted to user@spark.apache.org by "alberto.scolari" <al...@polimi.it> on 2016/02/23 00:39:29 UTC

Variable performance in Spark threads

Hi everybody,
I am running a Spark job with multiple Map-Reduce iterations on a cluster
of multi-core machines. Within each machine I observe variable performance,
with some threads taking 20% more time than others on the same machine. I
checked that the input size is the same for all threads, and that the
computation does not depend on the input values. The same behaviour also
occurs when I run the job on a single machine. It looks to me like a JVM
issue, but I hope somebody has already experienced it and can give some help.
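To make the setup concrete, here is a minimal sketch of the shape of the
job (simplified: the partition count and the busy-work inside mapPartitions
are illustrative placeholders, not my actual computation):

import org.apache.spark.{SparkConf, SparkContext}

object VariabilitySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("variability-repro"))

    // Equal-sized partitions; the work below does not depend on the values.
    val data = sc.parallelize(1 to 1600000, numSlices = 16)

    for (iter <- 1 to 10) {            // several Map-Reduce iterations
      val result = data
        .mapPartitions { it =>
          var acc = 0.0                // CPU-bound busy work, same cost per element
          it.foreach(_ => acc += math.sqrt(12345.6789))
          Iterator(acc)
        }
        .reduce(_ + _)                 // the short reduce tasks that follow
      println(s"iteration $iter: $result")
    }

    sc.stop()
  }
}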
Below I post an example of one iteration, with the red bars showing the
duration of each task. The first group of long red bars are the
mapPartitions tasks, each running in a separate thread, while the following
short bars are the reduce tasks. In the first group (long bars), the
variability in execution time is clearly visible.
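In case it is useful, this is roughly how the per-task durations behind
those bars can be logged numerically (a sketch using the public
SparkListener API; the listener class name is mine):

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Print each finished task's duration and executor, to quantify the spread
// instead of eyeballing the timeline.
class TaskDurationListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val info = taskEnd.taskInfo
    println(s"stage=${taskEnd.stageId} task=${info.taskId} " +
      s"executor=${info.executorId} duration=${info.duration} ms")
  }
}

// Registered before running the job:
// sc.addSparkListener(new TaskDurationListener())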
Has anybody seen this before? What might the cause be?
Thanks everybody,

Alberto

(Screenshot of the task timeline: <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26298/anomaly.png>)


