Posted to user@spark.apache.org by mod0 <mb...@uwaterloo.ca> on 2014/11/13 01:33:21 UTC

spark.parallelize seems broken on type

Interesting result here. I'm trying to parallelize a list for some simple
tests with Spark and Ganglia. It seems that spark.parallelize doesn't create
partitions anywhere except on the master node of our cluster. The image below
shows the CPU utilization per node over three tests. The first two compute
Pi as in the Spark example with:

spark.parallelize(1 to N, num_nodes*num_cores*beta) // for beta = 3
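
For reference, those first two runs execute essentially the Monte Carlo
estimator from the bundled SparkPi example (a sketch, with num_nodes,
num_cores, and beta as above, and spark being the SparkContext):

    val count = spark.parallelize(1 to N, num_nodes*num_cores*beta).map { _ =>
      val x = Math.random() * 2 - 1
      val y = Math.random() * 2 - 1
      if (x*x + y*y <= 1) 1 else 0   // did the sample land inside the unit circle?
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / N)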

The third test uses a list with N elements, again with the same number of
partitions, but as you can see in the image, the only CPU utilization is the
blue curve for the master node. The list is created thusly:

    def rand_list(numel: Int, max: Int): List[Double] =
      List.fill(numel)(Math.random())  // numel uniform doubles in [0, 1); note max is never used
spark.parallelize(rand_list(N,N), num_nodes*num_cores*beta) // for beta = 3
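
In case it helps, here is one way to verify where the elements actually
landed (a sketch; glom collects each partition into an array, so the
per-partition element counts can be inspected on the driver):

    val rdd = spark.parallelize(rand_list(N, N), num_nodes*num_cores*beta)
    println(rdd.partitions.length)  // should equal the requested partition count
    println(rdd.glom().map(_.length).collect().mkString(", "))  // elements per partition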

I don't know why this would be, unless there is some note that List types
should not be used with parallelize?

A penny for your thoughts.

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n18782/cpu.png> 



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-parallelize-seems-broken-on-type-List-tp18782.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org