Posted to user@spark.apache.org by ReticulatedPython <pe...@gmail.com> on 2014/11/09 14:18:31 UTC

Why does this simple Spark program use only one core?

So, I'm running this simple program on a 16-core system. I run it by issuing
the following:

spark-submit --master local[*] pi.py

And the code of the program is below. When I use top to check CPU
consumption, only one core is being utilized. Why is that? Secondly, the
Spark documentation says that the default parallelism is stored in the
property spark.default.parallelism. How can I read this property from
within my Python program?

#"""pi.py"""
from pyspark import SparkContext
import random

NUM_SAMPLES = 12500000

def sample(p):
    x, y = random.random(), random.random()
    return 1 if x*x + y*y < 1 else 0
	
sc = SparkContext("local", "Test App")
count = sc.parallelize(xrange(0, NUM_SAMPLES)).map(sample).reduce(lambda a, b: a + b)
print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Why-does-this-siimple-spark-program-uses-only-one-core-tp18434.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Why does this simple Spark program use only one core?

Posted by Matei Zaharia <ma...@gmail.com>.
Call getNumPartitions() on your RDD to make sure it has the right number of partitions. You can also specify it when calling parallelize, e.g.

rdd = sc.parallelize(xrange(1000), 10)

This should run in parallel if you have multiple partitions and cores, but it might be that during part of the process only one node (e.g. the master process) is doing anything.
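
For illustration, here is a minimal sketch of both suggestions; the file
name, app name and sample sizes below are made up for the example rather
than taken from this thread:

# partitions_check.py -- illustrative sketch only
from pyspark import SparkContext

sc = SparkContext("local[*]", "Partition Check")  # local[*] asks for all local cores

# Let Spark pick the partition count (driven by spark.default.parallelism) ...
rdd_default = sc.parallelize(xrange(1000))
print "default partitions: %d" % rdd_default.getNumPartitions()

# ... or request an explicit number of partitions via the second argument.
rdd_explicit = sc.parallelize(xrange(1000), 10)
print "explicit partitions: %d" % rdd_explicit.getNumPartitions()

sc.stop()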

Matei


> On Nov 9, 2014, at 9:27 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> 
> You can set the following entry inside the conf/spark-defaults.conf file 
> 
> spark.cores.max 16
> 
> If you want to read the default value, you can use the following API call:
> 
> sc.defaultParallelism
> 
> where sc is your SparkContext object.
> 
> Thanks
> Best Regards
> 
> On Sun, Nov 9, 2014 at 6:48 PM, ReticulatedPython <person.of.book@gmail.com> wrote:
> So, I'm running this simple program on a 16-core system. I run it by issuing
> the following:
> 
> spark-submit --master local[*] pi.py
> 
> And the code of the program is below. When I use top to check CPU
> consumption, only one core is being utilized. Why is that? Secondly, the
> Spark documentation says that the default parallelism is stored in the
> property spark.default.parallelism. How can I read this property from
> within my Python program?
> 
> #"""pi.py"""
> from pyspark import SparkContext
> import random
> 
> NUM_SAMPLES = 12500000
> 
> def sample(p):
>     x, y = random.random(), random.random()
>     return 1 if x*x + y*y < 1 else 0
> 
> sc = SparkContext("local", "Test App")
> count = sc.parallelize(xrange(0, NUM_SAMPLES)).map(sample).reduce(lambda a, b: a + b)
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> 
> 
> 
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Why-does-this-siimple-spark-program-uses-only-one-core-tp18434.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
> 
> 


Re: Why does this simple Spark program use only one core?

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You can set the following entry inside the conf/spark-defaults.conf file

spark.cores.max 16


If you want to read the default value, you can use the following API call:

sc.defaultParallelism

where sc is your SparkContext object.
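
For illustration, a small sketch that reads both values from inside a
Python program; the app name is made up, and the getConf() call assumes a
PySpark version that exposes SparkContext.getConf():

from pyspark import SparkContext

sc = SparkContext("local[*]", "Parallelism Check")

# Effective default number of partitions for operations like parallelize().
print "defaultParallelism: %d" % sc.defaultParallelism

# If spark.default.parallelism was set (e.g. in conf/spark-defaults.conf),
# it can be read back from the SparkConf attached to the context.
print "spark.default.parallelism: %s" % sc.getConf().get(
    "spark.default.parallelism", "not set")

sc.stop()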


Thanks
Best Regards

On Sun, Nov 9, 2014 at 6:48 PM, ReticulatedPython <pe...@gmail.com>
wrote:

> So, I'm running this simple program on a 16-core system. I run it by issuing
> the following:
>
> spark-submit --master local[*] pi.py
>
> And the code of the program is below. When I use top to check CPU
> consumption, only one core is being utilized. Why is that? Secondly, the
> Spark documentation says that the default parallelism is stored in the
> property spark.default.parallelism. How can I read this property from
> within my Python program?
>
> #"""pi.py"""
> from pyspark import SparkContext
> import random
>
> NUM_SAMPLES = 12500000
>
> def sample(p):
>     x, y = random.random(), random.random()
>     return 1 if x*x + y*y < 1 else 0
>
> sc = SparkContext("local", "Test App")
> count = sc.parallelize(xrange(0, NUM_SAMPLES)).map(sample).reduce(lambda a, b: a + b)
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Why-does-this-siimple-spark-program-uses-only-one-core-tp18434.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>