Posted to user@spark.apache.org by Johan Verwey <jo...@gmail.com> on 2014/02/17 20:50:06 UTC

Fwd: Why is Spark not using all cores on a single machine?

When I run some of the Apache Spark examples in the Spark-Shell or as a
job, I am not able to achieve full core utilization on a single machine.
For example:

var textColumn = sc.textFile("/home/someuser/verylargefile.txt").cache()

var distinctWordCount = textColumn.flatMap(line => line.split('\0'))
                             .map(word => (word, 1))
                             .reduceByKey(_+_)
                             .count()

When running this script, I mostly see only 1 or 2 active cores on my
8-core machine. Isn't Spark supposed to parallelize this?

This job takes about 15 seconds, but most of my cores are idle...

How can I configure Spark to utilize all cores?

RE: Fwd: Why is Spark not using all cores on a single machine?

Posted by "Ganelin, Ilya" <Il...@capitalone.com>.
To set the number of cores Spark uses, you must set two parameters when invoking spark-submit: num-executors (the number of executors to launch) and executor-cores (the number of cores per executor). Please see the Spark configuration and tuning pages for more details.
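
A minimal sketch of the same two settings expressed as Spark properties in Scala, assuming a cluster manager that honors them (for example YARN). The values 4 and 2 and the application name are hypothetical placeholders, not recommendations from this thread. In local mode these properties are not what determines the thread count; the number in the local[N] master URL does.

```scala
import org.apache.spark.SparkConf

// Property counterparts of the spark-submit flags mentioned above.
// The numbers are illustrative assumptions only.
val conf = new SparkConf()
  .setAppName("executor-sizing-sketch")      // hypothetical application name
  .set("spark.executor.instances", "4")      // counterpart of --num-executors
  .set("spark.executor.cores", "2")          // counterpart of --executor-cores
```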


-----Original Message-----
From: ll [duy.huynh.uiv@gmail.com]
Sent: Saturday, November 08, 2014 12:05 AM Eastern Standard Time
To: user@spark.incubator.apache.org
Subject: Re: Fwd: Why is Spark not using all cores on a single machine?


hi.  i did use local[8] as below, but it still ran on only 1 core.

val sc = new SparkContext(new SparkConf().setMaster("local[8]").setAppName("abc"))

any advice is much appreciated.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-Why-is-Spark-not-using-all-cores-on-a-single-machine-tp1638p18397.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Fwd: Why is Spark not using all cores on a single machine?

Posted by ll <du...@gmail.com>.
hi.  i did use local[8] as below, but it still ran on only 1 core.

val sc = new SparkContext(new SparkConf().setMaster("local[8]").setAppName("abc"))

any advice is much appreciated.
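
One thing that may be worth checking in a case like this, although nothing in this thread confirms it as the cause: a stage runs one task per partition, so an RDD with only one or two partitions cannot keep eight threads busy no matter what the master URL says. The sketch below prints the partition count and shows two ways to ask for more. The object name and application name are hypothetical, and the input path is simply the one from the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[8]").setAppName("partition-check-sketch")
    val sc = new SparkContext(conf)
    try {
      // Path taken from the original question; replace with a real file.
      val path = "/home/someuser/verylargefile.txt"

      // Second argument: the minimum number of partitions to request when reading.
      val lines = sc.textFile(path, 8)
      println(s"partitions after read: ${lines.partitions.length}")

      // Alternatively, redistribute an existing RDD (this triggers a shuffle).
      val spread = lines.repartition(8)
      println(s"partitions after repartition: ${spread.partitions.length}")
    } finally {
      sc.stop()
    }
  }
}
```

If the printed count is well below the number of threads requested, low parallelism is expected regardless of the local[8] setting.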





Re: Fwd: Why is Spark not using all cores on a single machine?

Posted by Johan Verwey <jo...@gmail.com>.
Thank you!!!


On 17 February 2014 12:09, Ewen Cheslack-Postava <me...@ewencp.org> wrote:

> You need to tell it to use more cores by specifying MASTER=local[N], where
> N is the number of cores you want to use. See the "Initializing Spark" and
> "Master URLs" sections of the Scala programming guide:
> http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html
>
> Ewen
>
>  Johan Verwey <jo...@gmail.com>
>  February 17, 2014 at 11:50 AM
>
>

Re: Fwd: Why is Spark not using all cores on a single machine?

Posted by Ewen Cheslack-Postava <me...@ewencp.org>.
You need to tell it to use more cores by specifying MASTER=local[N],
where N is the number of cores you want to use. See the "Initializing
Spark" and "Master URLs" sections of the Scala programming guide:
http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html

Ewen
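
For a standalone application rather than the shell, the same local[N] master URL can be passed through SparkConf. This is a minimal sketch: the object name, the helper localMaster, and the application name are made up for illustration, and N is taken from the JVM's reported processor count as an assumption about what "all cores" should mean.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalCores {
  // Build a master URL of the form local[N] for a given thread count.
  def localMaster(n: Int): String = s"local[$n]"

  def main(args: Array[String]): Unit = {
    // Number of processors the JVM reports on this machine.
    val cores = Runtime.getRuntime.availableProcessors()
    val conf = new SparkConf()
      .setMaster(localMaster(cores))
      .setAppName("local-cores-sketch") // hypothetical application name
    val sc = new SparkContext(conf)
    try {
      // Report what Spark will actually use for this context.
      println(s"master = ${sc.master}, defaultParallelism = ${sc.defaultParallelism}")
    } finally {
      sc.stop()
    }
  }
}
```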