Posted to user@spark.apache.org by t3l <t3...@threelights.de> on 2015/10/20 17:13:24 UTC
Partition for each executor
If I have a cluster with 7 nodes, each with an equal number of cores, and I
create an RDD with sc.parallelize(), it looks as if Spark will always try
to distribute the partitions across the nodes.
Question:
(1) Is that something I can rely on?
(2) Can I rely on sc.parallelize() assigning partitions to as many
executors as possible? Meaning: if I request 7 partitions, is each node
guaranteed to get 1 of them? If I request 14 partitions, is each node
guaranteed to get 2?
P.S.: I am aware that for other cases, such as sc.hadoopFile, this might
depend on the actual storage location of the data. I am asking only about
the sc.parallelize() case.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Partition-for-each-executor-tp25141.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Partition for each executor
Posted by Adrian Tanase <at...@adobe.com>.
I think it should use the default parallelism, which by default equals the total number of cores in your cluster.
If you want to control it, specify a value for numSlices, the second parameter to parallelize().
-adrian
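[Editor's note: the point about numSlices can be illustrated with a short sketch. The helper below is a hypothetical pure-Python mirror of the contiguous-range slicing Spark applies to a parallelized collection (the integer arithmetic matches ParallelCollectionRDD.slice); it only shows how elements are split into partitions, not which executor each partition is scheduled on.]

```python
def slice_collection(seq, num_slices):
    """Split seq into num_slices contiguous chunks whose sizes differ by
    at most one element (hypothetical mirror of Spark's slicing logic)."""
    n = len(seq)
    return [seq[(i * n) // num_slices:((i + 1) * n) // num_slices]
            for i in range(num_slices)]

# Requesting 14 slices of a 14-element collection yields one element
# per partition; whether those partitions land 2-per-node is a separate
# scheduling decision made when tasks are launched.
parts = slice_collection(list(range(14)), 14)
```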
On 10/20/15, 6:13 PM, "t3l" <t3...@threelights.de> wrote:
>If I have a cluster with 7 nodes, each with an equal number of cores, and I
>create an RDD with sc.parallelize(), it looks as if Spark will always try
>to distribute the partitions across the nodes.
>
>Question:
>(1) Is that something I can rely on?
>
>(2) Can I rely on sc.parallelize() assigning partitions to as many
>executors as possible? Meaning: if I request 7 partitions, is each node
>guaranteed to get 1 of them? If I request 14 partitions, is each node
>guaranteed to get 2?
>
>P.S.: I am aware that for other cases, such as sc.hadoopFile, this might
>depend on the actual storage location of the data. I am asking only about
>the sc.parallelize() case.