Posted to user@spark.apache.org by anshu shukla <an...@gmail.com> on 2015/07/11 12:02:41 UTC
Rdd partitioning
Suppose I have an RDD with 10 tuples and a cluster with 100 cores (standalone
mode). How will the partitioning be done by default?
I do not understand how a 10-tuple RDD will be divided across 100 cores (by
default).
Mentioned in the documentation:
*spark.default.parallelism*
For distributed shuffle operations like reduceByKey and join, the largest
number of partitions in a parent RDD. For operations like parallelize with
no parent RDDs, it depends on the cluster manager:
- *Others: total number of cores on all executor nodes or 2, whichever
is larger*
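To illustrate my reading of the quoted rule: sc.parallelize(data) with no explicit numSlices would use spark.default.parallelism, so a 10-element RDD on 100 cores would become 100 partitions, most of them empty. A rough sketch of that arithmetic in plain Python (default_parallelism and slice_collection are illustrative names, not Spark APIs):

```python
def default_parallelism(total_cores):
    # Standalone-mode default: total cores on all executors, or 2,
    # whichever is larger (per the documentation quoted above)
    return max(total_cores, 2)

def slice_collection(data, num_slices):
    # Splits a local collection into contiguous index ranges, roughly
    # mirroring how parallelize slices data across partitions
    n = len(data)
    return [data[(i * n) // num_slices:((i + 1) * n) // num_slices]
            for i in range(num_slices)]

partitions = slice_collection(list(range(10)), default_parallelism(100))
print(len(partitions))                  # 100 partitions
print(sum(1 for p in partitions if p))  # 10 hold one tuple each, 90 are empty
```

So, if this sketch is right, the 10 tuples do not get "divided" among 100 cores at all; 90 of the 100 default partitions would simply be empty, which is why I am asking whether this is really the intended default behaviour.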
--
Thanks & Regards,
Anshu Shukla