Posted to user@spark.apache.org by anshu shukla <an...@gmail.com> on 2015/07/11 12:02:41 UTC

RDD partitioning

Suppose I have an RDD with 10 tuples and a cluster with 100 cores (standalone
mode). By default, how will the partitioning be done?
I do not understand how the 10-tuple set (RDD) will be divided across 100
cores (by default).
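
For illustration, here is a minimal sketch of that scenario in the Scala Spark
shell, assuming a standalone cluster with spark.default.parallelism left unset;
the exact default partition count is cluster-dependent, as quoted from the
documentation below:

    // 10 elements, no explicit partition count, so the default parallelism applies
    val rdd = sc.parallelize(1 to 10)
    println(rdd.partitions.length)   // typically the total executor cores, e.g. 100

    // glom() turns each partition into an array, so we can see which ones hold data
    val nonEmpty = rdd.glom().collect().count(_.nonEmpty)
    println(nonEmpty)                // at most 10; the remaining partitions are simply empty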

Mentioned in the documentation:
*spark.default.parallelism*

For distributed shuffle operations like reduceByKey and join, the largest
number of partitions in a parent RDD. For operations like parallelize with
no parent RDDs, it depends on the cluster manager -

   - *Others: total number of cores on all executor nodes or 2, whichever
   is larger*
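
For what it's worth, a hedged sketch (standalone Scala app) of how that default
can be overridden, either globally via spark.default.parallelism or per call
with an explicit numSlices; the app name and the value 8 are just placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("partitioning-demo")
      .set("spark.default.parallelism", "8")   // overrides the cores-based default
    val sc = new SparkContext(conf)

    val byDefault = sc.parallelize(1 to 10)    // uses spark.default.parallelism => 8 partitions
    val explicit  = sc.parallelize(1 to 10, 2) // explicit numSlices wins over the default
    println(byDefault.partitions.length)       // 8
    println(explicit.partitions.length)        // 2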



-- 
Thanks & Regards,
Anshu Shukla