Posted to user@spark.apache.org by David Thomas <dt...@gmail.com> on 2014/03/13 19:50:25 UTC

Round Robin Partitioner

Is it possible to partition the RDD elements in a round robin fashion? Say I
have 5 nodes in the cluster and 5 elements in the RDD. I need to ensure
each element gets mapped to a distinct node in the cluster.
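One thing to note: Spark lets you control how elements map to *partitions*, not nodes (the scheduler decides where each partition's task runs). A minimal, hypothetical sketch of the round-robin assignment logic itself, runnable without Spark; in real Spark code you would key elements by `rdd.zipWithIndex()` and pass a custom `org.apache.spark.Partitioner` to `partitionBy`:

```scala
// Sketch of a round-robin partitioner's core logic (not a real
// Spark Partitioner subclass; this just shows the index math).
class RoundRobinPartitioner(numPartitions: Int) {
  require(numPartitions > 0, "need at least one partition")
  // Round-robin: element with index i goes to partition i mod numPartitions.
  def getPartition(key: Long): Int = (key % numPartitions).toInt
}

// With 5 partitions and 5 elements, each element lands in its own partition:
val parts = (0L until 5L).map(new RoundRobinPartitioner(5).getPartition)
// parts contains (0, 1, 2, 3, 4)
```

Even with one element per partition, there is no guarantee each partition runs on a distinct node; see the reply below about preferred locations.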

Re: Round Robin Partitioner

Posted by Patrick Wendell <pw...@gmail.com>.
In Spark 1.0 we've added better randomization to the scheduling of
tasks so they are distributed more evenly by default.

https://github.com/apache/spark/commit/556c56689bbc32c6cec0d07b57bd3ec73ceb243e

However, specific placement policies like that aren't really supported
unless you subclass the RDD itself and override getPreferredLocations.
Keep in mind this is tricky because the set of executors might change
during the lifetime of a Spark job.
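To make that concrete, the real override on an RDD subclass is `protected def getPreferredLocations(split: Partition): Seq[String]`. Below is only the host-selection logic, runnable without Spark, with `hosts` standing in for a hypothetical snapshot of executor hostnames; since the executor set can change during the job, this is best-effort locality, not a guarantee:

```scala
// Sketch only: round-robin choice of a preferred host per partition.
// `hosts` is a hypothetical, possibly stale list of executor hostnames.
def preferredLocation(partitionIndex: Int, hosts: Seq[String]): Seq[String] =
  if (hosts.isEmpty) Seq.empty            // no preference if no hosts known
  else Seq(hosts(partitionIndex % hosts.size))  // wrap around the host list

// Partition 0 -> h0, 1 -> h1, ..., 5 wraps back to h0, 6 -> h1:
val host = preferredLocation(6, Seq("h0", "h1", "h2", "h3", "h4"))
// host == Seq("h1")
```

Preferred locations are hints to the scheduler, not hard constraints, which is consistent with the caveat above about executors changing mid-job.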

- Patrick

On Thu, Mar 13, 2014 at 11:50 AM, David Thomas <dt...@gmail.com> wrote:
> Is it possible to partition the RDD elements in a round robin fashion? Say I
> have 5 nodes in the cluster and 5 elements in the RDD. I need to ensure each
> element gets mapped to a distinct node in the cluster.