Posted to user@spark.apache.org by Yishu Lin <yi...@gmail.com> on 2014/03/06 01:00:19 UTC

Is there a way to control where RDD partitions physically go?

Let’s say I have an RDD that represents user behavior data. I can shard the RDD into several partitions by user ID with HashPartitioner. Is there any way to control which machine each partition goes to? Or, failing that, how does Spark distribute partitions across machines? Thanks!
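
To make the setup concrete, here is a minimal sketch in plain Python of the key-to-partition step I mean. It is not Spark code: it only mimics the usual hash-partitioning rule (key hash modulo the number of partitions, kept non-negative). The names partition_for, shard, NUM_PARTITIONS and the sample events are all made up for illustration, and it assumes integer user IDs. It says nothing about which machine ends up holding each partition, which is the part I am asking about.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_for(user_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map an integer user ID to a partition index in [0, num_partitions)."""
    # Python's % already yields a non-negative result for a positive modulus,
    # so negative IDs still land in a valid partition.
    return user_id % num_partitions


def shard(events, num_partitions: int = NUM_PARTITIONS):
    """Group (user_id, action) records by the partition their key hashes to."""
    partitions = defaultdict(list)
    for user_id, action in events:
        partitions[partition_for(user_id, num_partitions)].append((user_id, action))
    return dict(partitions)


# Made-up behavior records: (user_id, action)
events = [(101, "click"), (102, "view"), (101, "purchase"), (205, "click")]
sharded = shard(events)

for index in sorted(sharded):
    print(index, sharded[index])
```

All records for a given user ID end up in the same partition index, but the index alone does not determine a physical machine.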

Yishu