Posted to user@spark.apache.org by Yishu Lin <yi...@gmail.com> on 2014/03/06 01:00:19 UTC
Is there a way to control where RDD partitions physically go?
Let’s say I have an RDD that represents users’ behavior data. I can shard the RDD into several partitions by user id using HashPartitioner. Is there any way to control which machine each partition goes to? Or, failing that, how does Spark distribute partitions across machines? Thanks!
Yishu
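
For concreteness, here is a minimal sketch of the kind of hash-based sharding by user id described above. It is plain Python and does not use the Spark API. The function names and the sample records are hypothetical, and integer user ids are assumed so the hash values are stable across runs.

```python
from collections import defaultdict


def hash_partition(key, num_partitions):
    """Map a key to a partition index in [0, num_partitions)."""
    # Python's % returns a non-negative result for a positive modulus,
    # so no extra adjustment is needed for negative hash values.
    return hash(key) % num_partitions


def shard_by_user(records, num_partitions):
    """Group (user_id, event) pairs into partitions by hashing user_id."""
    partitions = defaultdict(list)
    for user_id, event in records:
        index = hash_partition(user_id, num_partitions)
        partitions[index].append((user_id, event))
    return dict(partitions)


# Hypothetical behavior records: (user_id, event)
records = [(101, "click"), (202, "view"), (101, "purchase"), (303, "click")]
shards = shard_by_user(records, num_partitions=4)

for index in sorted(shards):
    print(index, shards[index])
```

This only decides which partition a record lands in, so all records for one user end up together. It says nothing about which machine hosts each partition, which is the part the question is about.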