Posted to user@spark.apache.org by Sabarish Sasidharan <sa...@manthan.com> on 2016/01/13 04:01:00 UTC

Re:

You could generate as many duplicates of each record as you need, each with
a tag/sequence number, and then use a custom partitioner that uses that
tag/sequence in addition to the key to do the partitioning.
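
A rough sketch of that idea in plain Python, so it can be run without a
cluster. The function names and the overlap rule (partition i, counting from
0, holds keys i+1 and i+2) are assumptions made up to match the example in the
question; they are not part of any Spark API.

```python
def duplicate_with_tag(record, num_partitions):
    """Emit one copy of (key, value) per target partition.

    Each copy is re-keyed as (tag, key), where tag is the index of the
    partition that copy should land on. The overlap rule is hypothetical:
    key k is sent to partitions k-2 and k-1 (0-indexed), when they exist.
    """
    key, value = record
    targets = [p for p in (key - 2, key - 1) if 0 <= p < num_partitions]
    return [((p, key), value) for p in targets]


def partition_by_tag(tagged_key):
    """Partition function: route a (tag, key) pair by its tag alone."""
    tag, _key = tagged_key
    return tag


def simulate_partitions(records, num_partitions):
    """Local stand-in for the shuffle: group keys by their target partition."""
    partitions = {p: [] for p in range(num_partitions)}
    for record in records:
        for tagged_key, _value in duplicate_with_tag(record, num_partitions):
            target = partition_by_tag(tagged_key) % num_partitions
            partitions[target].append(tagged_key[1])
    return {p: sorted(keys) for p, keys in partitions.items()}


records = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
layout = simulate_partitions(records, 3)
print(layout)
```

With PySpark, the same two functions could presumably be wired in as
rdd.flatMap(lambda r: duplicate_with_tag(r, 3)).partitionBy(3, partition_by_tag),
since partitionBy accepts a partition function that is applied to the key.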

Regards
Sab

On 12-Jan-2016 12:21 am, "Daniel Imberman" <da...@gmail.com>
wrote:

> Hi all,
>
> I'm looking for a way to efficiently partition an RDD, but allow the same
> data to exist on multiple partitions.
>
>
> Let's say I have a key-value RDD with keys {1,2,3,4}.
>
> I want to be able to repartition the RDD so that the partitions look
> like
>
> p1 = {1,2}
> p2 = {2,3}
> p3 = {3,4}
>
> Locality is important in this situation, as I would be doing internal
> comparisons of values.
>
> Does anyone have any thoughts as to how I could go about doing this?
>
> Thank you
>