Posted to user@spark.apache.org by Akshat Aranya <aa...@gmail.com> on 2014/09/16 22:11:13 UTC

Indexed RDD

Hi,

I'm trying to implement a custom RDD that essentially works as a
distributed hash table, i.e. the key space is split up into partitions and
within a partition, an element can be looked up efficiently by the key.
However, the RDD lookup() function (in PairRDDFunctions) is implemented in
a way that iterates through all elements of a partition to find the matching
ones.  Is there a better way to do what I want, short of just implementing
new methods on the custom RDD?
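
To make the idea concrete, here is a rough single-process sketch in plain
Python of the structure I have in mind. It is not Spark code: the class
IndexedPartitions and its methods are hypothetical names used only for
illustration, and each "partition" is just a local dict standing in for
the per-partition index.

```python
from typing import Any, Dict, Hashable, Iterable, List, Tuple


class IndexedPartitions:
    """Toy model of a hash-partitioned table (hypothetical, not a Spark API)."""

    def __init__(self, pairs: Iterable[Tuple[Hashable, Any]], num_partitions: int):
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions
        # One dict per partition: key -> list of values stored under that key.
        self.partitions: List[Dict[Hashable, List[Any]]] = [
            {} for _ in range(num_partitions)
        ]
        for key, value in pairs:
            index = self._partition_of(key)
            self.partitions[index].setdefault(key, []).append(value)

    def _partition_of(self, key: Hashable) -> int:
        # Split the key space by hash; the result is in [0, num_partitions).
        return hash(key) % self.num_partitions

    def lookup(self, key: Hashable) -> List[Any]:
        # Visit only the partition that owns the key, then use the dict
        # index instead of scanning every element of that partition.
        partition = self.partitions[self._partition_of(key)]
        return list(partition.get(key, []))


table = IndexedPartitions([("a", 1), ("b", 2), ("a", 3)], num_partitions=4)
print(table.lookup("a"))
```

What I would like is for a lookup on the RDD to behave like lookup() in
this sketch: go to the one partition that owns the key and use its index,
rather than filtering the whole partition.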

Thanks,
Akshat