Posted to user@spark.apache.org by Harsha HN <99...@gmail.com> on 2014/09/19 00:06:50 UTC

PairRDD's lookup method Performance

Hi All,

My question is about improving the performance of the pair RDD lookup method.
I went through the thread linked below, where Tathagata Das explains how to
build a hash map over each partition using the "mapPartitions" method to get
O(1) search performance.
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-RDD-over-hashmap-td893.html

How can this be done in Java? HashMap is not a supported return type for
any of the overloaded "mapPartitions" methods.

Thanks and Regards,
Harsha

Re: PairRDD's lookup method Performance

Posted by Sean Owen <so...@cloudera.com>.

The function you pass to mapPartitions can return an Iterable containing one
big Map that holds all of that partition's entries. You still need to write
some extra custom code, similar to what lookup() does, to exploit this data
structure.
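
A minimal sketch of that idea in plain Java, with no Spark dependency so that
it runs on its own. Plain lists stand in for partitions here, and the names
PartitionMapLookup, partitionFor, toSingleMap, buildPartitionMaps and lookup
are all hypothetical, not Spark API. In a real job, the body of toSingleMap is
roughly what the function passed to mapPartitions would do, and the lookup
step would have to run against the partition that the RDD's partitioner
assigns to the key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class PartitionMapLookup {

    // Hypothetical stand-in for a hash partitioner: non-negative hash modulo
    // the number of partitions.
    static int partitionFor(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // The per-partition step: drain the partition's iterator into one HashMap
    // and return it wrapped in a single-element Iterable.
    // Caveat: a HashMap keeps only the last value seen for a duplicate key.
    static <K, V> Iterable<Map<K, V>> toSingleMap(Iterator<Map.Entry<K, V>> partition) {
        Map<K, V> map = new HashMap<>();
        while (partition.hasNext()) {
            Map.Entry<K, V> entry = partition.next();
            map.put(entry.getKey(), entry.getValue());
        }
        return Collections.singletonList(map);
    }

    // Simulates the partitioning plus the per-partition step on local data.
    static <K, V> List<Map<K, V>> buildPartitionMaps(List<Map.Entry<K, V>> pairs,
                                                     int numPartitions) {
        List<List<Map.Entry<K, V>>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Map.Entry<K, V> pair : pairs) {
            partitions.get(partitionFor(pair.getKey(), numPartitions)).add(pair);
        }
        List<Map<K, V>> maps = new ArrayList<>();
        for (List<Map.Entry<K, V>> partition : partitions) {
            maps.add(toSingleMap(partition.iterator()).iterator().next());
        }
        return maps;
    }

    // The custom lookup: pick the key's partition, then do one hash-map get.
    // Returns null when the key is absent.
    static <K, V> V lookup(List<Map<K, V>> partitionMaps, K key) {
        return partitionMaps.get(partitionFor(key, partitionMaps.size())).get(key);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        String[] keys = {"a", "b", "c", "d"};
        for (int i = 0; i < keys.length; i++) {
            pairs.add(new AbstractMap.SimpleEntry<>(keys[i], i));
        }
        List<Map<String, Integer>> maps = buildPartitionMaps(pairs, 2);
        System.out.println(lookup(maps, "c")); // prints 2
    }
}
```

The O(1) behaviour only holds if the lookup goes straight to the one partition
that can contain the key, which is why the sketch routes keys with the same
function both when building the maps and when reading them.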