Posted to dev@spark.apache.org by Ratika Prasad <rp...@couponsinc.com> on 2015/08/19 18:14:35 UTC

Creating RDD with key and Subkey

Hi,

We need an RDD with the following format: JavaPairRDD<String,HashMap<String,List<String>>>, essentially an RDD with a key and subkey structure. How can this be done in Spark?

Thanks
R

Re: Creating RDD with key and Subkey

Posted by Ranjana Rajendran <ra...@gmail.com>.
Hi Ratika,

I tried the following:

val l = List("apple", "orange", "banana")

// inner map: subkey -> values
val inner = new scala.collection.mutable.HashMap[String, List[String]]
inner.put("fruits", l)

// outer map: key -> inner map
val list = new scala.collection.mutable.HashMap[String,
  scala.collection.mutable.HashMap[String, List[String]]]
list.put("food", inner)

// only needed if you later convert to/from Java collections
import scala.collection.JavaConverters._

// parallelize the (key, inner map) pairs into an RDD of tuples
val rdd = sc.parallelize(list.toSeq)

Note that the O(1) lookup of a value by key is lost here; the RDD is just a
collection of (key, map) tuples. See the discussion below:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-RDD-over-hashmap-td893.html
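
For reference, here is a rough equivalent using the Java API from the original question. This is only a sketch, assuming Spark 1.x in local mode; the "food"/"fruits" keys and the class name are made up for illustration:

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class KeySubkeySketch {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setAppName("key-subkey").setMaster("local[*]"));

    // inner map: subkey -> values
    HashMap<String, List<String>> inner = new HashMap<String, List<String>>();
    inner.put("fruits", Arrays.asList("apple", "orange", "banana"));

    // build the pair RDD from (key, inner map) tuples
    JavaPairRDD<String, HashMap<String, List<String>>> rdd =
        sc.parallelizePairs(Arrays.asList(
            new Tuple2<String, HashMap<String, List<String>>>("food", inner)));

    // lookup(key) collects every value stored under that key to the driver;
    // the subkey access is then a plain HashMap get, not a distributed operation
    List<String> fruits = rdd.lookup("food").get(0).get("fruits");
    System.out.println(fruits);

    sc.stop();
  }
}

As with the Scala version, lookup() scans the RDD unless it has been partitioned by key, so the per-key O(1) behaviour of a plain HashMap does not carry over.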

On Wed, Aug 19, 2015 at 10:28 AM, Ratika Prasad <rp...@couponsinc.com>
wrote:

>
> We need to create an RDD as below
>
> JavaPairRDD<String,List<HashMap<String,List<String>>>>
>
> The idea is that lookup() on the key should return a list of hash-map-like
> structures, and we then look up the subkey, which is the key in the returned
> HashMap.
>
>
>
> _____________________________
> From: Silas Davis <si...@silasdavis.net>
> Sent: Wednesday, August 19, 2015 10:34 pm
> Subject: Re: Creating RDD with key and Subkey
> To: Ratika Prasad <rp...@couponsinc.com>, <de...@spark.apache.org>
>
>
>
> This should be sent to the user mailing list, I think.
>
> It depends what you want to do with the RDD. Yes, you could throw around
> (String, HashMap<String,List<String>>) tuples, or perhaps you'd like to be
> able to groupByKey or reduceByKey on the key and sub-key as a composite, in
> which case JavaPairRDD<Tuple2<String,String>, List<String>> might be more
> appropriate. It's not really clear what you are asking.
>
>
> On Wed, 19 Aug 2015 at 17:15 Ratika Prasad <rprasad@couponsinc.com>
> wrote:
>
>> Hi,
>>
>>
>>
>> We need an RDD with the following format
>> JavaPairRDD<String,HashMap<String,List<String>>>, essentially an RDD with a
>> key and subkey structure. How can this be done in Spark?
>>
>>
>>
>> Thanks
>>
>> R
>>
>
>
>

Re: Creating RDD with key and Subkey

Posted by Ratika Prasad <rp...@couponsinc.com>.
We need to create an RDD as below

JavaPairRDD<String,List<HashMap<String,List<String>>>>

The idea is that lookup() on the key should return a list of hash-map-like structures, and we then look up the subkey, which is the key in the returned HashMap.
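
For example, something along these lines (only a sketch, with made-up "food"/"fruits" keys, assuming the same Java API setup as in the earlier example):

// assumes an existing JavaSparkContext `sc` and the imports from the sketch above
HashMap<String, List<String>> inner = new HashMap<String, List<String>>();
inner.put("fruits", Arrays.asList("apple", "orange"));

JavaPairRDD<String, List<HashMap<String, List<String>>>> rdd =
    sc.parallelizePairs(Arrays.asList(
        new Tuple2<String, List<HashMap<String, List<String>>>>(
            "food", Arrays.asList(inner))));

// lookup(key) returns all values stored under that key,
// here a list whose elements are lists of maps
List<List<HashMap<String, List<String>>>> byKey = rdd.lookup("food");

// the "subkey lookup" is then an ordinary map access on the driver
List<String> values = byKey.get(0).get(0).get("fruits");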



_____________________________
From: Silas Davis <si...@silasdavis.net>
Sent: Wednesday, August 19, 2015 10:34 pm
Subject: Re: Creating RDD with key and Subkey
To: Ratika Prasad <rp...@couponsinc.com>, <de...@spark.apache.org>


This should be sent to the user mailing list, I think.

It depends what you want to do with the RDD. Yes, you could throw around (String, HashMap<String,List<String>>) tuples, or perhaps you'd like to be able to groupByKey or reduceByKey on the key and sub-key as a composite, in which case JavaPairRDD<Tuple2<String,String>, List<String>> might be more appropriate. It's not really clear what you are asking.


On Wed, 19 Aug 2015 at 17:15 Ratika Prasad <rprasad@couponsinc.com> wrote:
Hi,

We need an RDD with the following format JavaPairRDD<String,HashMap<String,List<String>>>, essentially an RDD with a key and subkey structure. How can this be done in Spark?

Thanks
R



Re: Creating RDD with key and Subkey

Posted by Silas Davis <si...@silasdavis.net>.
This should be sent to the user mailing list, I think.

It depends what you want to do with the RDD. Yes, you could throw around
(String, HashMap<String,List<String>>) tuples, or perhaps you'd like to be
able to groupByKey or reduceByKey on the key and sub-key as a composite, in
which case JavaPairRDD<Tuple2<String,String>, List<String>> might be more
appropriate. It's not really clear what you are asking.
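
For illustration, a minimal sketch of that composite-key variant, assuming the Java API and Java 8 lambdas; the keys and the list-concatenating merge function are just placeholders:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class CompositeKeySketch {
  // assumes an already-configured JavaSparkContext, e.g. from a local-mode SparkConf
  static JavaPairRDD<Tuple2<String, String>, List<String>> build(JavaSparkContext sc) {
    JavaPairRDD<Tuple2<String, String>, List<String>> rdd = sc.parallelizePairs(Arrays.asList(
        new Tuple2<Tuple2<String, String>, List<String>>(
            new Tuple2<String, String>("food", "fruits"), Arrays.asList("apple")),
        new Tuple2<Tuple2<String, String>, List<String>>(
            new Tuple2<String, String>("food", "fruits"), Arrays.asList("orange"))));

    // reduceByKey operates on the composite (key, sub-key) pair;
    // here it simply concatenates the value lists for identical composite keys
    return rdd.reduceByKey((a, b) -> {
      List<String> merged = new ArrayList<String>(a);
      merged.addAll(b);
      return merged;
    });
  }
}

With the composite key, groupByKey and reduceByKey treat (key, sub-key) as a single unit; getting back to a per-key view would take an extra mapToPair on the outer key.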


On Wed, 19 Aug 2015 at 17:15 Ratika Prasad <rp...@couponsinc.com> wrote:

> Hi,
>
>
>
> We need an RDD with the following format
> JavaPairRDD<String,HashMap<String,List<String>>>, essentially an RDD with a
> key and subkey structure. How can this be done in Spark?
>
>
>
> Thanks
>
> R
>