Posted to user@spark.apache.org by Shams ul Haque <sh...@cashcare.in> on 2015/10/27 11:50:21 UTC

Separate all values from Iterable

Hi,

I have grouped all my customers in a JavaPairRDD<Long, Iterable<ProductBean>>
by their customerId (of type Long), so every customerId has a list of
ProductBeans.

Now I want to save all ProductBeans to the DB irrespective of customerId. I got
all the values with the values() method:
JavaRDD<Iterable<ProductBean>> values = custGroupRDD.values();

Now I want to convert the JavaRDD<Iterable<ProductBean>> to a JavaPairRDD<Object,
BSONObject> so that I can save it to Mongo. Remember, every BSONObject is
made from a single ProductBean.

I can't figure out how to do this in Spark, i.e. which Spark transformation
does this job. I think the task amounts to separating all the values out of
each Iterable. Please let me know how this is possible.
Hints in Scala or Python are also fine.


Thanks

Shams

Re: Separate all values from Iterable

Posted by Adrian Tanase <at...@adobe.com>.
The operator you’re looking for is .flatMap. It flattens nested results: the function you pass to it can return zero or more target elements for each source element, and all of them end up in a single flat RDD.
I’m not very familiar with the Java APIs, but in Scala it would go like this (keeping type annotations only as documentation):
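The zero-or-more contract can be seen locally with java.util.stream, whose flatMap has the same shape as the RDD operator. In this minimal sketch (no Spark involved; the class and method names are made up for illustration), each input line maps to zero or more words and flatMap concatenates them into one flat list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatMapDemo {
    // Each input line maps to zero or more words; flatMap concatenates all of them.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> line.isEmpty()
                        ? Stream.<String>empty()           // zero outputs for an empty line
                        : Arrays.stream(line.split(" "))) // one or more outputs otherwise
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words(Arrays.asList("a b", "", "c"))); // [a, b, c]
    }
}
```

The explicit empty check matters because "".split(" ") returns a one-element array containing the empty string, not an empty array.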

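import org.apache.spark.rdd.RDD
import org.bson.BSONObject

def toBson(bean: ProductBean): BSONObject = { … }

// groupBy yields an Iterable (not a Seq) per key; flatMap emits one BSONObject per bean
val customerBeans: RDD[(Long, Iterable[ProductBean])] = allBeans.groupBy(_.customerId)
val mongoObjects: RDD[BSONObject] = customerBeans.flatMap { case (_, beans) => beans.map(toBson) }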
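For the Java API the question uses, the corresponding call is JavaRDD.flatMap with a FlatMapFunction, whose call method returns an Iterable in Spark 1.x and an Iterator from Spark 2.0 on. Since Spark and the MongoDB driver are not assumed here, the sketch below reproduces only the shape of the pipeline on local collections: Collectors.groupingBy plays the role of groupBy, flatMap turns each group into its individual beans, and a per-bean converter produces one document each. The ProductBean fields, the toDocument helper, and the Map used in place of BSONObject are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FlattenGroups {
    // Hypothetical stand-in for the thread's ProductBean; the real fields are not shown.
    static final class ProductBean {
        final long customerId;
        final String name;

        ProductBean(long customerId, String name) {
            this.customerId = customerId;
            this.name = name;
        }
    }

    // Hypothetical converter: a plain Map stands in for BSONObject here.
    static Map<String, Object> toDocument(ProductBean bean) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("customerId", bean.customerId);
        doc.put("name", bean.name);
        return doc;
    }

    // Flatten every group into its beans, then convert each bean to one document.
    static List<Map<String, Object>> flatten(Collection<List<ProductBean>> groups) {
        return groups.stream()
                .flatMap(List::stream)            // group of beans -> individual beans
                .map(FlattenGroups::toDocument)   // exactly one document per bean
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ProductBean> all = Arrays.asList(
                new ProductBean(1L, "a"), new ProductBean(1L, "b"), new ProductBean(2L, "c"));
        // Local analogue of groupBy(_.customerId) followed by values()
        Map<Long, List<ProductBean>> byCustomer =
                all.stream().collect(Collectors.groupingBy(b -> b.customerId));
        System.out.println(flatten(byCustomer.values()).size()); // 3
    }
}
```

The customerId only matters for grouping; once the groups are flattened, each bean is converted on its own, which matches the requirement that every BSONObject is made from a single ProductBean.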

Hope this helps,
-adrian
