Posted to user@spark.apache.org by Vikash Kumar <vi...@gmail.com> on 2016/10/09 04:11:14 UTC

"How to change RDD fields for each key combination"

I have an RDD with records in the format below:
id,name,age,houseno,childPresent
1,gupta,35,100,None
1,verma,16,100,None
1,ravi,10,100,None
2, Abc,32,200,None
2,xyz,23,200,None

I need to set the childPresent field to Y on every row with a given id if
any record with that id has age < 18, and to N otherwise. How can I do that?

I want the output to look like this:
1,gupta,35,100,Y
1,verma,16,100,Y   -- age is less than 18, so childPresent is Y for all rows with id = 1
1,ravi,10,100,Y

2, Abc,32,200,N
2,xyz,23,200,N   -- no record with id = 2 has age < 18

Please let me know how I can achieve this using Spark/Scala.
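
For reference, here is a minimal sketch of one possible approach, not a
definitive solution. It assumes the records are comma-separated text lines
and uses only standard RDD operations (map, reduceByKey, keyBy, join). The
Person case class, the parse helper, and the flagChildren function are
hypothetical names introduced purely for illustration. The idea is to compute
one boolean per id ("does any record have age < 18?") and join it back onto
the original rows.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ChildPresentSketch {
  // Hypothetical record type mirroring id,name,age,houseno,childPresent.
  case class Person(id: Int, name: String, age: Int, houseNo: Int, childPresent: String)

  // Assumption: every line has exactly five comma-separated fields.
  // Fields are trimmed, so " Abc" becomes "Abc".
  def parse(line: String): Person = {
    val f = line.split(",").map(_.trim)
    Person(f(0).toInt, f(1), f(2).toInt, f(3).toInt, f(4))
  }

  def flagChildren(people: RDD[Person]): RDD[Person] = {
    // One (id, flag) pair per id: true if any record with that id has age < 18.
    val hasChild: RDD[(Int, Boolean)] =
      people.map(p => (p.id, p.age < 18)).reduceByKey(_ || _)

    // Join the per-id flag back onto every row and overwrite childPresent.
    people.keyBy(_.id).join(hasChild).map { case (_, (p, child)) =>
      p.copy(childPresent = if (child) "Y" else "N")
    }
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying the sketch out.
    val sc = new SparkContext(
      new SparkConf().setAppName("child-present-sketch").setMaster("local[*]"))
    try {
      val lines = sc.parallelize(Seq(
        "1,gupta,35,100,None",
        "1,verma,16,100,None",
        "1,ravi,10,100,None",
        "2, Abc,32,200,None",
        "2,xyz,23,200,None"))

      // The join does not preserve input order, so sort before printing.
      flagChildren(lines.map(parse))
        .collect()
        .sortBy(p => (p.id, p.name))
        .foreach(p => println(p.productIterator.mkString(",")))
    } finally {
      sc.stop()
    }
  }
}
```

With the sample data above, every row with id 1 should come out with Y and
every row with id 2 with N. If the set of ids with a child is small, the join
could probably be replaced by collecting those ids and broadcasting them,
which would avoid a second shuffle.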

Thanks
Vikash