Posted to user@spark.apache.org by Vipul Pandey <vi...@gmail.com> on 2014/02/07 08:58:53 UTC

saving partitions separately

Hi,

I have a dataset which, after a few transformations, takes the following shape:

org.apache.spark.rdd.RDD[(String, (String, Double))]

There are only a handful of possible keys (fewer than 100). What I want to do is save the data for each key in a separate file. One way to do that is to loop over the keys, filter the RDD once per key, and save each filtered RDD separately. I was wondering if there is a more direct way of doing this. Maybe by repartitioning on the key somehow, or by grouping by key? But then how do we save each group separately without looping over the keys?
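To make the filter-per-key loop mentioned above concrete, here is a minimal sketch of what I mean. The names SavePerKey, savePerKey and baseDir are hypothetical, made up for illustration, and it assumes every key is usable as a directory name. It is only the looping workaround, not the direct way I am asking about.

```scala
import org.apache.spark.rdd.RDD

object SavePerKey {
  // Hypothetical helper: writes the records of each key under <baseDir>/<key>.
  // Assumes keys are valid path components and that the number of distinct
  // keys is small enough to collect to the driver (fewer than 100 here).
  def savePerKey(rdd: RDD[(String, (String, Double))], baseDir: String): Unit = {
    // The RDD is scanned once per key, so cache it to avoid recomputing
    // the upstream transformations on every pass.
    rdd.cache()

    // Collect the distinct keys to the driver.
    val keys: Array[String] = rdd.map(_._1).distinct().collect()

    // One filter and one save per key.
    for (k <- keys) {
      rdd.filter { case (key, _) => key == k }
        .map(_._2) // keep only the (String, Double) value
        .saveAsTextFile(baseDir + "/" + k)
    }
  }
}
```

Each call to saveAsTextFile produces a directory of part files rather than a single file, and the whole (cached) RDD is still scanned once per key, which is the cost I would like to avoid.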

Thanks,
Vipul