Posted to user@spark.apache.org by Brandon White <bw...@gmail.com> on 2015/07/16 07:37:27 UTC
Running foreach on a list of rdds in parallel
Hello,
I have a list of RDDs:

List(rdd1, rdd2, rdd3, rdd4)

I would like to save these RDDs in parallel. Right now, each save operation runs sequentially. I tried using an RDD of RDDs, but that does not work.

list.foreach { rdd =>
  rdd.saveAsTextFile("/tmp/cache/")
}
Any ideas?
Re: Running foreach on a list of rdds in parallel
Posted by Vetle Leinonen-Roeim <ve...@roeim.net>.
On Thu, Jul 16, 2015 at 7:37 AM Brandon White <bw...@gmail.com>
wrote:
> [original message snipped]
If they're to be saved in the same text file, use the answer from Davies;
if they're to be saved in separate files, list.par.foreach should work, no?
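A minimal sketch of the list.par.foreach approach, assuming local mode and illustrative RDD contents and output paths (the original thread does not show them). Note that each RDD must be written to its own directory, since saveAsTextFile fails if the target path already exists:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("parallel-save").setMaster("local[*]"))

    // Illustrative RDDs standing in for rdd1..rdd4.
    val rdds = List(
      sc.parallelize(Seq("a", "b")),
      sc.parallelize(Seq("c", "d")),
      sc.parallelize(Seq("e", "f")),
      sc.parallelize(Seq("g", "h"))
    )

    // .par turns the List into a parallel collection, so the driver
    // submits the save jobs from multiple threads instead of one;
    // Spark's scheduler can then run the jobs concurrently.
    rdds.zipWithIndex.par.foreach { case (rdd, i) =>
      // Distinct output directory per RDD: saveAsTextFile throws
      // if the path already exists.
      rdd.saveAsTextFile(s"/tmp/cache/rdd_$i")
    }

    sc.stop()
  }
}
```

On Scala 2.13 the .par conversion additionally requires the scala-parallel-collections module; on 2.12 it is built in.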
Regards,
Vetle
Re: Running foreach on a list of rdds in parallel
Posted by Davies Liu <da...@databricks.com>.
sc.union(rdds).saveAsTextFile("/tmp/cache/")
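For completeness, a sketch of this union approach, with illustrative RDD contents and an assumed output path. sc.union merges the RDDs into a single RDD, so one save job writes all records under one output directory, parallelized across that job's tasks:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object UnionSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("union-save").setMaster("local[*]"))

    // Illustrative RDDs standing in for the list in the question.
    val rdds = Seq(
      sc.parallelize(Seq("a", "b")),
      sc.parallelize(Seq("c", "d"))
    )

    // union concatenates the partitions of all input RDDs into one RDD;
    // the single save job then writes every partition in parallel.
    sc.union(rdds).saveAsTextFile("/tmp/cache/union_output")

    sc.stop()
  }
}
```

This trades per-RDD output directories for a single combined one, which is why Vetle's reply distinguishes the two cases.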
On Wed, Jul 15, 2015 at 10:37 PM, Brandon White <bw...@gmail.com> wrote:
> [original message snipped]
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org