Posted to user@spark.apache.org by Brandon White <bw...@gmail.com> on 2015/07/16 07:37:27 UTC

Running foreach on a list of rdds in parallel

Hello,

I have a list of rdds

List(rdd1, rdd2, rdd3, rdd4)

I would like to save these RDDs in parallel. Right now, each save runs
sequentially. I tried using an RDD of RDDs, but that does not work.

list.foreach { rdd =>
  rdd.saveAsTextFile("/tmp/cache/")
}

Any ideas?

Re: Running foreach on a list of rdds in parallel

Posted by Vetle Leinonen-Roeim <ve...@roeim.net>.
On Thu, Jul 16, 2015 at 7:37 AM Brandon White <bw...@gmail.com>
wrote:

> Hello,
>
> I have a list of rdds
>
> List(rdd1, rdd2, rdd3, rdd4)
>
> I would like to save these RDDs in parallel. Right now, each save runs
> sequentially. I tried using an RDD of RDDs, but that does not work.
>
> list.foreach { rdd =>
>   rdd.saveAsTextFile("/tmp/cache/")
> }
>
> Any ideas?
>

If they're to be saved in the same text file, use the answer from Davies;
if they're to be saved in separate files, list.par.foreach should work, no?
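A minimal sketch of that approach (an assumption of what was meant, not tested code from the thread): it assumes Scala 2.12 or earlier, where .par parallel collections are in the standard library, and it gives each RDD its own output directory, since saveAsTextFile throws if the target path already exists, so the single shared /tmp/cache/ path from the question would fail after the first save. The names here (ParallelSave, the rdd_$i directories, the sample data) are all illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelSave {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("parallel-save").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical stand-ins for rdd1..rdd4 from the question.
    val rdds = List(sc.parallelize(1 to 10), sc.parallelize(11 to 20),
                    sc.parallelize(21 to 30), sc.parallelize(31 to 40))

    // .par runs the closure on a fork-join thread pool, so the driver
    // submits the save jobs concurrently instead of one after another.
    // saveAsTextFile is a blocking action, which is why the plain
    // list.foreach version runs the saves sequentially.
    rdds.zipWithIndex.par.foreach { case (rdd, i) =>
      rdd.saveAsTextFile(s"/tmp/cache/rdd_$i")
    }

    sc.stop()
  }
}
```

Whether the jobs actually overlap on the cluster then depends on available executors and the scheduler configuration.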

Regards,
Vetle

Re: Running foreach on a list of rdds in parallel

Posted by Davies Liu <da...@databricks.com>.
sc.union(rdds).saveAsTextFile("/tmp/cache/")
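Spelled out as a sketch (the object name, sample data, and output path are illustrative, not from the thread): instead of running one job per RDD, this writes all of them as a single combined dataset in one job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object UnionSave {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("union-save").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Stand-ins for the question's rdd1..rdd4.
    val rdds = Seq(sc.parallelize(1 to 5), sc.parallelize(6 to 10))

    // union is lazy and cheap: it concatenates the partition lists of
    // its inputs, so the single saveAsTextFile action writes every
    // partition of every input RDD in one job, with the partitions
    // processed in parallel across the cluster.
    sc.union(rdds).saveAsTextFile("/tmp/cache/union_output")

    sc.stop()
  }
}
```

The trade-off versus saving separately is that the records of all input RDDs end up interleaved under one output directory.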

On Wed, Jul 15, 2015 at 10:37 PM, Brandon White <bw...@gmail.com> wrote:
> Hello,
>
> I have a list of rdds
>
> List(rdd1, rdd2, rdd3, rdd4)
>
> I would like to save these RDDs in parallel. Right now, each save runs
> sequentially. I tried using an RDD of RDDs, but that does not work.
>
> list.foreach { rdd =>
>   rdd.saveAsTextFile("/tmp/cache/")
> }
>
> Any ideas?

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org