You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@spark.apache.org by Adrian Mocanu <am...@verticalscope.com> on 2014/11/12 19:41:47 UTC

using RDD result in another TDD

Hi
I'd like to use the result of one RDD1 in another RDD2. Normally I would use something like a barrier so make the 2nd RDD wait till the computation of the 1st RDD is done then include the result from RDD1 in the closure for RDD2.
Currently I create another RDD, RDD3, out of the result of RDD1 then do Cartesian product on RDD2 and RDD3. NB: This operation is slow and expands partitions from 270 to 1200

This is a simplified example but I think it should help:
What I want to do (pseudocode):
   val a:Int=RDD1.reduce(..)
   RDD2.map(x => x*a)

What I use right now (pseudocode):
  val a:Int=RDD1.reduce(..)
  RDD3=makeRDD(a)
   RDD2.cartesianProduct(RDD3)

How to structure this type of operation to not need the barrier to block computing RDD2 until RDD1 is done?

-Adrian

Re: using RDD result in another TDD

Posted by Sean Owen <so...@cloudera.com>.

You can't use RDDs inside of RDDs, so this won't work anyway. You could
collect the result of RDD1 and broadcast it, perhaps. collect() blocks.

On Wed, Nov 12, 2014 at 6:41 PM, Adrian Mocanu <am...@verticalscope.com>
wrote:

>  Hi
>
> I’d like to use the result of one RDD1 in another RDD2. Normally I would
> use something like a barrier so make the 2nd RDD wait till the
> computation of the 1st RDD is done then include the result from RDD1 in
> the closure for RDD2.
>
> Currently I create another RDD, RDD3, out of the result of RDD1 then do
> Cartesian product on RDD2 and RDD3. NB: This operation is slow and expands
> partitions from 270 to 1200
>
>
>
> This is a simplified example but I think it should help:
>
> What I want to do (pseudocode):
>
>    val a:Int=RDD1.reduce(..)
>
>    RDD2.map(x => x*a)
>
>
>
> What I use right now (pseudocode):
>
>   val a:Int=RDD1.reduce(..)
>
>   RDD3=makeRDD(a)
>
>    RDD2.cartesianProduct(RDD3)
>
>
>
> How to structure this type of operation to not need the barrier to block
> computing RDD2 until RDD1 is done?
>
>
>
> -Adrian
>
>
>