Posted to user@spark.apache.org by Meihua Wu <ro...@gmail.com> on 2015/08/03 18:56:31 UTC

Does RDD.cartesian involve shuffling?

Does RDD.cartesian involve shuffling?

Thanks!

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Does RDD.cartesian involve shuffling?

Posted by Richard Marscher <rm...@localytics.com>.
That is the only alternative I'm aware of. If either A or B is small enough
to broadcast, then you'd at least be doing the cartesian products all
locally, without needing to also transmit and shuffle A. The exception would
be if Spark somehow optimizes the cartesian product and transfers only the
smaller RDD across the network in the shuffle, but I have no reason to
believe that's true.
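A rough way to express the "small enough to broadcast" check is to estimate each side's size, broadcast the smaller side only if it fits under a memory budget, and otherwise fall back to cartesian. This is a minimal sketch: the helper name, the per-record byte estimates, and the limit are all hypothetical inputs, and Spark does not make this decision for you when you call RDD.cartesian.

```python
from typing import Optional


def choose_broadcast_side(
    count_a: int,
    count_b: int,
    bytes_per_a: int,
    bytes_per_b: int,
    limit_bytes: int,
) -> Optional[str]:
    """Return "A" or "B" for the side to broadcast, or None to use cartesian.

    All inputs are hypothetical estimates supplied by the caller.
    """
    size_a = count_a * bytes_per_a
    size_b = count_b * bytes_per_b
    # Prefer the smaller side; on a tie, broadcast B so A stays in place.
    if size_b <= size_a:
        smaller, side = size_b, "B"
    else:
        smaller, side = size_a, "A"
    # Only broadcast if the smaller side fits under the (assumed) budget.
    return side if smaller <= limit_bytes else None


if __name__ == "__main__":
    # Hypothetical sizes: 10M rows of A, 5K rows of B, ~200 bytes each, 64 MiB budget.
    print(choose_broadcast_side(10_000_000, 5_000, 200, 200, 64 * 1024 * 1024))
```

If A turns out to be the smaller side, the per-partition loop runs over B instead, so swap the tuple order when emitting pairs to keep the (a, b) orientation.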

I'd try the cartesian first if you haven't tried it at all, just to make sure
it actually is too slow before getting tricky with the broadcast.

On Tue, Aug 4, 2015 at 12:25 PM, Meihua Wu <ro...@gmail.com>
wrote:

> Thanks, Richard!
>
> I basically have two RDDs, A and B, and I need to compute a value for
> every pair (a, b) with a in A and b in B. My first thought was
> cartesian, but it involves an expensive shuffle.
>
> Any alternatives? How about I convert B to an array and broadcast it
> to every node (assuming B is small enough to fit)?



-- 
*Richard Marscher*
Software Engineer
Localytics
Localytics.com <http://localytics.com/> | Our Blog
<http://localytics.com/blog> | Twitter <http://twitter.com/localytics> |
Facebook <http://facebook.com/localytics> | LinkedIn
<http://www.linkedin.com/company/1148792?trk=tyah>

Re: Does RDD.cartesian involve shuffling?

Posted by Meihua Wu <ro...@gmail.com>.
Thanks, Richard!

I basically have two RDDs, A and B, and I need to compute a value for
every pair (a, b) with a in A and b in B. My first thought was
cartesian, but it involves an expensive shuffle.

Any alternatives? How about I convert B to an array and broadcast it
to every node (assuming B is small enough to fit)?
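The broadcast idea can be sketched without a cluster: partitions of A are modeled as plain Python lists, and the broadcast copy of B is a tuple that every partition reads locally. In PySpark the corresponding calls would roughly be `b_bc = sc.broadcast(B.collect())` followed by `A.flatMap(lambda a: [(a, b, f(a, b)) for b in b_bc.value])`; the helper `broadcast_cartesian` and the function `f` below are hypothetical stand-ins, not Spark APIs.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

TA = TypeVar("TA")
TB = TypeVar("TB")
TR = TypeVar("TR")


def broadcast_cartesian(
    a_partitions: Sequence[Sequence[TA]],
    b_small: Sequence[TB],
    f: Callable[[TA, TB], TR],
) -> List[List[Tuple[TA, TB, TR]]]:
    # One read-only copy of B, standing in for sc.broadcast(B.collect()).value.
    b_copy = tuple(b_small)
    out: List[List[Tuple[TA, TB, TR]]] = []
    for part in a_partitions:
        # Each iteration models one task: it touches only its own A partition
        # plus the local copy of B, so no A records move between partitions.
        out.append([(a, b, f(a, b)) for a in part for b in b_copy])
    return out


if __name__ == "__main__":
    a_parts = [[1, 2], [3]]  # A split into two partitions
    b_values = [10, 20]      # the small side
    result = broadcast_cartesian(a_parts, b_values, lambda a, b: a * b)
    print([len(p) for p in result])  # [4, 2]: output keeps A's partitioning
    print(result[1])                 # [(3, 10, 30), (3, 20, 60)]
```

Because the output keeps A's partitioning, A's records never cross the network; the trade-off is that every executor has to hold a full copy of B in memory, which is why B must be small.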



On Tue, Aug 4, 2015 at 8:23 AM, Richard Marscher
<rm...@localytics.com> wrote:
> Yes, it does. In fact, it's probably going to be one of the more expensive
> shuffles you could trigger.


Re: Does RDD.cartesian involve shuffling?

Posted by Richard Marscher <rm...@localytics.com>.
Yes, it does. In fact, it's probably going to be one of the more expensive
shuffles you could trigger.
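Some back-of-the-envelope arithmetic shows why. In Spark's implementation, `A.cartesian(B)` produces one output partition for every (A-partition, B-partition) pair, that is numPartitions(A) × numPartitions(B) partitions, and each task reads one partition of each parent. The sketch below counts tasks, output pairs, and how many tasks need each input partition; the partition and row counts are made-up example values, and the read count is a lower bound rather than a measurement.

```python
from typing import Dict


def cartesian_cost(parts_a: int, parts_b: int, rows_a: int, rows_b: int) -> Dict[str, int]:
    """Back-of-the-envelope counts for A.cartesian(B)."""
    return {
        # One output partition, and so one task, per (A-partition, B-partition) pair.
        "tasks": parts_a * parts_b,
        # Every element of A is paired with every element of B.
        "output_pairs": rows_a * rows_b,
        # Each A partition is needed by parts_b tasks, each B partition by parts_a tasks.
        "tasks_per_a_partition": parts_b,
        "tasks_per_b_partition": parts_a,
        # Lower bound on input records read across all tasks.
        "min_records_read": rows_a * parts_b + rows_b * parts_a,
    }


if __name__ == "__main__":
    # Hypothetical sizes: a large A and a smaller B.
    print(cartesian_cost(parts_a=100, parts_b=50, rows_a=1_000_000, rows_b=10_000))
```

With these example numbers that is 5,000 tasks, 10^10 output pairs, and at least 51 million input records read, with A's million rows scanned 50 times over. Broadcasting B instead keeps the task count at numPartitions(A) and reads each row of A once.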

On Mon, Aug 3, 2015 at 12:56 PM, Meihua Wu <ro...@gmail.com>
wrote:

> Does RDD.cartesian involve shuffling?
>
> Thanks!

