Posted to user@spark.apache.org by Koert Kuipers <ko...@tresata.com> on 2014/09/30 01:02:48 UTC

in memory assumption in cogroup?

apologies for asking yet again about spark memory assumptions, but i can't
seem to keep them in my head.

if i use PairRDDFunctions.cogroup, it returns two iterables for every key.
do the contents of these iterables have to fit in memory? or is the data
streamed?

Re: in memory assumption in cogroup?

Posted by Liquan Pei <li...@gmail.com>.

Hi Koert,

cogroup is a transformation on an RDD: it creates a CoGroupedRDD and then
performs some transformations on it. When an action is later called, the
compute() method of the CoGroupedRDD is invoked. Roughly speaking, each
element of the CoGroupedRDD is fetched one at a time, so the contents of
the two iterables do not need to fit in memory.
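
To make the shape of the result concrete, here is a minimal plain-Python
sketch of what cogroup produces for each key. It is an illustrative model
only, not Spark code: the function name `cogroup` and the sample data are
assumptions, and because it builds both lists fully in memory it says
nothing about how Spark itself buffers or streams the values.

```python
from collections import defaultdict


def cogroup(left, right):
    """Group two iterables of (key, value) pairs by key.

    Returns {key: (left_values, right_values)}, mirroring the shape of
    PairRDDFunctions.cogroup: every key maps to two collections.
    NOTE: hypothetical helper; it materializes everything in memory.
    """
    grouped = defaultdict(lambda: ([], []))
    for key, value in left:
        grouped[key][0].append(value)   # values from the first dataset
    for key, value in right:
        grouped[key][1].append(value)   # values from the second dataset
    return dict(grouped)


# Sample data (made up for illustration).
left = [("a", 1), ("b", 2), ("a", 3)]
right = [("a", "x"), ("c", "y")]

result = cogroup(left, right)
for key in sorted(result):
    print(key, result[key])
```

A key that appears in only one dataset still shows up in the result, with
an empty collection on the other side.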

Hope this helps!
Liq

-- 
Liquan Pei
Department of Physics
University of Massachusetts Amherst