Posted to user@spark.apache.org by "DEVAN M.S." <ms...@gmail.com> on 2014/12/30 12:54:21 UTC

How to collect() each partition in Scala?

Hi all,
I have one large dataset; when I check the number of partitions it shows 43.
We can't collect() the whole dataset into driver memory, so I am thinking of
collect()-ing each partition separately, so that each piece stays small.

Any thoughts?
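
For context, a minimal sketch of how that partition count is typically read; the RDD name and input path below are placeholders, not from this thread:

import org.apache.spark.{SparkConf, SparkContext}

object PartitionCount {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setAppName("partition-count"))
    // Placeholder input path; any RDD reports its partitioning the same way.
    val rdd = sc.textFile("hdfs:///path/to/large/dataset")
    // partitions.length is what reports a value like the "43" mentioned above.
    println(s"number of partitions: ${rdd.partitions.length}")
    sc.stop()
  }
}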

Re: How to collect() each partition in Scala?

Posted by Cody Koeninger <co...@koeninger.org>.
I'm not sure exactly what you're trying to do, but take a look at
rdd.toLocalIterator if you haven't already.
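
A rough sketch of that suggestion, with a placeholder RDD and input path (not from this thread): rdd.toLocalIterator streams the data to the driver one partition at a time, so only the largest single partition has to fit in driver memory.

import org.apache.spark.{SparkConf, SparkContext}

object LocalIteratorSketch {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setAppName("local-iterator"))
    val rdd = sc.textFile("hdfs:///path/to/large/dataset")  // placeholder input

    // toLocalIterator fetches partitions one at a time, so the driver never
    // holds more than one partition's worth of records.
    rdd.toLocalIterator.foreach { record =>
      // process each record here instead of materialising the whole dataset
      println(record)
    }

    sc.stop()
  }
}

Note that this runs roughly one Spark job per partition under the hood, trading some scheduling overhead for bounded driver memory.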


Re: How to collect() each partition in Scala?

Posted by Sean Owen <so...@cloudera.com>.
collect()-ing a partition still implies copying it to the driver, but
you're suggesting you can't collect() the whole data set to the
driver. What do you mean: collect() one partition at a time, or collect()
some smaller result from each partition?
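
A sketch of those two readings (an illustrative example only; the RDD, path, and element type are placeholders): either pull one partition at a time to the driver, or reduce each partition to a small summary on the executors and collect only that.

import org.apache.spark.{SparkConf, SparkContext}

object PerPartitionSketch {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setAppName("per-partition"))
    val rdd = sc.textFile("hdfs:///path/to/large/dataset")  // placeholder input

    // Reading 1: collect() one partition at a time. Only the selected
    // partition's records reach the driver on each pass, at the cost of
    // running one job per partition.
    for (p <- 0 until rdd.partitions.length) {
      val onePartition: Array[String] = rdd
        .mapPartitionsWithIndex((idx, it) => if (idx == p) it else Iterator.empty)
        .collect()
      println(s"partition $p: ${onePartition.length} records")
    }

    // Reading 2: compute a smaller result per partition on the executors and
    // collect only that summary (here, a record count per partition).
    val counts: Array[Int] = rdd.mapPartitions(it => Iterator(it.size)).collect()
    counts.zipWithIndex.foreach { case (n, p) => println(s"partition $p holds $n records") }

    sc.stop()
  }
}

The second form is usually the cheaper one: a single job, with only a small summary per partition coming back to the driver.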

