Posted to user@spark.apache.org by jamborta <ja...@gmail.com> on 2014/09/26 16:19:53 UTC

mappartitions data size

Hi all,

I am using mapPartitions to do some heavy computing on subsets of the data.
I have a dataset with about 1m rows, running on a 32 core cluster.
Unfortunately, it seems that mapPartitions splits the data into only two
partitions, so it is running on just two cores.

Is there a way to force it to split into smaller chunks? 

thanks,




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/mappartitions-data-size-tp15231.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: mappartitions data size

Posted by Daniel Siegmann <da...@velos.io>.
Use RDD.repartition (see here:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD).
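For example, a minimal sketch (the RDD name `data`, the target of 32
partitions, and the `heavyCompute` function are placeholders for
illustration, not from the original thread):

    // Spread the ~1m rows across more partitions; each partition becomes
    // one task, so with ~32 partitions all 32 cores can be used.
    val repartitioned = data.repartition(32)

    val result = repartitioned.mapPartitions { iter =>
      // placeholder for the heavy per-partition computation
      iter.map(row => heavyCompute(row))
    }

Note that repartition triggers a full shuffle of the data; that cost is
usually worth it here, since it is what lets the heavy mapPartitions work
run in parallel across all the cores.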

On Fri, Sep 26, 2014 at 10:19 AM, jamborta <ja...@gmail.com> wrote:

> Hi all,
>
> I am using mapPartitions to do some heavy computing on subsets of the data.
> I have a dataset with about 1m rows, running on a 32 core cluster.
> Unfortunately, it seems that mapPartitions splits the data into only two
> partitions, so it is running on just two cores.
>
> Is there a way to force it to split into smaller chunks?
>
> thanks,
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/mappartitions-data-size-tp15231.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>


-- 
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegmann@velos.io W: www.velos.io