Posted to user@spark.apache.org by Daniel Mahler <dm...@gmail.com> on 2013/10/31 10:54:47 UTC

repartitioning RDDS

Is it possible to repartition RDDs other than by the coalesce method? I am
primarily interested in making a finer-grained partitioning or rebalancing an
unbalanced partitioning, without coalescing.

thanks
Daniel

Re: repartitioning RDDS

Posted by Robin Cjc <cj...@gmail.com>.
Thanks all,
As today is a holiday here, I will try it out and report back later.

Best Regards,
Chen Jingci


On Fri, Nov 1, 2013 at 1:18 AM, Stephen Haberman <stephen.haberman@gmail.com> wrote:

>
> > I just wanted to point out that in Spark 0.8.1 and above, the "repartition"
> > function has been added to be a clearer way to accomplish what you want.
> > ("Coalescing" into a larger number of partitions doesn't make much linguistic
> > sense.)
>
> Nice!
>
> - Stephen
>
>

Re: repartitioning RDDS

Posted by Stephen Haberman <st...@gmail.com>.
> I just wanted to point out that in Spark 0.8.1 and above, the "repartition"
> function has been added to be a clearer way to accomplish what you want.
> ("Coalescing" into a larger number of partitions doesn't make much linguistic
> sense.)

Nice!

- Stephen


Re: repartitioning RDDS

Posted by Aaron Davidson <il...@gmail.com>.
Stephen is exactly correct. I just wanted to point out that in Spark 0.8.1
and above, the "repartition" function has been added to be a clearer way to
accomplish what you want. ("Coalescing" into a larger number of partitions
doesn't make much linguistic sense.)
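
To make the idea concrete, here is a toy sketch in plain Python, not Spark code. Partitions are modeled as lists, and the hypothetical helper `repartition` redistributes every element, so the partition count can go up as well as down. In Spark itself, `repartition(n)` is shorthand for `coalesce(n, shuffle = true)`; the round-robin assignment below is an assumption made for simplicity.

```python
from typing import List


def repartition(partitions: List[List[int]], num_partitions: int) -> List[List[int]]:
    """Toy model of a full-shuffle repartition (hypothetical helper, not Spark)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # A shuffle may move any element to any target partition, so flatten first.
    flat = [item for part in partitions for item in part]
    # Deal elements out round-robin: slice i takes every num_partitions-th item.
    return [flat[i::num_partitions] for i in range(num_partitions)]


two_parts = [[1, 2, 3, 4, 5, 6], [7, 8]]
more = repartition(two_parts, 4)   # grow from 2 to 4 partitions
fewer = repartition(more, 1)       # shrinking goes through the same call

print([len(p) for p in more])      # [2, 2, 2, 2]
print(sorted(fewer[0]))            # [1, 2, 3, 4, 5, 6, 7, 8]
```

The same call handles both directions, which is why a separate name reads more naturally than "coalescing" into more partitions.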


On Thu, Oct 31, 2013 at 7:48 AM, Stephen Haberman <stephen.haberman@gmail.com> wrote:

>
> > Is it possible to repartition RDDs other than by the coalesce method?
> > I am primarily interested in making a finer-grained partitioning or
> > rebalancing an unbalanced partitioning, without coalescing.
>
> I believe if you use the shuffle=true parameter, coalesce will do what
> you want, and essentially becomes a general "repartition" method.
>
> Specifically, yes: shuffle=false can only make larger partitions,
> but with shuffle=true you can break your partitions up into many
> smaller partitions, with the content based on a hash partitioner.
>
> I believe that's what you're asking for?
>
> - Stephen
>
>
>

Re: repartitioning RDDS

Posted by Stephen Haberman <st...@gmail.com>.
> Is it possible to repartition RDDs other than by the coalesce method?
> I am primarily interested in making a finer-grained partitioning or
> rebalancing an unbalanced partitioning, without coalescing.

I believe if you use the shuffle=true parameter, coalesce will do what
you want, and essentially becomes a general "repartition" method.

Specifically, yes: shuffle=false can only make larger partitions,
but with shuffle=true you can break your partitions up into many
smaller partitions, with the content based on a hash partitioner.

I believe that's what you're asking for?

- Stephen
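
The difference between the two shuffle settings can be sketched with a toy model in plain Python. This is not Spark code: partitions are lists, `coalesce` and `sizes` are hypothetical helpers, and round-robin assignment stands in for the hash partitioner mentioned above. In Spark the corresponding call would be `rdd.coalesce(n, shuffle = true)`.

```python
from typing import List

Partitions = List[List[int]]


def coalesce(parts: Partitions, num_partitions: int, shuffle: bool = False) -> Partitions:
    """Toy model of coalesce (hypothetical helper, not Spark's implementation)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if not shuffle:
        # No shuffle: whole input partitions are merged, never split,
        # so the result cannot have more partitions than the input.
        n = min(num_partitions, len(parts))
        merged: Partitions = [[] for _ in range(n)]
        for i, part in enumerate(parts):
            merged[i * n // len(parts)].extend(part)
        return merged
    # Shuffle: each element is assigned on its own, so partitions can be split.
    # Round-robin is an assumption standing in for hashing.
    out: Partitions = [[] for _ in range(num_partitions)]
    position = 0
    for part in parts:
        for item in part:
            out[position % num_partitions].append(item)
            position += 1
    return out


def sizes(parts: Partitions) -> List[int]:
    return [len(p) for p in parts]


skewed = [list(range(8)), [8], [9]]               # one large, two tiny partitions
print(sizes(coalesce(skewed, 2)))                 # [9, 1]
print(sizes(coalesce(skewed, 6)))                 # [8, 1, 1]
print(sizes(coalesce(skewed, 5, shuffle=True)))   # [2, 2, 2, 2, 2]
```

Without the shuffle, asking for 6 partitions still yields 3 and the skew remains; with it, the same data is spread evenly over 5 partitions.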