Posted to dev@spark.apache.org by Hyukjin Kwon <gu...@gmail.com> on 2015/12/11 07:56:20 UTC

coalesce at DataFrame missing argument for shuffle.

Hi all,

I happened to come across the coalesce() function and found that it takes
different arguments for RDD and DataFrame.

It looks like the shuffle option is missing for DataFrame.

I understand that repartition() works exactly like coalesce() with shuffling,
but it looks a bit odd that the same function takes different arguments when
this could easily be fixed by adding a single argument.

Could anybody tell me whether this is intentionally missing?

Thanks!

Re: coalesce at DataFrame missing argument for shuffle.

Posted by Reynold Xin <rx...@databricks.com>.

I am not sure if we need it. The RDD API has way too many methods and
parameters. As you said, it is simply "repartition".
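
For readers following the thread, here is a toy model of the semantics under
discussion. It is not Spark code: partitions are modelled as plain Python
lists, and the function names coalesce() and repartition() are only borrowed
for illustration. It assumes the documented RDD behaviour, where coalesce()
without a shuffle merges existing partitions and cannot increase their number,
and repartition(n) is equivalent to coalesce(n, shuffle = true). The
round-robin placement is a simplification of how a real shuffle distributes
records.

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce(partitions: List[List[T]], num_partitions: int,
             shuffle: bool = False) -> List[List[T]]:
    """Toy model of RDD-style coalesce on a list of partitions (hypothetical helper)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")

    if shuffle:
        # With a shuffle, every element may move, so the partition count
        # can grow as well as shrink. Elements are dealt out round-robin.
        out: List[List[T]] = [[] for _ in range(num_partitions)]
        i = 0
        for part in partitions:
            for item in part:
                out[i % num_partitions].append(item)
                i += 1
        return out

    # Without a shuffle, whole partitions are merged into fewer ones.
    # Asking for more partitions than exist keeps the current count.
    target = min(num_partitions, len(partitions))
    out = [[] for _ in range(target)]
    for idx, part in enumerate(partitions):
        out[idx * target // len(partitions)].extend(part)
    return out


def repartition(partitions: List[List[T]], num_partitions: int) -> List[List[T]]:
    """Toy model of repartition: coalesce with the shuffle always enabled."""
    return coalesce(partitions, num_partitions, shuffle=True)


if __name__ == "__main__":
    parts = [[1, 2], [3], [4, 5, 6], [7]]
    print(len(coalesce(parts, 2)))     # shrinks by merging neighbours
    print(len(coalesce(parts, 8)))     # cannot grow without a shuffle
    print(len(repartition(parts, 8)))  # grows because elements are redistributed
```

In this model, a DataFrame-style API that exposes only coalesce(n) and
repartition(n) covers both cases, which is the point made above: the shuffle
flag and repartition() express the same thing.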

