Posted to user@spark.apache.org by Weiwei Zhang <wz...@dons.usfca.edu> on 2016/02/21 07:18:43 UTC

Behind the scene of RDD to DataFrame

Hi there,

Could someone explain to me what happens behind the scenes of rdd.toDF()? More
importantly, will this step involve a lot of shuffles and cause a surge
in the size of intermediate files? Thank you.

Best Regards,
Vivian

Re: Behind the scene of RDD to DataFrame

Posted by Weiwei Zhang <wz...@dons.usfca.edu>.
Thanks a lot!

Best Regards,
Weiwei

On Sat, Feb 20, 2016 at 11:53 PM, Hemant Bhanawat <he...@gmail.com>
wrote:

> toDF internally calls sqlContext.createDataFrame, which converts the RDD
> to an RDD[InternalRow]. This RDD[InternalRow] is then wrapped in a DataFrame.
>
> The conversion is applied record by record within each partition (a narrow
> transformation), so type conversions (from Scala types to Catalyst types)
> are involved, but no shuffling.
>
> Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811>
> www.snappydata.io

Re: Behind the scene of RDD to DataFrame

Posted by Hemant Bhanawat <he...@gmail.com>.
toDF internally calls sqlContext.createDataFrame, which converts the RDD
to an RDD[InternalRow]. This RDD[InternalRow] is then wrapped in a DataFrame.

The conversion is applied record by record within each partition (a narrow
transformation), so type conversions (from Scala types to Catalyst types)
are involved, but no shuffling.
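
One way to check this yourself is to look at the physical plan that toDF()
produces: a shuffle shows up there as an Exchange operator. Below is a minimal
sketch, assuming Spark 2.x or later (SparkSession rather than the Spark 1.6
sqlContext discussed in this thread). The Person case class and the
ToDFNoShuffle/inspect names are hypothetical, chosen only for illustration.
Internals have changed since 1.6, so the operator names in the plan vary by
version, but none of them should be an Exchange.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type, for illustration only. It must be top-level
// (not defined inside a method) so Spark can derive an encoder for it.
case class Person(name: String, age: Int)

object ToDFNoShuffle {
  // Converts a 4-partition RDD with toDF() and returns
  // (physical plan as text, RDD partition count, DataFrame partition count).
  def inspect(spark: SparkSession): (String, Int, Int) = {
    import spark.implicits._

    val rdd = spark.sparkContext.parallelize(
      Seq(Person("a", 30), Person("b", 25), Person("c", 41), Person("d", 19)),
      numSlices = 4)

    // toDF() turns each Person into Spark SQL's internal row format.
    val df = rdd.toDF()

    // A shuffle would appear in the physical plan as an Exchange node.
    val plan = df.queryExecution.executedPlan.toString
    (plan, rdd.getNumPartitions, df.rdd.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; "local[2]" is an arbitrary choice.
    val spark = SparkSession.builder()
      .appName("toDF-no-shuffle")
      .master("local[2]")
      .getOrCreate()

    val (plan, before, after) = inspect(spark)
    println(plan)
    println(s"contains Exchange: ${plan.contains("Exchange")}")
    println(s"partitions: $before -> $after")

    spark.stop()
  }
}
```

Running main should print the plan, "contains Exchange: false", and the same
partition count before and after the conversion, which is what you would
expect from a per-partition conversion with no shuffle.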

Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811>
www.snappydata.io
