
Posted to user@spark.apache.org by Brandon White <bw...@gmail.com> on 2016/06/27 05:54:53 UTC

Difference between Dataframe and RDD Persisting

What is the difference between persisting a DataFrame and an RDD? When I
persist my RDD, the UI says it takes 50G or more of memory. When I persist
my DataFrame, the UI says it takes 9G or less of memory.

Does the DataFrame not persist the actual content? Is it better / faster to
persist an RDD when doing a lot of filter, map, and collect
operations?

Re: Difference between Dataframe and RDD Persisting

Posted by Jörn Franke <jo...@gmail.com>.

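A DataFrame uses a more efficient binary representation to store and persist data, which is why the cached DataFrame is so much smaller than the cached RDD; the full content is still persisted. You should go with the DataFrame in most cases. RDD operations are generally slower.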
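
As a rough illustration of why a packed binary layout takes less memory than a collection of objects, here is a minimal sketch in plain Python. It is an analogy only, not Spark code: the data, the sizes, and all variable names are made up for the example, and the exact ratio will differ from what the Spark UI reports.

```python
import sys
from array import array

# Hypothetical sample data: 100,000 records of (id, score).
n = 100_000
ids = list(range(n))
scores = [i * 0.5 for i in range(n)]

# Object layout: one tuple per record, each field a separate boxed object.
# Loosely analogous to caching a collection of deserialized objects.
rows = list(zip(ids, scores))
row_bytes = sys.getsizeof(rows) + sum(
    sys.getsizeof(r) + sys.getsizeof(r[0]) + sys.getsizeof(r[1]) for r in rows
)

# Packed layout: one contiguous binary buffer per column.
# Loosely analogous to a compact binary, column-oriented cache.
col_ids = array("q", ids)        # 8-byte signed integers
col_scores = array("d", scores)  # 8-byte floats
col_bytes = sys.getsizeof(col_ids) + sys.getsizeof(col_scores)

print(f"object layout: {row_bytes} bytes")
print(f"packed layout: {col_bytes} bytes")
print(f"ratio: {row_bytes / col_bytes:.1f}x")
```

Both layouts hold every value; the packed one is smaller because it avoids the per-object overhead, not because it drops content.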

