Posted to user@spark.apache.org by pseudo oduesp <ps...@gmail.com> on 2016/06/16 12:17:21 UTC

cache dataframe

Hi,
If I cache a DataFrame, then transform it and add columns, should I cache it
a second time?

df.cache()

  transformation
  add new columns

df.cache()
?

Re: cache dataframe

Posted by Jacek Laskowski <ja...@japila.pl>.
Yes. Yes.

What's the use case?

Jacek

Re: cache dataframe

Posted by Alexey Pechorin <al...@taboola.com>.
What's the reason for your first cache call? It looks like you've used the
data only once to transform it without reusing the data, so there's no
reason for the first cache call, and you need only the second call (and
that also depends on the rest of your code).
