Posted to dev@spark.apache.org by Renyi Xiong <re...@gmail.com> on 2016/05/01 00:52:08 UTC

persist versus checkpoint

Hi,

Is RDD.persist equivalent to RDD.checkpoint if they save the same number
of copies (say 3) to disk?

(I assume persist saves copies on different machines?)

thanks,
Renyi.

Re: persist versus checkpoint

Posted by Holden Karau <ho...@pigscanfly.ca>.
They are different. (Also, this question might be better suited to the
user list.) By default, persist caches the RDD in memory without
replication, so each partition lives on a single machine, although you can
specify a different storage level. Checkpoint, on the other hand, writes
the RDD out to a persistent store and gets rid of the dependency graph used
to compute it (so it is often seen in iterative algorithms, which may
build very large or complex dependency graphs over time).
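
To make the difference concrete, here is a minimal sketch in Scala. It is
an illustration only, not something from the original thread: the object
name, the local[2] master, the /tmp checkpoint path and the sample data are
all placeholder assumptions. It persists one RDD to disk with two replicas
and checkpoints another, then prints each RDD's lineage.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistVsCheckpoint {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode and a /tmp directory, for illustration only.
    // On a cluster the checkpoint directory should be on a reliable store such as HDFS.
    val conf = new SparkConf().setAppName("persist-vs-checkpoint").setMaster("local[2]")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("/tmp/spark-checkpoints")

    val base = sc.parallelize(1 to 1000).map(_ * 2).filter(_ % 3 == 0)

    // persist: blocks are stored on executor-local disk, replicated twice.
    // The lineage is kept, so lost blocks can be recomputed from the parents.
    val persisted = base.map(_ + 1).persist(StorageLevel.DISK_ONLY_2)
    persisted.count()                 // first action materializes the blocks
    println(persisted.toDebugString)  // still shows the parent RDDs

    // checkpoint: must be requested before the first action on this RDD.
    // The data is written to the checkpoint directory and the lineage is truncated.
    val checkpointed = base.map(_ + 1)
    checkpointed.checkpoint()
    checkpointed.count()                 // action triggers the checkpoint write
    println(checkpointed.isCheckpointed) // reports whether the write has happened
    println(checkpointed.toDebugString)  // lineage now starts at the checkpoint data

    sc.stop()
  }
}
```

Note that persisted blocks go away when the application ends or the RDD is
unpersisted, whereas checkpoint files stay in the checkpoint directory. The
Spark API docs also recommend persisting an RDD before checkpointing it, so
that writing the checkpoint does not recompute the RDD from scratch.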

On Saturday, April 30, 2016, Renyi Xiong <re...@gmail.com> wrote:

> Hi,
>
> Is RDD.persist equivalent to RDD.checkpoint if they save the same number
> of copies (say 3) to disk?
>
> (I assume persist saves copies on different machines?)
>
> thanks,
> Renyi.
>
>

-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau