Posted to user@spark.apache.org by David Thomas <dt...@gmail.com> on 2014/04/14 04:20:52 UTC

Checkpoint Vs Cache

What is the difference between checkpointing and caching an RDD?

Re: Checkpoint Vs Cache

Posted by Mayur Rustagi <ma...@gmail.com>.
For starters, cached data may or may not be persisted to disk, but
checkpointed data always will be.
Also, caching is generic, while checkpointing is used mainly in streaming, although RDD.checkpoint() is available for any RDD.
On Apr 14, 2014 7:51 AM, "David Thomas" <dt...@gmail.com> wrote:

> What is the difference between checkpointing and caching an RDD?
>

Re: Checkpoint Vs Cache

Posted by Chris Fregly <ch...@fregly.com>.
http://docs.sigmoidanalytics.com/index.php/Checkpoint_and_not_running_out_of_disk_space


On Mon, Apr 14, 2014 at 2:43 AM, Cheng Lian <li...@gmail.com> wrote:

> Checkpointed RDDs are materialized on disk, while cached RDDs are
> materialized in memory. When memory is insufficient, cached RDD blocks (1
> block per partition) will be evicted in an LRU manner. An evicted RDD block
> will be spilled to disk if the storage level of the RDD allows; otherwise,
> the block vanishes entirely and must be recomputed from the lineage DAG if
> it's referenced later.
>
>
> On Mon, Apr 14, 2014 at 10:20 AM, David Thomas <dt...@gmail.com> wrote:
>
>> What is the difference between checkpointing and caching an RDD?
>>
>
>

Re: Checkpoint Vs Cache

Posted by Cheng Lian <li...@gmail.com>.
Checkpointed RDDs are materialized on disk, while cached RDDs are
materialized in memory. When memory is insufficient, cached RDD blocks (1
block per partition) will be evicted in an LRU manner. An evicted RDD block
will be spilled to disk if the storage level of the RDD allows; otherwise,
the block vanishes entirely and must be recomputed from the lineage DAG if
it's referenced later.
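
To make the eviction behavior above concrete, here is a small toy model in plain
Python. It is only a sketch and not Spark's actual block manager: the class
ToyBlockStore, its parameters, and the run() helper are all hypothetical names
invented for illustration. It assumes memory holds a fixed number of blocks, one
block per partition, and that the "compute" function stands in for recomputation
from the lineage DAG.

```python
from collections import OrderedDict


class ToyBlockStore:
    """Toy model of LRU eviction for cached blocks (hypothetical, not Spark code)."""

    def __init__(self, capacity, spill_to_disk, compute):
        self.capacity = capacity            # max number of blocks kept in memory
        self.spill_to_disk = spill_to_disk  # True: evicted blocks go to disk; False: they are dropped
        self.compute = compute              # stands in for recomputing a partition from lineage
        self.memory = OrderedDict()         # ordered from least to most recently used
        self.disk = {}                      # blocks spilled on eviction
        self.compute_calls = []             # every partition that had to be (re)computed

    def get(self, partition):
        if partition in self.memory:
            # Memory hit: mark the block as most recently used.
            self.memory.move_to_end(partition)
            return self.memory[partition]
        if partition in self.disk:
            # The block was spilled earlier, so no recomputation is needed.
            value = self.disk[partition]
        else:
            # The block was never computed, or was evicted and dropped.
            value = self.compute(partition)
            self.compute_calls.append(partition)
        self._put(partition, value)
        return value

    def _put(self, partition, value):
        self.memory[partition] = value
        self.memory.move_to_end(partition)
        # Evict least recently used blocks until memory fits again.
        while len(self.memory) > self.capacity:
            old_partition, old_value = self.memory.popitem(last=False)
            if self.spill_to_disk:
                self.disk[old_partition] = old_value


def run(spill_to_disk):
    # Room for two blocks; partition 0 is evicted when partition 2 arrives.
    store = ToyBlockStore(capacity=2, spill_to_disk=spill_to_disk,
                          compute=lambda p: p * 10)
    for partition in [0, 1, 2, 0]:
        store.get(partition)
    return store.compute_calls


memory_only = run(spill_to_disk=False)      # partition 0 is computed twice
memory_and_disk = run(spill_to_disk=True)   # partition 0 is read back from disk
print(memory_only, memory_and_disk)
```

In real Spark, the two modes roughly correspond to persisting an RDD with
StorageLevel.MEMORY_ONLY (what cache() uses for RDDs) versus
StorageLevel.MEMORY_AND_DISK. A checkpointed RDD differs from both in that it
is written out to the configured checkpoint directory rather than kept as
evictable cache blocks.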


On Mon, Apr 14, 2014 at 10:20 AM, David Thomas <dt...@gmail.com> wrote:

> What is the difference between checkpointing and caching an RDD?
>