Posted to user@spark.apache.org by David Thomas <dt...@gmail.com> on 2014/04/14 04:20:52 UTC
Checkpoint Vs Cache
What is the difference between checkpointing and caching an RDD?
Re: Checkpoint Vs Cache
Posted by Mayur Rustagi <ma...@gmail.com>.
For starters, caching may or may not persist data to disk, but
checkpointing always will.
Also, caching is generic, whereas checkpointing is used mainly in streaming.
Re: Checkpoint Vs Cache
Posted by Chris Fregly <ch...@fregly.com>.
http://docs.sigmoidanalytics.com/index.php/Checkpoint_and_not_running_out_of_disk_space
On Mon, Apr 14, 2014 at 2:43 AM, Cheng Lian <li...@gmail.com> wrote:
> Checkpointed RDDs are materialized on disk, while cached RDDs are
> materialized in memory. When memory is insufficient, cached RDD blocks (1
> block per partition) will be evicted in an LRU manner. An evicted RDD block
> will be spilled to disk if the storage level of the RDD allows, otherwise
> this block vanishes entirely and must be recomputed from the lineage DAG if
> it's referenced later.
Re: Checkpoint Vs Cache
Posted by Cheng Lian <li...@gmail.com>.
Checkpointed RDDs are materialized on disk, while cached RDDs are
materialized in memory by default. When memory is insufficient, cached RDD
blocks (one block per partition) are evicted in LRU order. An evicted RDD
block is spilled to disk if the RDD's storage level allows it; otherwise the
block is dropped entirely and must be recomputed from the lineage DAG if
it is referenced later.
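
To make the eviction behaviour described above concrete, here is a toy
sketch in plain Python. It is not Spark code and calls no Spark API:
ToyBlockStore, memory_slots, spill_to_disk and the compute callback are all
made-up names. spill_to_disk stands in for a storage level that allows disk,
and compute stands in for recomputing a partition from the lineage DAG.

```python
from collections import OrderedDict


class ToyBlockStore:
    """Toy model of cached RDD blocks (hypothetical, not Spark's implementation)."""

    def __init__(self, memory_slots, spill_to_disk, compute):
        self.memory_slots = memory_slots    # number of blocks that fit in memory
        self.spill_to_disk = spill_to_disk  # stand-in for a disk-capable storage level
        self.compute = compute              # stand-in for recomputation from lineage
        self.memory = OrderedDict()         # partition id -> block, least recently used first
        self.disk = {}                      # spilled blocks
        self.computations = 0               # how often compute() had to run

    def get(self, partition):
        if partition in self.memory:
            self.memory.move_to_end(partition)  # mark as most recently used
            return self.memory[partition]
        if partition in self.disk:
            block = self.disk[partition]        # read the spilled block back
        else:
            block = self.compute(partition)     # block is gone: compute it again
            self.computations += 1
        self._put(partition, block)
        return block

    def _put(self, partition, block):
        self.memory[partition] = block
        while len(self.memory) > self.memory_slots:
            # Evict the least recently used block.
            victim, victim_block = self.memory.popitem(last=False)
            if self.spill_to_disk:
                self.disk[victim] = victim_block
            # Otherwise the evicted block is simply dropped.


def count_computations(spill_to_disk):
    """Access partitions 0, 1, 2, 0 with room for only two blocks in memory."""
    store = ToyBlockStore(memory_slots=2, spill_to_disk=spill_to_disk,
                          compute=lambda partition: partition * 10)
    for partition in [0, 1, 2, 0]:
        store.get(partition)
    return store.computations


print(count_computations(spill_to_disk=False))  # 4: partition 0 was evicted and recomputed
print(count_computations(spill_to_disk=True))   # 3: partition 0 was read back from disk
```

In this model the only difference between the two runs is what happens to
the evicted block: without spilling, the second access to partition 0 pays
for a recomputation, while with spilling it is served from the disk copy.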