Posted to user@spark.apache.org by charles li <ch...@gmail.com> on 2016/02/05 04:15:09 UTC
rdd cache priority
Say I have 2 RDDs, RDD1 and RDD2.
Both are 20 GB in memory, and I cache both of them using RDD1.cache() and RDD2.cache().
Then, in the later steps of my app, I never use RDD1 again, but I use RDD2 many times.
Here is my question:
if there is only 40 GB of memory in my cluster, and I have another RDD, RDD3, of 20 GB, what happens if I cache it using RDD3.cache()?
As the documentation says, cache() uses the default storage level, MEMORY_ONLY.
That means Spark will not necessarily cache RDD3, but may re-compute it every
time it is used.
What I'm confused about is: will Spark remove RDD1 for me and put RDD3 in
memory instead?
Or is there any concept like a "priority cache" in Spark?
great thanks
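(The behaviour asked about above can be sketched with a toy model. This is plain Python, not Spark's actual MemoryStore; the names and sizes mirror the question, and the ToyCache class is purely illustrative:)

```python
# Toy model of MEMORY_ONLY caching: a fixed-capacity store that simply
# skips caching anything that does not fit, and never evicts on its own.
# Mirrors the 40 GB cluster / three 20 GB RDDs from the question.
class ToyCache:
    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.cached = {}  # name -> size in GB

    def used(self):
        return sum(self.cached.values())

    def cache(self, name, size_gb):
        # MEMORY_ONLY: if the block does not fit, it is not stored;
        # later uses recompute it from its lineage.
        if self.used() + size_gb <= self.capacity:
            self.cached[name] = size_gb
            return True
        return False

    def unpersist(self, name):
        # The cache only frees a block when explicitly told to,
        # analogous to calling RDD.unpersist() yourself.
        self.cached.pop(name, None)

store = ToyCache(capacity_gb=40)
store.cache("RDD1", 20)         # fits
store.cache("RDD2", 20)         # fits
fits = store.cache("RDD3", 20)  # does not fit -> False; recomputed on use
store.unpersist("RDD1")         # manual eviction frees 20 GB
fits_now = store.cache("RDD3", 20)  # now fits -> True
```

(In other words, the manual workaround is to unpersist the RDD you no longer need before caching the new one.)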
--
*--------------------------------------*
a spark lover, a quant, a developer and a good man.
http://github.com/litaotao
Re: rdd cache priority
Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,
You're right; RDD3 is not fully cached, and the uncached partitions are
re-computed every time they are used.
With MEMORY_AND_DISK, the partitions of RDD3 that do not fit in memory are
written to disk instead.
Also, the current Spark does not automatically unpersist RDDs based on how
frequently they are used; you need to call unpersist() on RDD1 yourself to
free that memory.
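(A toy illustration of that per-partition difference between the two storage levels. Plain Python, not Spark's BlockManager, and the partition sizes are made up for the example:)

```python
# Sketch of where cached partitions end up under MEMORY_ONLY vs
# MEMORY_AND_DISK when storage memory runs out. Caching is
# per-partition, so an RDD can be partially cached.
def place_partitions(sizes, mem_capacity, level):
    in_mem, on_disk, recompute = [], [], []
    used = 0
    for i, size in enumerate(sizes):
        if used + size <= mem_capacity:
            in_mem.append(i)       # partition fits in memory
            used += size
        elif level == "MEMORY_AND_DISK":
            on_disk.append(i)      # spilled to disk, read back on use
        else:                      # MEMORY_ONLY
            recompute.append(i)    # dropped; recomputed from lineage
    return in_mem, on_disk, recompute

# Four 5 GB partitions, but only 10 GB of storage memory free:
mem_only = place_partitions([5, 5, 5, 5], 10, "MEMORY_ONLY")
mem_disk = place_partitions([5, 5, 5, 5], 10, "MEMORY_AND_DISK")
```

(So under MEMORY_ONLY the last two partitions are recomputed on every use, while under MEMORY_AND_DISK they are read back from disk.)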
On Fri, Feb 5, 2016 at 12:15 PM, charles li <ch...@gmail.com> wrote:
> or is there any concept like " Priority cache " in spark?
--
---
Takeshi Yamamuro