Posted to user@spark.apache.org by Vladimir Rodionov <vr...@splicemachine.com> on 2014/09/10 01:13:58 UTC
Spark caching questions
Hi, users

1. What is the eviction policy for the disk-based cache? The same LRU as in memory?

2. What is the scope of a cached RDD? Does it survive the application? What
happens if I run the Java app next time? Will the RDD be recreated or read from
the cache?

If the answer is YES, then ...

3. Is there any way to invalidate a cached RDD automatically? RDD
partitions? Some API like RDD.isValid()?

4. HadoopRDD is InputFormat-based. Some partitions (splits) may become
invalid in the cache. Can we reload only those partitions into the cache?

-Vladimir
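(For context on question 1: Spark's in-memory block store drops cached blocks in LRU order when it runs out of room; on-disk blocks are not evicted the same way. A toy sketch of LRU eviction, in pure Python — the `LruCache` class and the block names are illustrative only, not Spark internals:)

```python
from collections import OrderedDict

class LruCache:
    """Toy LRU cache: the least-recently-used entry is evicted first."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry

cache = LruCache(2)
cache.put("rdd_0_0", "block A")
cache.put("rdd_0_1", "block B")
cache.get("rdd_0_0")               # touch A, so B becomes least recently used
cache.put("rdd_0_2", "block C")    # over capacity: B is evicted
print(sorted(cache.entries))       # ['rdd_0_0', 'rdd_0_2']
```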
Re: Spark caching questions
Posted by Mayur Rustagi <ma...@gmail.com>.
Cached RDDs do not survive SparkContext shutdown (they are scoped to a
single SparkContext).
I am not sure what you mean by disk-based cache eviction; if you cache more
RDDs than you have disk space for, the result will not be very pretty :)
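(A hedged PySpark sketch of that scoping, plus manual invalidation — this assumes a local Spark installation and an illustrative input path; there is no RDD.isValid() in the public API, so rdd.unpersist() is the closest tool for dropping stale cached data:)

```python
from pyspark import SparkContext, StorageLevel

sc = SparkContext("local", "cache-scope-demo")

# Path is illustrative; substitute your own input.
rdd = sc.textFile("data.txt")
rdd.persist(StorageLevel.MEMORY_AND_DISK)
rdd.count()      # first action materializes the cache

rdd.unpersist()  # manual invalidation: drops the cached blocks;
                 # the next action recomputes from the source

sc.stop()        # any remaining cached RDDs die with the context;
                 # a new application / SparkContext starts from scratch
```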
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>
On Wed, Sep 10, 2014 at 4:43 AM, Vladimir Rodionov <
vrodionov@splicemachine.com> wrote:
> Hi, users
>
> 1. What is the eviction policy for the disk-based cache? The same LRU as in
> memory?
>
> 2. What is the scope of a cached RDD? Does it survive the application? What
> happens if I run the Java app next time? Will the RDD be recreated or read
> from the cache?
>
> If the answer is YES, then ...
>
>
> 3. Is there any way to invalidate a cached RDD automatically? RDD
> partitions? Some API like RDD.isValid()?
>
> 4. HadoopRDD is InputFormat-based. Some partitions (splits) may become
> invalid in the cache. Can we reload only those partitions into the cache?
>
> -Vladimir
>