Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/01/06 11:05:39 UTC

[jira] [Resolved] (SPARK-1962) Add RDD cache reference counting

     [ https://issues.apache.org/jira/browse/SPARK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-1962.
------------------------------
    Resolution: Won't Fix

I understand the problem, but reference counting isn't right semantically here. (It would also be a significant behavior change.) If a function needs to operate on a cached RDD, it can cache it, but then needs to know whether to unpersist when done. Really, it needs to return the persistence level to what it was before, and there is not just one possibility.
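
To illustrate why only the simplest case can be handled from the caller's side: the sketch below caches an RDD only if it is not persisted at all, runs a function against it, and unpersists only if the helper was the one that cached it. This is a hypothetical helper, not a Spark API; `CacheScope.withCached` is an assumed name, and it deliberately does nothing when the RDD was already persisted, because an arbitrary prior storage level cannot simply be re-applied after an unpersist without recomputation.

{code}
import java.util.function.Function;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.storage.StorageLevel;

// Hypothetical caller-side helper, for illustration only (not a Spark API).
final class CacheScope {
    // Runs `body` with `rdd` cached, then restores the "not persisted" state,
    // but only when this helper did the caching itself. If the caller had
    // already persisted the RDD (at any level), it is left untouched.
    static <T, R> R withCached(JavaRDD<T> rdd, Function<JavaRDD<T>, R> body) {
        boolean cachedHere = rdd.getStorageLevel().equals(StorageLevel.NONE());
        if (cachedHere) {
            rdd.cache();  // MEMORY_ONLY; the caller may have wanted another level, which is part of the problem
        }
        try {
            return body.apply(rdd);
        } finally {
            if (cachedHere) {
                rdd.unpersist();
            }
        }
    }
}
{code}

Even this narrow workaround shows the semantic gap: it can only distinguish "persisted" from "not persisted"; it cannot restore whatever level the caller intended, which is the point made above.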

> Add RDD cache reference counting
> --------------------------------
>
>                 Key: SPARK-1962
>                 URL: https://issues.apache.org/jira/browse/SPARK-1962
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Taeyun Kim
>            Priority: Minor
>
> It would be nice if the RDD cache() method incorporated reference counting.
> That is,
> {code}
> void test()
> {
>     JavaRDD<...> rdd = ...;
>     rdd.cache();  // reference count becomes 1; actual caching happens.
>     rdd.cache();  // reference count becomes 2; no-op as long as the storage level is the same, otherwise an exception.
>     ...
>     rdd.uncache();  // reference count becomes 1; no-op.
>     rdd.uncache();  // reference count becomes 0; actual unpersist happens.
> }
> {code}
> This can be useful when writing code in a modular way.
> When a function receives an RDD as an argument, it doesn't necessarily know the cache status of that RDD.
> But it may want to cache the RDD, since it will use it multiple times.
> With the current RDD API, however, it cannot determine whether it should unpersist the RDD when done or leave it alone (so that the caller can continue to use it without rebuilding).
> For API compatibility, introducing a new method or adding a parameter may be required.
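
For illustration, the proposed semantics can be approximated today in user code with a wrapper around JavaRDD (a hypothetical CountedRDD class, not part of Spark; it only works if every caller shares the same wrapper instance, which is presumably why the reporter wants the count built into RDD itself):

{code}
import org.apache.spark.api.java.JavaRDD;

// Hypothetical user-level approximation of the proposal, for illustration only.
final class CountedRDD<T> {
    private final JavaRDD<T> rdd;
    private int refCount = 0;

    CountedRDD(JavaRDD<T> rdd) {
        this.rdd = rdd;
    }

    synchronized void cache() {
        if (refCount == 0) {
            rdd.cache();      // actual caching happens only on the 0 -> 1 transition
        }
        refCount++;           // otherwise a no-op, matching the proposal above
    }

    synchronized void uncache() {
        refCount--;
        if (refCount == 0) {
            rdd.unpersist();  // actual unpersist happens only on the 1 -> 0 transition
        }
    }
}
{code}

With such a wrapper, a function receiving a CountedRDD could call cache() and uncache() in matched pairs without knowing what its caller did, which is the modular usage the description asks for.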



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
