Posted to reviews@spark.apache.org by tdas <gi...@git.apache.org> on 2014/03/17 22:30:17 UTC

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/126#discussion_r10679953
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -1025,6 +1025,14 @@ abstract class RDD[T: ClassTag](
         checkpointData.flatMap(_.getCheckpointFile)
       }
     
    +  def cleanup() {
    --- End diff --
    
    Actually, the current implementation won't. Calling rddA.cleanup() will only do two things:
    1. Unpersist rddA.
    2. Delete the shuffle dependencies, and the corresponding shuffle data, related only to rddA. Let's assume rddA has two shuffle dependencies, s1 and s2, one each to rdd1 and rdd2. These shuffle dependencies are not shared with rddB, so cleaning rddA with the current implementation of RDD.cleanup() will not affect rddB. The current implementation therefore cannot be directly "misused". But it further reinforces Patrick's earlier point in this thread: it is not clear what the desired semantics are, and it is best to mark this function private[spark] for now. (A sketch of this scenario follows below.)
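    
    To make the scenario concrete, here is a minimal sketch. The setup is illustrative, not taken from the PR itself; cleanup() is the WIP API under discussion and not a released one, so the call is left commented out:
    
        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.SparkContext._  // pair-RDD implicits (pre-1.0 style)
    
        object CleanupSemanticsSketch {
          def main(args: Array[String]): Unit = {
            val sc = new SparkContext(
              new SparkConf().setAppName("cleanup-sketch").setMaster("local[2]"))
    
            val rdd1 = sc.parallelize(Seq(("a", 1), ("b", 2)))
            val rdd2 = sc.parallelize(Seq(("a", 3), ("c", 4)))
    
            // join() puts two shuffle dependencies in rddA's lineage:
            // s1 (on rdd1) and s2 (on rdd2).
            val rddA = rdd1.join(rdd2).cache()
            rddA.count()  // materialize the cache and the shuffle data
    
            // rddB has its own shuffle dependency, shared with nothing above.
            val rddB = sc.parallelize(Seq(("d", 5), ("d", 6))).reduceByKey(_ + _)
            rddB.count()
    
            // Under the semantics described above, rddA.cleanup() would
            // (1) unpersist rddA and (2) delete the shuffle data for s1 and
            // s2 only, leaving rddB's shuffle data intact.
            // rddA.cleanup()
    
            sc.stop()
          }
        }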

