Posted to reviews@spark.apache.org by andrewor14 <gi...@git.apache.org> on 2015/04/24 12:19:22 UTC

[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/5685

    [SPARK-7120][SPARK-7121][WIP] Closure cleaner nesting + documentation

    The closure cleaner currently has trouble with nested closures. For instance, in SparkContext, I tried to do the following:
    {code}
    def scope[T](body: => T): T = body // no-op
    def myCoolMethod(path: String): RDD[String] = scope {
      parallelize(1 to 10).map { _ => path }
    }
    {code}
    and I got an exception complaining that SparkContext is not serializable. The issue here is that the inner closure gets `path` from the outer closure (the scope), but the outer closure actually references the SparkContext object itself in order to call `parallelize`.
    
    Note, however, that the inner closure doesn't actually need the SparkContext; it just needs a field from the outer closure. If we modify ClosureCleaner to clean the outer closure recursively, keeping only the fields actually accessed by the inner closure, then we can serialize the inner closure.
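    
    For reference, the compiled shape of these closures is roughly the following hand-written sketch (the real classes have synthetic names like $anonfun$1, and `outer` here stands for the synthetic `$outer` field that the cleaner nulls out):
    {code}
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD
    
    // sketch of what scalac emits for the example above
    class OuterClosure(val outer: SparkContext, val path: String) {
      // the body of `scope { ... }`: needs its outer reference to call parallelize
      def apply(): RDD[String] =
        outer.parallelize(1 to 10).map(new InnerClosure(this).apply _)
    }
    class InnerClosure(val outer: OuterClosure) {
      // only needs outer.path, but holding on to the whole OuterClosure drags
      // in the SparkContext and makes serialization fail
      def apply(i: Int): String = outer.path
    }
    {code}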
    
    This is blocking my effort on a separate task. This is WIP because I plan to add tests for this later.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark closure-cleaner

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5685.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5685
    
----
commit 86f78237b7623e4efa06c5feb053e0c304979c73
Author: Andrew Or <an...@databricks.com>
Date:   2015-04-24T10:05:58Z

    Implement transitive cleaning + add missing documentation
    
    See in-code comments for more detail on what this means.

commit 2390a608ed74a9703d3763d040421dccb51242ec
Author: Andrew Or <an...@databricks.com>
Date:   2015-04-24T10:08:11Z

    Feature flag this new behavior
    
    ... in case anything breaks, we should be able to fall back to the old
    behavior.

----




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97202716
  
    I've reverted the previous commit since https://github.com/apache/spark/commit/bf35edd9d4b8b11df9f47b6ff43831bc95f06322 was merged into master and the workaround for SPARK-7180 is no longer needed.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97672308
  
    Hey Andrew,
    
    This is looking good. The code is quite dense, but as far as I can tell it is correct. I left some more surface-level comments; if you can get to those, I think it will be good to go.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97286976
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97244703
  
      [Test build #31172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31172/consoleFull) for   PR 5685 at commit [`9187066`](https://github.com/apache/spark/commit/9187066ef9834d62d488b0ccc3be70eea514ae2c).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97300067
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29393859
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -126,34 +243,66 @@ private[spark] object ClosureCleaner extends Logging {
           }
         }
     
    +    // List of outer (class, object) pairs, ordered from outermost to innermost
    +    // Note that all outer objects but the outermost one (first one in this list) must be closures
         var outerPairs: List[(Class[_], AnyRef)] = (outerClasses zip outerObjects).reverse
    -    var outer: AnyRef = null
    +    var parent: AnyRef = null
         if (outerPairs.size > 0 && !isClosure(outerPairs.head._1)) {
           // The closure is ultimately nested inside a class; keep the object of that
           // class without cloning it since we don't want to clone the user's objects.
    -      outer = outerPairs.head._2
    +      // Note that we still need to keep around the outermost object itself because
    +      // we need it to clone its child closure later (see below).
    +      logDebug(s" + outermost object is not a closure, so do not clone it: ${outerPairs.head}")
    +      parent = outerPairs.head._2 // e.g. SparkContext
           outerPairs = outerPairs.tail
    +    } else if (outerPairs.size > 0) {
    +      logDebug(s" + outermost object is a closure, so we just keep it: ${outerPairs.head}")
    +    } else {
    +      logDebug(" + there are no enclosing objects!")
         }
    +
         // Clone the closure objects themselves, nulling out any fields that are not
         // used in the closure we're working on or any of its inner closures.
         for ((cls, obj) <- outerPairs) {
    -      outer = instantiateClass(cls, outer, inInterpreter)
    +      logDebug(s" + cloning the object $obj of class ${cls.getName}")
    +      // We null out these unused references by cloning each object and then filling in all
    +      // required fields from the original object. We need the parent here because the Java
    +      // language specification requires the first constructor parameter of any closure to be
    +      // its enclosing object.
    +      val clone = instantiateClass(cls, parent, inInterpreter)
           for (fieldName <- accessedFields(cls)) {
             val field = cls.getDeclaredField(fieldName)
             field.setAccessible(true)
             val value = field.get(obj)
    -        // logInfo("1: Setting " + fieldName + " on " + cls + " to " + value);
    -        field.set(outer, value)
    +        field.set(clone, value)
           }
    +      // If transitive cleaning is enabled, we recursively clean any enclosing closure using
    +      // the already populated accessed fields map of the starting closure
    +      if (cleanTransitively && isClosure(clone.getClass)) {
    +        logDebug(s" + cleaning cloned closure $clone recursively (${cls.getName})")
    +        clean(clone, checkSerializable, cleanTransitively, accessedFields)
    --- End diff --
    
    should `checkSerializable` be false here? It seems like theoretically we only need to check the very final closure. It's a bit hard for me to reason about whether we could have a weird state where `checkSerializable` throws false-positive errors during recursion.
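    
    i.e., presumably something like this for the recursive call (a guess at the suggestion, not the actual patch):
    ```scala
    // check serializability only once, at the very end, for the starting closure
    clean(clone, checkSerializable = false, cleanTransitively, accessedFields)
    ```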




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97202690
  
    Merged build started.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96297682
  
      [Test build #708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/708/consoleFull) for   PR 5685 at commit [`8b71cdb`](https://github.com/apache/spark/commit/8b71cdb7953ce622fd94fda7e0c5daafeb145cca).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch **adds the following new dependencies:**
       * `tachyon-0.6.4.jar`
       * `tachyon-client-0.6.4.jar`
    
     * This patch **removes the following dependencies:**
       * `tachyon-0.5.0.jar`
       * `tachyon-client-0.5.0.jar`





[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29391891
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -101,21 +115,124 @@ private[spark] object ClosureCleaner extends Logging {
         }
       }
     
    -  def clean(func: AnyRef, checkSerializable: Boolean = true) {
    +  /**
    +   * Clean the given closure in place.
    +   *
    +   * More specifically, this renders the given closure serializable as long as it does not
    +   * explicitly reference unserializable objects.
    +   *
    +   * @param closure the closure to clean
    +   * @param checkSerializable whether to verify that the closure is serializable after cleaning
    +   * @param cleanTransitively whether to clean enclosing closures transitively
    +   */
    +  def clean(
    +      closure: AnyRef,
    +      checkSerializable: Boolean = true,
    +      cleanTransitively: Boolean = true): Unit = {
    +    clean(closure, checkSerializable, cleanTransitively, Map.empty)
    +  }
    +
    +  /**
    +   * Helper method to clean the given closure in place.
    +   *
    +   * The mechanism is to traverse the hierarchy of enclosing closures and null out any
    +   * references along the way that are not actually used by the starting closure, but are
    +   * nevertheless included in the compiled anonymous classes. Note that it is unsafe to
    +   * simply mutate the enclosing closures in place, as other code paths may depend on them.
    +   * Instead, we clone each enclosing closure and set the parent pointers accordingly.
    +   *
    +   * By default, closures are cleaned transitively. This means we detect whether enclosing
    +   * objects are actually referenced by the starting one, either directly or transitively,
    +   * and, if not, sever these closures from the hierarchy. In other words, in addition to
    +   * nulling out unused field references, we also null out any parent pointers that refer
    +   * to enclosing objects not actually needed by the starting closure. We determine
    +   * transitivity by tracing through the tree of all methods ultimately invoked by the
    +   * inner closure and record all the fields referenced in the process.
    +   *
    +   * For instance, transitive cleaning is necessary in the following scenario:
    +   *
    +   *   class SomethingNotSerializable {
    +   *     def someValue = 1
    +   *     def someMethod(): Unit = scope("one") {
    +   *       def x = someValue
    +   *       def y = 2
    +   *       scope("two") { println(y + 1) }
    +   *     }
    +   *     def scope(name: String)(body: => Unit) = body
    +   *   }
    +   *
    +   * In this example, scope "two" is not serializable because it references scope "one", which
    +   * references SomethingNotSerializable. Note, however, that the body of scope "two" does not
    +   * actually depend on SomethingNotSerializable. This means we can safely null out the parent
    +   * pointer of a cloned scope "one" and set it as the parent of scope "two", such that scope "two"
    +   * no longer references SomethingNotSerializable transitively.
    +   *
    +   * @param func the starting closure to clean
    +   * @param checkSerializable whether to verify that the closure is serializable after cleaning
    +   * @param cleanTransitively whether to clean enclosing closures transitively
    +   * @param accessedFields a map from a class to a set of its fields that are accessed by
    +   *                       the starting closure
    +   */
    +  private def clean(
    +      func: AnyRef,
    +      checkSerializable: Boolean,
    +      cleanTransitively: Boolean,
    +      accessedFields: Map[Class[_], Set[String]]): Unit = {
    +
    +    // TODO: clean all inner closures first. This requires us to find the inner objects.
         // TODO: cache outerClasses / innerClasses / accessedFields
    -    val outerClasses = getOuterClasses(func)
    +
    +    if (func == null) {
    +      return
    +    }
    +
    +    logDebug(s"+++ Cleaning closure $func (${func.getClass.getName}}) +++")
    +
    +    // A list of classes that represents closures enclosed in the given one
         val innerClasses = getInnerClasses(func)
    --- End diff --
    
    should this be called `innerClosureClasses` and `getInnerClosureClasses`?




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97283668
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96905160
  
      [Test build #31109 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31109/consoleFull) for   PR 5685 at commit [`9419efe`](https://github.com/apache/spark/commit/9419efea98e190766887a9ab5fbac57f45bf007c).




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97244715
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31172/
    Test PASSed.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29392113
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -55,10 +58,14 @@ private[spark] object ClosureCleaner extends Logging {
       private def getOuterClasses(obj: AnyRef): List[Class[_]] = {
    --- End diff --
    
    Should we merge this and `getOuterObjects` into one function that just returns pairs of objects and their classes? Or maybe not, just to keep the changes minimal?




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29357717
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1627,7 +1627,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
           partitions: Seq[Int],
           allowLocal: Boolean
           ): Array[U] = {
    -    runJob(rdd, (context: TaskContext, iter: Iterator[T]) => func(iter), partitions, allowLocal)
    +    // We must clean `func` here before using it in another closure below
    +    // Otherwise, the closure cleaner will only clean the outer closure but not `func`
    +    val cleanedFunc = clean(func)
    --- End diff --
    
    Yeah, I think so. This seems like a separate issue so maybe I should file another JIRA for it.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29081651
  
    --- Diff: core/src/test/scala/org/apache/spark/util/ClosureCleanerSuite.scala ---
    @@ -50,7 +50,7 @@ class ClosureCleanerSuite extends FunSuite {
         val obj = new TestClassWithNesting(1)
         assert(obj.run() === 96) // 4 * (1+2+3+4) + 4 * (1+2+3+4) + 16 * 1
       }
    -  
    +
    --- End diff --
    
    The suite should be expanded with the fix, right? Could you also test the common `for` loop case that others have seen in the past?
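    
    A sketch of what that `for` loop case presumably looks like (hypothetical class; the desugared foreach closure encloses the map closure, so the inner closure reaches fields through two levels of $outer):
    ```scala
    import org.apache.spark.SparkContext
    
    class TestForLoop(sc: SparkContext) { // not serializable
      val offset = 1
      def run(): Long = {
        var total = 0L
        for (i <- 1 to 3) { // desugars to (1 to 3).foreach { i => ... }
          // the map closure reads `offset` through the enclosing foreach
          // closure's $outer, dragging TestForLoop (and sc) along with it
          total += sc.parallelize(1 to 10).map(_ + i + offset).count()
        }
        total
      }
    }
    ```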




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96103510
  
    Does this happen in user programs or just in SparkContext? This is exactly what ClosureCleaner was designed to deal with, so I'm surprised that it's a problem.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96450346
  
      [Test build #711 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/711/consoleFull) for   PR 5685 at commit [`6d4d3f1`](https://github.com/apache/spark/commit/6d4d3f1ac8da883fb814613afec35900b078b751).




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29097065
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -77,6 +80,9 @@ private[spark] object ClosureCleaner extends Logging {
         Nil
       }
     
    +  /**
    +   * Return a list of classes that represent closures enclosed in the given closure object.
    +   */
       private def getInnerClasses(obj: AnyRef): List[Class[_]] = {
    --- End diff --
    
    how can I do that?




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29465512
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -55,10 +58,14 @@ private[spark] object ClosureCleaner extends Logging {
       private def getOuterClasses(obj: AnyRef): List[Class[_]] = {
         for (f <- obj.getClass.getDeclaredFields if f.getName == "$outer") {
           f.setAccessible(true)
    -      if (isClosure(f.getType)) {
    -        return f.getType :: getOuterClasses(f.get(obj))
    -      } else {
    -        return f.getType :: Nil // Stop at the first $outer that is not a closure
    +      val outer = f.get(obj)
    +      // The outer pointer may be null if we have cleaned this closure before
    +      if (outer != null) {
    +        if (isClosure(f.getType)) {
    +          return f.getType :: getOuterClasses(f.get(obj))
    --- End diff --
    
    "f.get(obj)"  -> "outer"




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96006013
  
    Yeah, I'm probably misunderstanding prior discussions, but I had the impression there was a reason you couldn't always clear references two hops away. This would be great to improve if it can be done.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96108534
  
    The main issue with grabbing the values into the closure is that we'll need to do this everywhere. My intention is to wrap many existing methods (in SparkContext and other files) in closures, and it would be tedious and error-prone to manually localize the variables in all of these places (see the sketch below).
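    
    The manual localization being avoided looks roughly like this (a sketch based on the example in the PR description):
    ```scala
    def myCoolMethod(path: String): RDD[String] = scope {
      val localPath = path // copy the value into a local val by hand...
      // ...so the inner closure captures localPath directly instead of
      // reaching through the enclosing scope closure for `path`
      parallelize(1 to 10).map { _ => localPath }
    }
    ```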




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97202661
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97565265
  
    I have moved the last two commits into a separate patch #5787 (SPARK-7237). It is really a distinct issue that can be merged without this patch and should not further inflate the size of this one.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-95905246
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30930/
    Test FAILed.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29404655
  
    --- Diff: core/src/test/scala/org/apache/spark/util/ClosureCleanerSuite2.scala ---
    @@ -0,0 +1,562 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.util
    +
    +import java.io.NotSerializableException
    +
    +import scala.collection.mutable
    +
    +import org.scalatest.{BeforeAndAfterAll, FunSuite, PrivateMethodTester}
    +
    +import org.apache.spark.{SparkContext, SparkException}
    +import org.apache.spark.serializer.SerializerInstance
    +
    +/**
    + * Another test suite for the closure cleaner that is finer-grained.
    + * For tests involving end-to-end Spark jobs, see {{ClosureCleanerSuite}}.
    + */
    +class ClosureCleanerSuite2 extends FunSuite with BeforeAndAfterAll with PrivateMethodTester {
    +
    +  // Start a SparkContext so that the closure serializer is accessible
    +  // We do not actually use this explicitly otherwise
    +  private var sc: SparkContext = null
    +  private var closureSerializer: SerializerInstance = null
    +
    +  override def beforeAll(): Unit = {
    +    sc = new SparkContext("local", "test")
    +    closureSerializer = sc.env.closureSerializer.newInstance()
    +  }
    +
    +  override def afterAll(): Unit = {
    +    sc.stop()
    +    sc = null
    +    closureSerializer = null
    +  }
    +
    +  // Some fields and methods to reference in inner closures later
    +  private val someSerializableValue = 1
    +  private val someNonSerializableValue = new NonSerializable
    +  private def someSerializableMethod() = 1
    +  private def someNonSerializableMethod() = new NonSerializable
    +
    +  /** Assert that the given closure is serializable (or not). */
    +  private def assertSerializable(closure: AnyRef, serializable: Boolean): Unit = {
    +    if (serializable) {
    +      closureSerializer.serialize(closure)
    +    } else {
    +      intercept[NotSerializableException] {
    +        closureSerializer.serialize(closure)
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Helper method for testing whether closure cleaning works as expected.
    +   * This cleans the given closure twice, with and without transitive cleaning.
    +   */
    +  private def testClean(
    --- End diff --
    
    Maybe these should be called `verifyCleaning` or something, because they are really verifying behavior according to what you pass as `serializableBefore`/`serializableAfter`. Also, it would be good to document those parameters... it's not super clear what they mean.
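    
    For instance, something like this (the body is guessed from the surrounding suite, not taken from the patch):
    ```scala
    /** Verify that cleaning changes the closure's serializability as expected. */
    private def verifyCleaning(
        closure: AnyRef,
        serializableBefore: Boolean, // expected serializability before cleaning
        serializableAfter: Boolean,  // expected serializability after cleaning
        transitive: Boolean): Unit = {
      assertSerializable(closure, serializableBefore)
      ClosureCleaner.clean(closure, checkSerializable = false, cleanTransitively = transitive)
      assertSerializable(closure, serializableAfter)
    }
    ```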




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96905575
  
    I found the issue. It turns out `SerializationDebugger` currently cannot serialize `ClosureCleanerSuite2` due to [SPARK-7180](https://issues.apache.org/jira/browse/SPARK-7180). In a nutshell, the debugger currently doesn't handle the case where an object both (1) inherits a serializable parent, and (2) has a non-serializable field. The reason it passed for me locally before is that the debugger was somehow not [enabled](https://github.com/apache/spark/blob/4d9e560b5470029143926827b1cb9d72a0bfbeff/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L62), and I suspect this has something to do with the fact that I'm running Java 6. I was able to reproduce the exception locally by always enabling the debugger.
    
    Anyway, I have disabled the debugger in the test for now and hopefully it will pass.
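    
    In other words, the problematic shape is roughly (a sketch):
    ```scala
    class NonSerializable           // no Serializable marker
    class Parent extends Serializable
    class Child extends Parent {    // serializable via its parent...
      val bad = new NonSerializable // ...but holds a non-serializable field
    }
    ```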




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97283669
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31200/
    Test FAILed.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96295779
  
      [Test build #708 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/708/consoleFull) for   PR 5685 at commit [`8b71cdb`](https://github.com/apache/spark/commit/8b71cdb7953ce622fd94fda7e0c5daafeb145cca).




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-95883549
  
      [Test build #30930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30930/consoleFull) for   PR 5685 at commit [`2390a60`](https://github.com/apache/spark/commit/2390a608ed74a9703d3763d040421dccb51242ec).




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96030036
  
    @srowen we cannot null out fields in the parent closures in place, but the closure cleaner actually clones these parent closures, so it's safe to null things out in the clones.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-95991323
  
    @srowen wondering, what do you mean by "local state"? The closure cleaner only nulls out fields in _clones_ of local objects (the nulling mechanism is really just that it creates a clone and then selectively copies certain fields over from the original, leaving the rest null). So I think it should be fine.
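    
    A minimal sketch of that mechanism (assuming a no-arg constructor for illustration; Spark actually instantiates the clone without running any constructor):
    ```scala
    def cloneKeepingFields(obj: AnyRef, keep: Set[String]): AnyRef = {
      val cls = obj.getClass
      val clone = cls.getDeclaredConstructor().newInstance().asInstanceOf[AnyRef]
      for (f <- cls.getDeclaredFields if keep.contains(f.getName)) {
        f.setAccessible(true)
        f.set(clone, f.get(obj)) // copy an accessed field from the original
      }
      clone // unused fields (e.g. $outer) are left null
    }
    ```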




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97286994
  
    Merged build started.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97300063
  
      [Test build #31217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31217/consoleFull) for   PR 5685 at commit [`e909a42`](https://github.com/apache/spark/commit/e909a4222a132b1fc4fff87a111ece7d32b82765).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `   *   class SomethingNotSerializable `
      * `      logDebug(s" + cloning the object $obj of class $`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29088048
  
    --- Diff: core/src/test/scala/org/apache/spark/util/ClosureCleanerSuite.scala ---
    @@ -50,7 +50,7 @@ class ClosureCleanerSuite extends FunSuite {
         val obj = new TestClassWithNesting(1)
         assert(obj.run() === 96) // 4 * (1+2+3+4) + 4 * (1+2+3+4) + 16 * 1
       }
    -  
    +
    --- End diff --
    
    coming soon




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97287023
  
      [Test build #31217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31217/consoleFull) for   PR 5685 at commit [`e909a42`](https://github.com/apache/spark/commit/e909a4222a132b1fc4fff87a111ece7d32b82765).




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97595476
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31315/
    Test PASSed.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97595473
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29391976
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -77,6 +80,9 @@ private[spark] object ClosureCleaner extends Logging {
         Nil
       }
     
    +  /**
    +   * Return a list of classes that represent closures enclosed in the given closure object.
    +   */
       private def getInnerClasses(obj: AnyRef): List[Class[_]] = {
    --- End diff --
    
    what about doing `isClosure` on the class name of the object?
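    
    For reference, the existing check is roughly the following; it tests the compiled class name for the anonymous-function marker:
    ```scala
    private def isClosure(cls: Class[_]): Boolean =
      cls.getName.contains("$anonfun$")
    ```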




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-95906708
  
    You know a lot more about this than I do, but I was under the impression that the closure cleaner couldn't clean beyond a level or so, because it would then be modifying local object state by nulling fields in those objects, and that's not necessarily permissible. I'm sure you're on top of that; just noting my recollection from similar discussions.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97283634
  
      [Test build #31200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31200/consoleFull) for   PR 5685 at commit [`3998168`](https://github.com/apache/spark/commit/399816819e0f0bc9fde9fac1a3d6b6f6ef641a74).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `   *   class SomethingNotSerializable `
      * `      logDebug(s" + cloning the object $obj of class $`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96796263
  
      [Test build #30977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30977/consoleFull) for   PR 5685 at commit [`6d4d3f1`](https://github.com/apache/spark/commit/6d4d3f1ac8da883fb814613afec35900b078b751).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch **adds the following new dependencies:**
       * `tachyon-0.6.4.jar`
       * `tachyon-client-0.6.4.jar`
    
     * This patch **removes the following dependencies:**
       * `tachyon-0.5.0.jar`
       * `tachyon-client-0.5.0.jar`





[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29344970
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1627,7 +1627,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
           partitions: Seq[Int],
           allowLocal: Boolean
           ): Array[U] = {
    -    runJob(rdd, (context: TaskContext, iter: Iterator[T]) => func(iter), partitions, allowLocal)
    +    // We must clean `func` here before using it in another closure below
    +    // Otherwise, the closure cleaner will only clean the outer closure but not `func`
    +    val cleanedFunc = clean(func)
    --- End diff --
    
    Do we need to apply such fix to other places? E.g., RDD.scala
    
    ```Scala
      def mapPartitions[U: ClassTag](
          f: Iterator[T] => Iterator[U], preservesPartitioning: Boolean = false): RDD[U] = {
        val func = (context: TaskContext, index: Int, iter: Iterator[T]) => f(iter)
        new MapPartitionsRDD(this, sc.clean(func), preservesPartitioning)
      }
    ```
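    For concreteness, a sketch of what that fix might look like there (the same clean-before-wrapping pattern as in SparkContext; a sketch, not the final patch):

    ```Scala
      def mapPartitions[U: ClassTag](
          f: Iterator[T] => Iterator[U], preservesPartitioning: Boolean = false): RDD[U] = {
        // Clean the user function first; otherwise the closure cleaner only
        // sees the wrapping closure defined on the next line and skips `f`
        val cleanedF = sc.clean(f)
        val func = (context: TaskContext, index: Int, iter: Iterator[T]) => cleanedF(iter)
        new MapPartitionsRDD(this, sc.clean(func), preservesPartitioning)
      }
    ```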




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96924189
  
      [Test build #31109 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31109/consoleFull) for   PR 5685 at commit [`9419efe`](https://github.com/apache/spark/commit/9419efea98e190766887a9ab5fbac57f45bf007c).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `   *   class SomethingNotSerializable `
      * `      logDebug(s" + cloning the object $obj of class $`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29465612
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -68,15 +75,22 @@ private[spark] object ClosureCleaner extends Logging {
       private def getOuterObjects(obj: AnyRef): List[AnyRef] = {
         for (f <- obj.getClass.getDeclaredFields if f.getName == "$outer") {
           f.setAccessible(true)
    -      if (isClosure(f.getType)) {
    -        return f.get(obj) :: getOuterObjects(f.get(obj))
    -      } else {
    -        return f.get(obj) :: Nil // Stop at the first $outer that is not a closure
    +      val outer = f.get(obj)
    +      // The outer pointer may be null if we have cleaned this closure before
    +      if (outer != null) {
    +        if (isClosure(f.getType)) {
    +          return f.get(obj) :: getOuterObjects(f.get(obj))
    --- End diff --
    
    "f.get(obj)" -> "outer"




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-95905239
  
      [Test build #30930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30930/consoleFull) for   PR 5685 at commit [`2390a60`](https://github.com/apache/spark/commit/2390a608ed74a9703d3763d040421dccb51242ec).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `   *   class SomethingNotSerializable `
      * `      logDebug(s" + cloning the object $obj of class $`
      * `class FieldAccessFinder(`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97244714
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97565964
  
      [Test build #31315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31315/consoleFull) for   PR 5685 at commit [`26c7aba`](https://github.com/apache/spark/commit/26c7abaca5ea6a76f6839c29d9c209e9aaee4948).




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97202987
  
      [Test build #31172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31172/consoleFull) for   PR 5685 at commit [`9187066`](https://github.com/apache/spark/commit/9187066ef9834d62d488b0ccc3be70eea514ae2c).




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96285685
  
      [Test build #700 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/700/consoleFull) for   PR 5685 at commit [`eb127e5`](https://github.com/apache/spark/commit/eb127e54fa04ef17523806067d074b8560ea783e).




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29074575
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -167,15 +294,17 @@ private[spark] object ClosureCleaner extends Logging {
         }
       }
     
    -  private def instantiateClass(cls: Class[_], outer: AnyRef, inInterpreter: Boolean): AnyRef = {
    -    // logInfo("Creating a " + cls + " with outer = " + outer)
    +  private def instantiateClass(
    +      cls: Class[_],
    +      enclosingObject: AnyRef,
    +      inInterpreter: Boolean): AnyRef = {
         if (!inInterpreter) {
           // This is a bona fide closure class, whose constructor has no effects
           // other than to set its fields, so use its constructor
           val cons = cls.getConstructors()(0)
           val params = cons.getParameterTypes.map(createNullValue).toArray
    -      if (outer != null) {
    -        params(0) = outer // First param is always outer object
    +      if (enclosingObject!= null) {
    --- End diff --
    
    oops thanks




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97300068
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31217/
    Test PASSed.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97565470
  
    Merged build started.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97595448
  
      [Test build #31315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31315/consoleFull) for   PR 5685 at commit [`26c7aba`](https://github.com/apache/spark/commit/26c7abaca5ea6a76f6839c29d9c209e9aaee4948).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `   *   class SomethingNotSerializable `
      * `      logDebug(s" + cloning the object $obj of class $`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97565410
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97275855
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96769433
  
      [Test build #30977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30977/consoleFull) for   PR 5685 at commit [`6d4d3f1`](https://github.com/apache/spark/commit/6d4d3f1ac8da883fb814613afec35900b078b751).




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96479602
  
    Weird, tests pass locally. Investigating.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29466279
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -68,15 +75,22 @@ private[spark] object ClosureCleaner extends Logging {
       private def getOuterObjects(obj: AnyRef): List[AnyRef] = {
         for (f <- obj.getClass.getDeclaredFields if f.getName == "$outer") {
           f.setAccessible(true)
    -      if (isClosure(f.getType)) {
    -        return f.get(obj) :: getOuterObjects(f.get(obj))
    -      } else {
    -        return f.get(obj) :: Nil // Stop at the first $outer that is not a closure
    +      val outer = f.get(obj)
    +      // The outer pointer may be null if we have cleaned this closure before
    +      if (outer != null) {
    +        if (isClosure(f.getType)) {
    +          return f.get(obj) :: getOuterObjects(f.get(obj))
    +        } else {
    +          return f.get(obj) :: Nil // Stop at the first $outer that is not a closure
    --- End diff --
    
    "f.get(obj)" -> "outer"




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96282586
  
    Jenkins, test this please?




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96315314
  
      [Test build #709 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/709/consoleFull) for   PR 5685 at commit [`4aab379`](https://github.com/apache/spark/commit/4aab379c0d6bc12a9e0c4a984d87bbfc21bd948b).




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29362732
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1627,7 +1627,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
           partitions: Seq[Int],
           allowLocal: Boolean
           ): Array[U] = {
    -    runJob(rdd, (context: TaskContext, iter: Iterator[T]) => func(iter), partitions, allowLocal)
    +    // We must clean `func` here before using it in another closure below
    +    // Otherwise, the closure cleaner will only clean the outer closure but not `func`
    +    val cleanedFunc = clean(func)
    --- End diff --
    
    Ok, I have decided to fix this in a separate patch since it's really a separate issue. I have filed [SPARK-7237](https://issues.apache.org/jira/browse/SPARK-7237) for that




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29083093
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -77,6 +80,9 @@ private[spark] object ClosureCleaner extends Logging {
         Nil
       }
     
    +  /**
    +   * Return a list of classes that represent closures enclosed in the given closure object.
    +   */
       private def getInnerClasses(obj: AnyRef): List[Class[_]] = {
    --- End diff --
    
    Can we validate here that `obj` is a closure type (for instance with an instanceOf check that throws an exception)? It would add a bit more type safety and clarity to the API.
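    Something like this, perhaps (a hypothetical sketch of the suggested check, reusing the existing `isClosure` helper):

    ```Scala
    // Could be invoked at the top of getInnerClasses (and the other getters)
    private def verifyClosure(obj: AnyRef): Unit = {
      if (!isClosure(obj.getClass)) {
        throw new IllegalArgumentException(
          s"Expected a closure, got ${obj.getClass.getName}")
      }
    }
    ```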




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96107937
  
    Yes, this also affects user programs. For example I modified SparkPi to follow the pattern I described in the PR description
    ```
        ...
        val slices = if (args.length > 0) args(0).toInt else 2
        (1 to 1).foreach { j =>
          val count = spark.parallelize(1 until n, slices).map { i =>
            val z = slices // *** This is the culprit ***
            val x = random * 2 - 1
            val y = random * 2 - 1
            if (x * x + y * y < 1) 1 else 0
          }.reduce(_ + _)
        }
        ...
    ```
    If you run this on the master branch you will run into a task-not-serializable exception. I was able to verify that this patch fixes it as long as `spark.closureCleaner.transitive` is enabled. If it is not enabled, it fails with the same exception as before.
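    For reference, toggling the flag is a one-liner on the conf; a minimal sketch (the flag name comes from this patch, the rest is illustrative):

    ```Scala
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("SparkPi")
      .set("spark.closureCleaner.transitive", "false") // fall back to the old behavior
    val spark = new SparkContext(conf)
    ```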




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29065533
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -167,15 +294,17 @@ private[spark] object ClosureCleaner extends Logging {
         }
       }
     
    -  private def instantiateClass(cls: Class[_], outer: AnyRef, inInterpreter: Boolean): AnyRef = {
    -    // logInfo("Creating a " + cls + " with outer = " + outer)
    +  private def instantiateClass(
    +      cls: Class[_],
    +      enclosingObject: AnyRef,
    +      inInterpreter: Boolean): AnyRef = {
         if (!inInterpreter) {
           // This is a bona fide closure class, whose constructor has no effects
           // other than to set its fields, so use its constructor
           val cons = cls.getConstructors()(0)
           val params = cons.getParameterTypes.map(createNullValue).toArray
    -      if (outer != null) {
    -        params(0) = outer // First param is always outer object
    +      if (enclosingObject!= null) {
    --- End diff --
    
    nit: missing space




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29405393
  
    --- Diff: core/src/test/scala/org/apache/spark/util/ClosureCleanerSuite2.scala ---
    @@ -0,0 +1,562 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.util
    +
    +import java.io.NotSerializableException
    +
    +import scala.collection.mutable
    +
    +import org.scalatest.{BeforeAndAfterAll, FunSuite, PrivateMethodTester}
    +
    +import org.apache.spark.{SparkContext, SparkException}
    +import org.apache.spark.serializer.SerializerInstance
    +
    +/**
    + * Another test suite for the closure cleaner that is finer-grained.
    + * For tests involving end-to-end Spark jobs, see {{ClosureCleanerSuite}}.
    + */
    +class ClosureCleanerSuite2 extends FunSuite with BeforeAndAfterAll with PrivateMethodTester {
    +
    +  // Start a SparkContext so that the closure serializer is accessible
    +  // We do not actually use this explicitly otherwise
    +  private var sc: SparkContext = null
    +  private var closureSerializer: SerializerInstance = null
    +
    +  override def beforeAll(): Unit = {
    +    sc = new SparkContext("local", "test")
    +    closureSerializer = sc.env.closureSerializer.newInstance()
    +  }
    +
    +  override def afterAll(): Unit = {
    +    sc.stop()
    +    sc = null
    +    closureSerializer = null
    +  }
    +
    +  // Some fields and methods to reference in inner closures later
    +  private val someSerializableValue = 1
    +  private val someNonSerializableValue = new NonSerializable
    +  private def someSerializableMethod() = 1
    +  private def someNonSerializableMethod() = new NonSerializable
    +
    +  /** Assert that the given closure is serializable (or not). */
    +  private def assertSerializable(closure: AnyRef, serializable: Boolean): Unit = {
    +    if (serializable) {
    +      closureSerializer.serialize(closure)
    +    } else {
    +      intercept[NotSerializableException] {
    +        closureSerializer.serialize(closure)
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Helper method for testing whether closure cleaning works as expected.
    +   * This cleans the given closure twice, with and without transitive cleaning.
    +   */
    +  private def testClean(
    +      closure: AnyRef,
    +      serializableBefore: Boolean,
    +      serializableAfter: Boolean): Unit = {
    +    testClean(closure, serializableBefore, serializableAfter, transitive = true)
    +    testClean(closure, serializableBefore, serializableAfter, transitive = false)
    +  }
    +
    +  /** Helper method for testing whether closure cleaning works as expected. */
    +  private def testClean(
    +      closure: AnyRef,
    +      serializableBefore: Boolean,
    +      serializableAfter: Boolean,
    +      transitive: Boolean): Unit = {
    +    assertSerializable(closure, serializableBefore)
    +    // If the resulting closure is not serializable even after
    +    // cleaning, we expect ClosureCleaner to throw a SparkException
    +    if (serializableAfter) {
    +      ClosureCleaner.clean(closure, checkSerializable = true, transitive)
    +    } else {
    +      intercept[SparkException] {
    +        ClosureCleaner.clean(closure, checkSerializable = true, transitive)
    +      }
    +    }
    +    assertSerializable(closure, serializableAfter)
    +  }
    +
    +  /**
    +   * Return the fields accessed by the given closure by class.
    +   * This also optionally finds the fields transitively referenced through methods invocations.
    +   */
    +  private def findAccessedFields(
    +      closure: AnyRef,
    +      outerClasses: Seq[Class[_]],
    +      findTransitively: Boolean): Map[Class[_], Set[String]] = {
    +    val fields = new mutable.HashMap[Class[_], mutable.Set[String]]
    +    outerClasses.foreach { c => fields(c) = new mutable.HashSet[String] }
    +    ClosureCleaner.getClassReader(closure.getClass)
    +      .accept(new FieldAccessFinder(fields, findTransitively), 0)
    +    fields.mapValues(_.toSet).toMap
    +  }
    +
    +  // Accessors for private methods
    +  private val _isClosure = PrivateMethod[Boolean]('isClosure)
    +  private val _getInnerClasses = PrivateMethod[List[Class[_]]]('getInnerClasses)
    +  private val _getOuterClasses = PrivateMethod[List[Class[_]]]('getOuterClasses)
    +  private val _getOuterObjects = PrivateMethod[List[AnyRef]]('getOuterObjects)
    +
    +  private def isClosure(obj: AnyRef): Boolean = {
    +    ClosureCleaner invokePrivate _isClosure(obj)
    +  }
    +
    +  private def getInnerClasses(closure: AnyRef): List[Class[_]] = {
    +    ClosureCleaner invokePrivate _getInnerClasses(closure)
    +  }
    +
    +  private def getOuterClasses(closure: AnyRef): List[Class[_]] = {
    +    ClosureCleaner invokePrivate _getOuterClasses(closure)
    +  }
    +
    +  private def getOuterObjects(closure: AnyRef): List[AnyRef] = {
    +    ClosureCleaner invokePrivate _getOuterObjects(closure)
    +  }
    +
    +  test("get inner classes") {
    +    val closure1 = () => 1
    +    val closure2 = () => { () => 1 }
    +    val closure3 = (i: Int) => {
    +      (1 to i).map { x => x + 1 }.filter { x => x > 5 }
    +    }
    +    val closure4 = (j: Int) => {
    +      (1 to j).flatMap { x =>
    +        (1 to x).flatMap { y =>
    +          (1 to y).map { z => z + 1 }
    +        }
    +      }
    +    }
    +    val inner1 = getInnerClasses(closure1)
    +    val inner2 = getInnerClasses(closure2)
    +    val inner3 = getInnerClasses(closure3)
    +    val inner4 = getInnerClasses(closure4)
    +    assert(inner1.isEmpty)
    +    assert(inner2.size === 1)
    +    assert(inner3.size === 2)
    +    assert(inner4.size === 3)
    +    assert(inner2.forall(isClosure))
    +    assert(inner3.forall(isClosure))
    +    assert(inner4.forall(isClosure))
    +  }
    +
    +  test("get outer classes and objects") {
    +    val localValue = someSerializableValue
    +    val closure1 = () => 1
    +    val closure2 = () => localValue
    +    val closure3 = () => someSerializableValue
    +    val closure4 = () => someSerializableMethod()
    +    val outerClasses1 = getOuterClasses(closure1)
    +    val outerClasses2 = getOuterClasses(closure2)
    +    val outerClasses3 = getOuterClasses(closure3)
    +    val outerClasses4 = getOuterClasses(closure4)
    +    val outerObjects1 = getOuterObjects(closure1)
    +    val outerObjects2 = getOuterObjects(closure2)
    +    val outerObjects3 = getOuterObjects(closure3)
    +    val outerObjects4 = getOuterObjects(closure4)
    +
    +    // The classes and objects should have the same size
    +    assert(outerClasses1.size === outerObjects1.size)
    +    assert(outerClasses2.size === outerObjects2.size)
    +    assert(outerClasses3.size === outerObjects3.size)
    +    assert(outerClasses4.size === outerObjects4.size)
    +
    +    // These do not have $outer pointers because they reference only local variables
    +    assert(outerClasses1.isEmpty)
    +    assert(outerClasses2.isEmpty)
    +
    +    // These closures do have $outer pointers because they ultimately reference `this`
    +    // The first $outer pointer refers to the closure that defines this test (see FunSuite#test)
    +    // The second $outer pointer refers to ClosureCleanerSuite2
    +    assert(outerClasses3.size === 2)
    +    assert(outerClasses4.size === 2)
    +    assert(isClosure(outerClasses3(0)))
    +    assert(isClosure(outerClasses4(0)))
    +    assert(outerClasses3(0) === outerClasses4(0)) // part of the same "FunSuite#test" scope
    +    assert(outerClasses3(1) === this.getClass)
    +    assert(outerClasses4(1) === this.getClass)
    +    assert(outerObjects3(1) === this)
    +    assert(outerObjects4(1) === this)
    +  }
    +
    +  test("get outer classes and objects with nesting") {
    +    val localValue = someSerializableValue
    +
    +    val test1 = () => {
    +      val x = 1
    +      val closure1 = () => 1
    +      val closure2 = () => x
    +      val outerClasses1 = getOuterClasses(closure1)
    +      val outerClasses2 = getOuterClasses(closure2)
    +      val outerObjects1 = getOuterObjects(closure1)
    +      val outerObjects2 = getOuterObjects(closure2)
    +      assert(outerClasses1.size === outerObjects1.size)
    +      assert(outerClasses2.size === outerObjects2.size)
    +      // These inner closures only reference local variables, and so do not have $outer pointers
    +      assert(outerClasses1.isEmpty)
    +      assert(outerClasses2.isEmpty)
    +    }
    +
    +    val test2 = () => {
    +      def y = 1
    +      val closure1 = () => 1
    +      val closure2 = () => y
    +      val closure3 = () => localValue
    +      val outerClasses1 = getOuterClasses(closure1)
    +      val outerClasses2 = getOuterClasses(closure2)
    +      val outerClasses3 = getOuterClasses(closure3)
    +      val outerObjects1 = getOuterObjects(closure1)
    +      val outerObjects2 = getOuterObjects(closure2)
    +      val outerObjects3 = getOuterObjects(closure3)
    +      assert(outerClasses1.size === outerObjects1.size)
    +      assert(outerClasses2.size === outerObjects2.size)
    +      assert(outerClasses3.size === outerObjects3.size)
    +      // Same as above, this closure only references local variables
    +      assert(outerClasses1.isEmpty)
    +      // This closure references the "test2" scope because it needs to find the method `y`
    +      // Scope hierarchy: "test2" < "FunSuite#test" < ClosureCleanerSuite2
    +      assert(outerClasses2.size === 3)
    +      // This closure references the "test2" scope because it needs to find the `localValue`
    +      // defined outside of this scope
    +      assert(outerClasses3.size === 3)
    +      assert(isClosure(outerClasses2(0)))
    +      assert(isClosure(outerClasses3(0)))
    +      assert(isClosure(outerClasses2(1)))
    +      assert(isClosure(outerClasses3(1)))
    +      assert(outerClasses2(0) === outerClasses3(0)) // part of the same "test2" scope
    +      assert(outerClasses2(1) === outerClasses3(1)) // part of the same "FunSuite#test" scope
    +      assert(outerClasses2(2) === this.getClass)
    +      assert(outerClasses3(2) === this.getClass)
    +      assert(outerObjects2(2) === this)
    +      assert(outerObjects3(2) === this)
    +    }
    +
    +    test1()
    +    test2()
    +  }
    +
    +  test("find accessed fields") {
    +    val localValue = someSerializableValue
    +    val closure1 = () => 1
    +    val closure2 = () => localValue
    +    val closure3 = () => someSerializableValue
    +    val outerClasses1 = getOuterClasses(closure1)
    +    val outerClasses2 = getOuterClasses(closure2)
    +    val outerClasses3 = getOuterClasses(closure3)
    +
    +    val fields1 = findAccessedFields(closure1, outerClasses1, findTransitively = false)
    +    val fields2 = findAccessedFields(closure2, outerClasses2, findTransitively = false)
    +    val fields3 = findAccessedFields(closure3, outerClasses3, findTransitively = false)
    +    assert(fields1.isEmpty)
    +    assert(fields2.isEmpty)
    +    assert(fields3.size === 2)
    +    // This corresponds to the "FunSuite#test" closure. This is empty because the
    +    // `someSerializableValue` belongs to its parent (i.e. ClosureCleanerSuite2).
    +    assert(fields3(outerClasses3(0)).isEmpty)
    +    // This corresponds to the ClosureCleanerSuite2. This is also empty, however,
    +    // because accessing a `ClosureCleanerSuite2#someSerializableValue` actually involves a
    +    // method call. Since we do not find fields transitively, we will not recursively trace
    +    // through the fields referenced by this method.
    +    assert(fields3(outerClasses3(1)).isEmpty)
    +
    +    val fields1t = findAccessedFields(closure1, outerClasses1, findTransitively = true)
    +    val fields2t = findAccessedFields(closure2, outerClasses2, findTransitively = true)
    +    val fields3t = findAccessedFields(closure3, outerClasses3, findTransitively = true)
    +    assert(fields1t.isEmpty)
    +    assert(fields2t.isEmpty)
    +    assert(fields3t.size === 2)
    +    // Because we find fields transitively now, we are able to detect that we need the
    +    // $outer pointer to get the field from the ClosureCleanerSuite2
    +    assert(fields3t(outerClasses3(0)).size === 1)
    +    assert(fields3t(outerClasses3(0)).head === "$outer")
    +    assert(fields3t(outerClasses3(1)).size === 1)
    +    assert(fields3t(outerClasses3(1)).head.contains("someSerializableValue"))
    +  }
    +
    +  test("find accessed fields with nesting") {
    +    val localValue = someSerializableValue
    +
    +    val test1 = () => {
    +      def a = localValue + 1
    +      val closure1 = () => 1
    +      val closure2 = () => a
    +      val closure3 = () => localValue
    +      val closure4 = () => someSerializableValue
    +      val outerClasses1 = getOuterClasses(closure1)
    +      val outerClasses2 = getOuterClasses(closure2)
    +      val outerClasses3 = getOuterClasses(closure3)
    +      val outerClasses4 = getOuterClasses(closure4)
    +
    +      // First, find only fields accessed directly, not transitively, by these closures
    +      val fields1 = findAccessedFields(closure1, outerClasses1, findTransitively = false)
    +      val fields2 = findAccessedFields(closure2, outerClasses2, findTransitively = false)
    +      val fields3 = findAccessedFields(closure3, outerClasses3, findTransitively = false)
    +      val fields4 = findAccessedFields(closure4, outerClasses4, findTransitively = false)
    +      assert(fields1.isEmpty)
    +      // "test1" < "FunSuite#test" < ClosureCleanerSuite2
    --- End diff --
    
    I think the comment for the middle one should maybe make clear that it's really the anonymous class passed into the `test` method, right?




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29082998
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1740,7 +1740,8 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
        *   serializable
        */
       private[spark] def clean[F <: AnyRef](f: F, checkSerializable: Boolean = true): F = {
    -    ClosureCleaner.clean(f, checkSerializable)
    +    val cleanTransitively = conf.getBoolean("spark.closureCleaner.transitive", true)
    --- End diff --
    
    I'm not sure that it makes sense to flag this, since (provided we add scopes) every user program will break if this is toggled. I think this feature will need to be removed wholesale if there are issues.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97276105
  
      [Test build #31200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31200/consoleFull) for   PR 5685 at commit [`3998168`](https://github.com/apache/spark/commit/399816819e0f0bc9fde9fac1a3d6b6f6ef641a74).




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96285701
  
    @andrewor14 I manually triggered this using Josh's tool - looks like PRB is being temperamental.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121][WIP] Closure cleaner ...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-96103803
  
    BTW, the general workaround for this kind of thing is to grab the values from outside the closure into a local val before you use them inside the closure. I'd look into that before changing the ClosureCleaner too heavily.
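    Applied to the SparkPi variant above, that would look something like this (a sketch; `localSlices` is a hypothetical name):

    ```Scala
    val slices = if (args.length > 0) args(0).toInt else 2
    (1 to 1).foreach { j =>
      val localSlices = slices // local copy made inside the enclosing closure
      val count = spark.parallelize(1 until n, slices).map { i =>
        val z = localSlices // the inner closure now captures only the copy
        val x = random * 2 - 1
        val y = random * 2 - 1
        if (x * x + y * y < 1) 1 else 0
      }.reduce(_ + _)
    }
    ```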




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29391102
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -101,21 +115,124 @@ private[spark] object ClosureCleaner extends Logging {
         }
       }
     
    -  def clean(func: AnyRef, checkSerializable: Boolean = true) {
    +  /**
    +   * Clean the given closure in place.
    +   *
    +   * More specifically, this renders the given closure serializable as long as it does not
    +   * explicitly reference unserializable objects.
    +   *
    +   * @param closure the closure to clean
    +   * @param checkSerializable whether to verify that the closure is serializable after cleaning
    +   * @param cleanTransitively whether to clean enclosing closures transitively
    +   */
    +  def clean(
    +      closure: AnyRef,
    +      checkSerializable: Boolean = true,
    +      cleanTransitively: Boolean = true): Unit = {
    +    clean(closure, checkSerializable, cleanTransitively, Map.empty)
    +  }
    +
    +  /**
    +   * Helper method to clean the given closure in place.
    +   *
    +   * The mechanism is to traverse the hierarchy of enclosing closures and null out any
    +   * references along the way that are not actually used by the starting closure, but are
    +   * nevertheless included in the compiled anonymous classes. Note that it is unsafe to
    +   * simply mutate the enclosing closures in place, as other code paths may depend on them.
    +   * Instead, we clone each enclosing closure and set the parent pointers accordingly.
    +   *
    +   * By default, closures are cleaned transitively. This means we detect whether enclosing
    +   * objects are actually referenced by the starting one, either directly or transitively,
    +   * and, if not, sever these closures from the hierarchy. In other words, in addition to
    +   * nulling out unused field references, we also null out any parent pointers that refer
    +   * to enclosing objects not actually needed by the starting closure. We determine
    +   * transitivity by tracing through the tree of all methods ultimately invoked by the
    +   * inner closure and record all the fields referenced in the process.
    +   *
    +   * For instance, transitive cleaning is necessary in the following scenario:
    +   *
    +   *   class SomethingNotSerializable {
    +   *     def someValue = 1
    +   *     def someMethod(): Unit = scope("one") {
    +   *       def x = someValue
    +   *       def y = 2
    +   *       scope("two") { println(y + 1) }
    +   *     }
    +   *     def scope(name: String)(body: => Unit) = body
    --- End diff --
    
    maybe move this def up to define it before you use it
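    That is, the example in the doc comment would read (same content, reordered):

    ```Scala
    class SomethingNotSerializable {
      def someValue = 1
      def scope(name: String)(body: => Unit) = body
      def someMethod(): Unit = scope("one") {
        def x = someValue
        def y = 2
        scope("two") { println(y + 1) }
      }
    }
    ```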




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5685#issuecomment-97275891
  
    Merged build started.




[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5685#discussion_r29392231
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -184,19 +335,17 @@ private[spark] object ClosureCleaner extends Logging {
           val parentCtor = classOf[java.lang.Object].getDeclaredConstructor()
           val newCtor = rf.newConstructorForSerialization(cls, parentCtor)
           val obj = newCtor.newInstance().asInstanceOf[AnyRef]
    -      if (outer != null) {
    -        // logInfo("3: Setting $outer on " + cls + " to " + outer);
    +      if (enclosingObject != null) {
             val field = cls.getDeclaredField("$outer")
             field.setAccessible(true)
    -        field.set(obj, outer)
    +        field.set(obj, enclosingObject)
           }
           obj
         }
       }
     }
     
    -private[spark]
    -class ReturnStatementFinder extends ClassVisitor(ASM4) {
    +private class ReturnStatementFinder extends ClassVisitor(ASM4) {
       override def visitMethod(access: Int, name: String, desc: String,
           sig: String, exceptions: Array[String]): MethodVisitor = {
         if (name.contains("apply")) {
    --- End diff --
    
    We can do this later, but we should try to report the line number of the return statement if possible (is that even possible?).
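    It should be, at least when the class was compiled with debug info: ASM delivers `visitLineNumber` events during the scan. A hypothetical sketch (imports shown for stock ASM; Spark actually uses a shaded copy):

    ```Scala
    import org.objectweb.asm.{Label, MethodVisitor}
    import org.objectweb.asm.Opcodes.ASM4

    // Track the most recent source line seen while scanning the method, so
    // the error raised on finding a return statement can report it. This
    // stays at -1 if the class was compiled without debug info (-g).
    private class LineAwareMethodVisitor extends MethodVisitor(ASM4) {
      private var currentLine = -1
      override def visitLineNumber(line: Int, start: Label): Unit = {
        currentLine = line
      }
      // ...on detecting the return statement, include `currentLine`
      // in the exception message instead of just the method name...
    }
    ```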

