Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2016/04/12 14:45:14 UTC

[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/12327

    [SPARK-14558][CORE] In ClosureCleaner, clean the outer pointer if it's a REPL line object

    ## What changes were proposed in this pull request?
    
    When we clean a closure, if its outermost parent is not a closure, we won't clone and clean it, as cloning a user's objects is dangerous. However, if it's a REPL line object, which may carry a lot of unnecessary references (like the Hadoop conf, Spark conf, etc.), we should clean it, as it's not a user object.
    
    This PR improves the check for user objects to exclude REPL line objects.
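
    The change can be sketched as follows (a minimal illustration, assuming detection by class-name prefix; the actual ClosureCleaner logic is more involved):

    ```scala
    object ReplLineCheck {
      // The Scala REPL compiles each input line into a synthetic wrapper class
      // whose name starts with "$line" (e.g. "$line3.$read$$iw$$iw"), so the
      // outermost object of a closure can be recognized by its class name.
      def isReplLineClassName(name: String): Boolean = name.startsWith("$line")

      def main(args: Array[String]): Unit = {
        println(isReplLineClassName("$line3.$read$$iw$$iw"))  // REPL wrapper: safe to clean
        println(isReplLineClassName("org.example.UserClass")) // user object: left alone
      }
    }
    ```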
    
    ## How was this patch tested?
    
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark closure

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12327
    
----
commit b78b2cef91a07784e2c92bb266585af4e2df9896
Author: Wenchen Fan <we...@databricks.com>
Date:   2016-04-12T12:33:45Z

    clean the outer pointer if it's a REPL line object

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-212182451
  
    @a-roberts wow, that's interesting. I think the IBM JDK may attach more information to the Scala REPL line object, which increases the cache size. It would be great if you could look into it, thanks!
    
    BTW, if this problem matters to you, feel free to send a PR to increase the threshold (20%).




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-210077561
  
    LGTM, merging into master, thanks!




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by a-roberts <gi...@git.apache.org>.
Github user a-roberts commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-212991789
  
    @cloud-fan I've had a closer look at this and think a more robust method would be to use weak references to identify when an object is out of scope. With IBM Java we see the 29% reduction between cache size 1 and cache size 2, but with OpenJDK we see a 4% increase, suggesting that we can't rely on the sizes being similar across JDK vendors. I'm now thinking this is a test case issue rather than a problem in the ClosureCleaner or the IBM Java code.
    
    With IBM Java our second cache size (after repartitioning) is much smaller; repartitioning uses ContextCleaner, whereas with OpenJDK it grows. Either we have a bigger memory footprint or the cached size is being calculated incorrectly (the calculation looks fine to me, and we actually have smaller object sizes). The problem on Z was due to using repl/pom.xml instead of the pom.xml in the Spark home directory (we get the same result with the right pom.xml), so it can be discarded for this discussion.
    
    I'm going to figure out what's in the Scala REPL line objects between vendors. I think the intention of this commit is to test that the REPL line object is being cleaned, but the assertion in place at the moment doesn't look correct (the size is bigger after the cleaning, and cacheSize2 is the result of cleaning, if I'm understanding the code correctly). Have I missed a trick?
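
    The weak-reference approach suggested above could look roughly like this (illustrative only; this is not Spark code, and the names are hypothetical):

    ```scala
    import java.lang.ref.WeakReference

    object WeakRefProbe {
      // A WeakReference does not keep its referent alive, so a test can check
      // whether an object has become unreachable after cleaning, rather than
      // comparing estimated sizes (which vary across JDK vendors).
      def main(args: Array[String]): Unit = {
        var payload: AnyRef = new Array[Byte](1024)
        val ref = new WeakReference[AnyRef](payload)
        println(ref.get() != null) // true: still strongly reachable

        payload = null // drop the strong reference, as cleaning the closure would
        System.gc()    // a hint only; collection is not guaranteed immediately
        // Once the object is collected, ref.get() returns null, proving it
        // was no longer referenced anywhere.
      }
    }
    ```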




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by a-roberts <gi...@git.apache.org>.
Github user a-roberts commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-222107940
  
    > cacheSize1 and cacheSize2 are both the size after cleaning. The difference is that cacheSize1 is the size after cleaning the data with the line object reference, and cacheSize2 is the size after cleaning the data without the line object reference.
    
    Looking for clarity here: is it true that the clean "with the reference" should be bigger (cacheSize1) and the clean "without the reference" should be smaller (cacheSize2)?
    
    OpenJDK, cacheSize1: 180392, cacheSize2: 187896 (bigger without the line object reference)
    
    IBM JDK, cacheSize1: 354692, cacheSize2: 263800 (smaller without the line object reference)
    
    What exactly does "without line object reference" mean, and should cacheSize1 be smaller or bigger than cacheSize2?
    
    I know the SizeEstimator overestimates for IBM Java, so our cached footprint is much larger (I'm handling this). Because of the larger difference we get this test failing: OpenJDK **fails** with Kryo and IBM **passes** with Kryo for this test.
    
    A better check would be to run with and without the closure cleaner change and verify that the second result is smaller by the size of the line object. Based on our cacheSize2 being smaller (without the line object reference), I'm thinking that IBM Java functions as expected and OpenJDK doesn't - but this depends on my questions above. Interested to hear what you think, @cloud-fan.




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-209779278
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12327#discussion_r59605684
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -233,16 +217,22 @@ private[spark] object ClosureCleaner extends Logging {
         // Note that all outer objects but the outermost one (first one in this list) must be closures
         var outerPairs: List[(Class[_], AnyRef)] = (outerClasses zip outerObjects).reverse
         var parent: AnyRef = null
    -    if (outerPairs.size > 0 && !isClosure(outerPairs.head._1)) {
    -      // The closure is ultimately nested inside a class; keep the object of that
    -      // class without cloning it since we don't want to clone the user's objects.
    -      // Note that we still need to keep around the outermost object itself because
    -      // we need it to clone its child closure later (see below).
    -      logDebug(s" + outermost object is not a closure, so do not clone it: ${outerPairs.head}")
    -      parent = outerPairs.head._2 // e.g. SparkContext
    -      outerPairs = outerPairs.tail
    -    } else if (outerPairs.size > 0) {
    -      logDebug(s" + outermost object is a closure, so we just keep it: ${outerPairs.head}")
    +    if (outerPairs.size > 0) {
    +      if (isClosure(outerPairs.head._1)) {
    +        logDebug(s" + outermost object is a closure, so we just keep it: ${outerPairs.head}")
    --- End diff --
    
    I wrote this comment originally and now find it confusing. Can you say `so we clone it` instead? Same in L224.




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-209595775
  
    @cloud-fan Can you add a test to `ClosureCleanerSuite2`? Otherwise this LGTM.




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12327#discussion_r59368774
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -77,35 +77,19 @@ private[spark] object ClosureCleaner extends Logging {
        */
       private def getInnerClosureClasses(obj: AnyRef): List[Class[_]] = {
         val seen = Set[Class[_]](obj.getClass)
    -    var stack = List[Class[_]](obj.getClass)
    +    val stack = Stack[Class[_]](obj.getClass)
    --- End diff --
    
    Kind of unrelated, but it's obvious that using `Stack` is more efficient here.
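
    For reference, the difference boils down to this (a minimal sketch, not the actual `getInnerClosureClasses` code): a mutable `Stack` mutates in place, while the old code rebuilt an immutable `List` on every push and pop:

    ```scala
    import scala.collection.mutable.Stack

    object StackDemo {
      def main(args: Array[String]): Unit = {
        // Old style: each push/pop allocates a new immutable List.
        var list = List[Int](1)
        list = 2 :: list        // push allocates a new cons cell
        val oldTop = list.head  // pop = head + tail
        list = list.tail

        // New style: one mutable Stack, updated in place.
        val stack = Stack[Int](1)
        stack.push(2)
        val newTop = stack.pop() // LIFO: returns 2
        println(oldTop == newTop) // both traversals see the same order
      }
    }
    ```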




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-208939090
  
    retest this please




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12327#discussion_r59606594
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -233,16 +217,22 @@ private[spark] object ClosureCleaner extends Logging {
         // Note that all outer objects but the outermost one (first one in this list) must be closures
         var outerPairs: List[(Class[_], AnyRef)] = (outerClasses zip outerObjects).reverse
         var parent: AnyRef = null
    -    if (outerPairs.size > 0 && !isClosure(outerPairs.head._1)) {
    -      // The closure is ultimately nested inside a class; keep the object of that
    -      // class without cloning it since we don't want to clone the user's objects.
    -      // Note that we still need to keep around the outermost object itself because
    -      // we need it to clone its child closure later (see below).
    -      logDebug(s" + outermost object is not a closure, so do not clone it: ${outerPairs.head}")
    -      parent = outerPairs.head._2 // e.g. SparkContext
    -      outerPairs = outerPairs.tail
    -    } else if (outerPairs.size > 0) {
    -      logDebug(s" + outermost object is a closure, so we just keep it: ${outerPairs.head}")
    +    if (outerPairs.size > 0) {
    +      if (isClosure(outerPairs.head._1)) {
    +        logDebug(s" + outermost object is a closure, so we just keep it: ${outerPairs.head}")
    --- End diff --
    
    +1; I also was confused by this comment's existing wording, and it took me a while to understand that "keeping it" here means making it eligible for cloning and cleaning further down.




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-208930686
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55604/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-209000940
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55614/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-208930152
  
    **[Test build #55604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55604/consoleFull)** for PR 12327 at commit [`b78b2ce`](https://github.com/apache/spark/commit/b78b2cef91a07784e2c92bb266585af4e2df9896).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-209779108
  
    **[Test build #55792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55792/consoleFull)** for PR 12327 at commit [`3db685c`](https://github.com/apache/spark/commit/3db685c5bf4ad189da49fd75eb723c6485e83671).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12327#discussion_r59762122
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -20,6 +20,7 @@ package org.apache.spark.util
     import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
     
     import scala.collection.mutable.{Map, Set, Stack}
    +import scala.language.existentials
    --- End diff --
    
    not used




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-208930676
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-209000141
  
    **[Test build #55614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55614/consoleFull)** for PR 12327 at commit [`b78b2ce`](https://github.com/apache/spark/commit/b78b2cef91a07784e2c92bb266585af4e2df9896).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12327#discussion_r59762537
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -19,7 +19,8 @@ package org.apache.spark.util
     
     import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
     
    -import scala.collection.mutable.{Map, Set}
    +import scala.collection.mutable.{Map, Set, Stack}
    +import scala.language.existentials
    --- End diff --
    
    oops, I forgot to. Oh well




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12327#discussion_r59762228
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -19,7 +19,8 @@ package org.apache.spark.util
     
     import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
     
    -import scala.collection.mutable.{Map, Set}
    +import scala.collection.mutable.{Map, Set, Stack}
    +import scala.language.existentials
    --- End diff --
    
    I'll remove this




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-222195981
  
    `cacheSize1` and `cacheSize2` should be almost the same, because they cache the same data, just with different RDD partition numbers. However, before this PR, `cacheSize1` was much larger because it referenced the line object and didn't clean it.
    
    > A better check would be to run with and without the closure cleaner change
    
    Yeah, this is what I did locally, but how do we write a test for it?




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by a-roberts <gi...@git.apache.org>.
Github user a-roberts commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-211999755
  
    Hi, with this new test I'm seeing large deviations using IBM JDKs on different platforms, for example:
    
    `java.lang.AssertionError: assertion failed: deviation too large: 0.25359712230215825, first size: 26688, second size: 19920`
    
    Is the 20% deviation threshold particularly important? Why this number?
    
    FYI here's the comparison between different Java vendors and architectures:
    - OpenJDK on Intel (passes): cacheSize1: 180392 and cacheSize2: 187896 (4.07%)
    - IBM JDK with SUSE on zSystems (fails): cacheSize1: 26688 and cacheSize2: 19920 (29%)
    - IBM JDK with Ubuntu 14.04 on Power 8 LE (fails): cacheSize1: 26688 and cacheSize2: 19920 (29%)
    - IBM JDK with Ubuntu 14.04 on Intel (fails): cacheSize1: 354692 and cacheSize2: 263800 (29.3%)
    
    I'll look into this; it's interesting that with IBM Java it's always a very similar and much larger percentage.
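
    The deviation in the assertion message can be reproduced directly (a sketch of the check; the exact assertion in the test suite may differ):

    ```scala
    object DeviationCheck {
      // Relative difference between the two cache sizes, matching the
      // "deviation too large: 0.2535..." value in the failure above.
      def deviation(size1: Long, size2: Long): Double =
        math.abs(size1 - size2).toDouble / size1

      def main(args: Array[String]): Unit = {
        // 26688 vs 19920 gives ~0.2536, which exceeds the 20% threshold,
        // while the OpenJDK pair 180392 vs 187896 stays around 4%.
        println(deviation(26688L, 19920L))
        println(deviation(180392L, 187896L))
      }
    }
    ```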




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-208940317
  
    **[Test build #55614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55614/consoleFull)** for PR 12327 at commit [`b78b2ce`](https://github.com/apache/spark/commit/b78b2cef91a07784e2c92bb266585af4e2df9896).




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12327




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-209163845
  
    cc @andrewor14 




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12327#discussion_r59605580
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -233,16 +217,22 @@ private[spark] object ClosureCleaner extends Logging {
         // Note that all outer objects but the outermost one (first one in this list) must be closures
         var outerPairs: List[(Class[_], AnyRef)] = (outerClasses zip outerObjects).reverse
         var parent: AnyRef = null
    -    if (outerPairs.size > 0 && !isClosure(outerPairs.head._1)) {
    -      // The closure is ultimately nested inside a class; keep the object of that
    -      // class without cloning it since we don't want to clone the user's objects.
    -      // Note that we still need to keep around the outermost object itself because
    -      // we need it to clone its child closure later (see below).
    -      logDebug(s" + outermost object is not a closure, so do not clone it: ${outerPairs.head}")
    -      parent = outerPairs.head._2 // e.g. SparkContext
    -      outerPairs = outerPairs.tail
    -    } else if (outerPairs.size > 0) {
    -      logDebug(s" + outermost object is a closure, so we just keep it: ${outerPairs.head}")
    +    if (outerPairs.size > 0) {
    +      if (isClosure(outerPairs.head._1)) {
    +        logDebug(s" + outermost object is a closure, so we just keep it: ${outerPairs.head}")
    +      } else if (outerPairs.head._1.getName.startsWith("$line")) {
    --- End diff --
    
    can you add a short comment in this `if` case describing why we treat REPL classes as a special case (and include `SPARK-14558`)
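    
    (For context: the Scala REPL compiles each input line into a synthetic wrapper class whose name starts with `$line`, e.g. `$line3.$read$$iw$$iw`, so an outer object of such a class is REPL machinery rather than a user object. A minimal sketch of the check, not the exact PR code:)
    
    ```scala
    // Hypothetical helper illustrating the `$line` prefix test used in
    // the diff above; the PR inlines this check rather than naming it.
    def isReplLineClass(clazz: Class[_]): Boolean =
      clazz.getName.startsWith("$line")
    ```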




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12327#discussion_r59764217
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -19,7 +19,8 @@ package org.apache.spark.util
     
     import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
     
    -import scala.collection.mutable.{Map, Set}
    +import scala.collection.mutable.{Map, Set, Stack}
    +import scala.language.existentials
    --- End diff --
    
    oh, good thing I forgot!




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-208887603
  
    **[Test build #55604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55604/consoleFull)** for PR 12327 at commit [`b78b2ce`](https://github.com/apache/spark/commit/b78b2cef91a07784e2c92bb266585af4e2df9896).




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-209000936
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12327#discussion_r59368815
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -77,35 +77,19 @@ private[spark] object ClosureCleaner extends Logging {
        */
       private def getInnerClosureClasses(obj: AnyRef): List[Class[_]] = {
         val seen = Set[Class[_]](obj.getClass)
    -    var stack = List[Class[_]](obj.getClass)
    +    val stack = Stack[Class[_]](obj.getClass)
         while (!stack.isEmpty) {
    -      val cr = getClassReader(stack.head)
    -      stack = stack.tail
    +      val cr = getClassReader(stack.pop())
           val set = Set[Class[_]]()
           cr.accept(new InnerClosureFinder(set), 0)
           for (cls <- set -- seen) {
             seen += cls
    -        stack = cls :: stack
    +        stack.push(cls)
           }
         }
         (seen - obj.getClass).toList
       }
     
    -  private def createNullValue(cls: Class[_]): AnyRef = {
    --- End diff --
    
    This method is never used.
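    
    (The surrounding diff also swaps manual `head`/`tail` list manipulation for a mutable `Stack`. A standalone sketch of that depth-first traversal pattern, using plain integers instead of `Class[_]` and the ASM class reader, purely for illustration:)
    
    ```scala
    import scala.collection.mutable
    
    // Same shape as getInnerClosureClasses after the refactor: pop a node,
    // discover its unseen neighbors, and push them for later visits.
    def reachable(start: Int, edges: Map[Int, List[Int]]): Set[Int] = {
      val seen = mutable.Set(start)
      val stack = mutable.Stack(start)
      while (stack.nonEmpty) {
        val neighbors = edges.getOrElse(stack.pop(), Nil)
        for (n <- neighbors if !seen(n)) {
          seen += n
          stack.push(n)
        }
      }
      seen.toSet
    }
    ```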




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-208886915
  
    cc @yhuai @JoshRosen @rxin




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-209779279
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55792/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-213170704
  
    `cacheSize1` and `cacheSize2` are both sizes after cleaning. The difference is that `cacheSize1` is the size after cleaning the data that holds a line-object reference, while `cacheSize2` is the size after cleaning the data without one.
    
    And yes, the test missed the difference in the size of Scala REPL line objects across vendors. Feel free to send a PR to fix it, and thanks for investigating this!




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-208939061
  
    LGTM




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-209548747
  
    @JoshRosen Can you take a look?




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12327#discussion_r59605473
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -233,16 +217,22 @@ private[spark] object ClosureCleaner extends Logging {
         // Note that all outer objects but the outermost one (first one in this list) must be closures
         var outerPairs: List[(Class[_], AnyRef)] = (outerClasses zip outerObjects).reverse
         var parent: AnyRef = null
    -    if (outerPairs.size > 0 && !isClosure(outerPairs.head._1)) {
    -      // The closure is ultimately nested inside a class; keep the object of that
    -      // class without cloning it since we don't want to clone the user's objects.
    -      // Note that we still need to keep around the outermost object itself because
    -      // we need it to clone its child closure later (see below).
    -      logDebug(s" + outermost object is not a closure, so do not clone it: ${outerPairs.head}")
    -      parent = outerPairs.head._2 // e.g. SparkContext
    -      outerPairs = outerPairs.tail
    -    } else if (outerPairs.size > 0) {
    -      logDebug(s" + outermost object is a closure, so we just keep it: ${outerPairs.head}")
    +    if (outerPairs.size > 0) {
    +      if (isClosure(outerPairs.head._1)) {
    --- End diff --
    
    not really your code, but can you do:
    ```
    val (outermostClass, outermostObject) = outerPairs.head
    if (isClosure(outermostClass)) {
      ...
    } else if (outermostClass.getName.startsWith("$line")) {
      ...
    } else {
      ...
      parent = outermostObject
      outerPairs = outerPairs.tail
    }
    ```




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12327#issuecomment-209757628
  
    **[Test build #55792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55792/consoleFull)** for PR 12327 at commit [`3db685c`](https://github.com/apache/spark/commit/3db685c5bf4ad189da49fd75eb723c6485e83671).




[GitHub] spark pull request: [SPARK-14558][CORE] In ClosureCleaner, clean t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12327#discussion_r59763316
  
    --- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
    @@ -19,7 +19,8 @@ package org.apache.spark.util
     
     import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
     
    -import scala.collection.mutable.{Map, Set}
    +import scala.collection.mutable.{Map, Set, Stack}
    +import scala.language.existentials
    --- End diff --
    
    Actually it's needed, or you will get a compile warning; it's caused by this line: `val (outermostClass, outermostObject) = outerPairs.head`
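    
    (Why the warning appears: destructuring a `(Class[_], AnyRef)` pair makes the compiler infer an existential type for the first binding, which trips the `-feature` existentials warning unless the feature is imported. A self-contained sketch, not the PR code:)
    
    ```scala
    import scala.language.existentials
    
    // `cls` is inferred as Class[_$1] forSome { type _$1 }; without the
    // import above, this destructuring emits a feature warning.
    val pair: (Class[_], AnyRef) = (classOf[String], "x")
    val (cls, obj) = pair
    ```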

