Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2017/05/02 13:57:37 UTC

[GitHub] spark pull request #17833: [SPARK-20558][CORE] clear InheritableThreadLocal ...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/17833

    [SPARK-20558][CORE] clear InheritableThreadLocal variables in SparkContext when stopping it

    ## What changes were proposed in this pull request?
    
    To better understand this problem, let's take a look at an example first:
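    ```
    object Main {
      def main(args: Array[String]): Unit = {
        var t = new Test
        // childValue() runs in this (parent) thread while the child Thread is constructed
        val first = new Thread(new Runnable {
          override def run(): Unit = {}
        })
        first.start()
        first.join() // make the message below accurate
        println("first thread finished")
    
        // Drop our only reference to the first InheritableThreadLocal...
        t.a = null
        t = new Test
        // ...yet the next child thread still inherits its value (no GC has run yet)
        new Thread(new Runnable {
          override def run(): Unit = {}
        }).start()
      }
    }
    
    class Test {
      var a = new InheritableThreadLocal[String] {
        // Called once per inherited value whenever a new child thread is created
        override protected def childValue(parent: String): String = {
          println("parent value is: " + parent)
          parent
        }
      }
      a.set("hello")
    }
    ```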
    The result is:
    ```
    parent value is: hello
    first thread finished
    parent value is: hello
    parent value is: hello
    ```
    
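    Once a value has been set on an `InheritableThreadLocal`, every child thread created afterwards inherits that value for as long as the `InheritableThreadLocal` object has not been garbage-collected: the parent thread's internal thread-local map still holds the entry, and its key is only weakly referenced, so the entry stops being inherited only after a GC clears that key. Setting the variable that holds the `InheritableThreadLocal` to `null` therefore doesn't stop the inheritance as we might expect.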
    
    `SparkContext` keeps its local properties in an `InheritableThreadLocal`. We should clear it when stopping the `SparkContext`; otherwise all future child threads will still inherit it, copy the properties, and waste memory.
    
    This is the root cause of https://issues.apache.org/jira/browse/SPARK-20548 , which creates and stops `SparkContext` many times, eventually leaving a lot of `InheritableThreadLocal`s alive and causing an OOM when new threads are started in the internal thread pools.
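    
    To make the effect of clearing concrete, here is a minimal sketch, not the actual patch. `MiniContext` and `ManyContexts` are hypothetical names; `MiniContext` models only a `SparkContext`-style local-properties field (an `InheritableThreadLocal[Properties]` whose `childValue` clones the parent's properties), and its `stop()` calls `remove()` on it as one way of doing the clearing described above. The sketch creates and stops several contexts on one thread, then counts how many property copies a newly started thread triggers. The contexts are kept in a list only so that the result does not depend on GC timing.
    
    ```scala
    import java.util.Properties
    import java.util.concurrent.atomic.AtomicInteger
    
    // Hypothetical stand-in for SparkContext: only the local-properties plumbing is modeled.
    class MiniContext(clearOnStop: Boolean) {
      // Counts how many times a new child thread copied this context's properties.
      val copies = new AtomicInteger(0)
    
      val localProperties = new InheritableThreadLocal[Properties] {
        // Runs in the parent thread while each new child Thread is being constructed.
        override protected def childValue(parent: Properties): Properties = {
          copies.incrementAndGet()
          parent.clone().asInstanceOf[Properties] // every child gets its own copy
        }
      }
    
      def setLocalProperty(key: String, value: String): Unit = {
        if (localProperties.get() == null) localProperties.set(new Properties())
        localProperties.get().setProperty(key, value)
      }
    
      // Removing the current thread's value means threads created later no longer
      // inherit (and copy) these properties.
      def stop(): Unit = if (clearOnStop) localProperties.remove()
    }
    
    object ManyContexts {
      // Creates and stops `n` contexts on the current thread, starts one new thread,
      // and returns how many property copies that new thread triggered.
      def copiesForNewThread(n: Int, clearOnStop: Boolean): Int = {
        val contexts = (1 to n).map { i =>
          val ctx = new MiniContext(clearOnStop)
          ctx.setLocalProperty("spark.job.description", s"job-$i")
          ctx.stop()
          ctx
        }
        val t = new Thread(new Runnable { override def run(): Unit = () })
        t.start()
        t.join()
        contexts.map(_.copies.get()).sum
      }
    }
    ```
    
    With `clearOnStop = false`, five stopped contexts still cost five `Properties` copies for every new thread; with `clearOnStop = true` the count drops to zero.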
    
    ## How was this patch tested?
    
    N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark core

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17833.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17833
    
----
commit 822a32b3ddcd3ef591800fdfe628f1462c7eec31
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-05-02T13:40:51Z

    clear InheritableThreadLocal variables in SparkContext when stopping it

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17833: [SPARK-20558][CORE] clear InheritableThreadLocal variabl...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17833
  
    > I think it only cleans localProperties in the current thread. localProperties overrides childValue and always clones a new Properties for child threads.
    
    Yea, that's true. If some child threads already exist and have cloned the local properties, we can't clean them. But we can prevent future child threads from inheriting these local properties, which can reduce the memory footprint a lot if users create a new `SparkContext`, stop it, and repeat this many times.
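    
    A minimal sketch of that limitation, assuming a hypothetical `localProperties` field (inside a hypothetical `ExistingChildDemo` object) that mimics the clone-per-child behaviour described in the quoted comment; it is not Spark's actual field. A child thread created before `remove()` keeps its own clone, while a thread created afterwards inherits nothing.
    
    ```scala
    import java.util.Properties
    import java.util.concurrent.CountDownLatch
    import java.util.concurrent.atomic.AtomicReference
    
    object ExistingChildDemo {
      // Hypothetical field modeled on SparkContext's local properties: children get a clone.
      val localProperties = new InheritableThreadLocal[Properties] {
        override protected def childValue(parent: Properties): Properties =
          parent.clone().asInstanceOf[Properties]
      }
    
      // Returns (value seen by a child created before the clear,
      //          whether a child created after the clear sees nothing).
      def run(): (String, Boolean) = {
        val props = new Properties()
        props.setProperty("spark.job.description", "demo")
        localProperties.set(props)
    
        val parentCleared = new CountDownLatch(1)
        val seenByOldChild = new AtomicReference[String]()
        // This child receives its clone when it is constructed, i.e. before the clear.
        val oldChild = new Thread(new Runnable {
          override def run(): Unit = {
            parentCleared.await() // read only after the parent has called remove()
            seenByOldChild.set(localProperties.get().getProperty("spark.job.description"))
          }
        })
        oldChild.start()
    
        localProperties.remove() // clears the value in the current thread only
        parentCleared.countDown()
        oldChild.join()
    
        val seenByNewChild = new AtomicReference[Properties]()
        val newChild = new Thread(new Runnable {
          override def run(): Unit = seenByNewChild.set(localProperties.get())
        })
        newChild.start()
        newChild.join()
        (seenByOldChild.get(), seenByNewChild.get() == null)
      }
    }
    ```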
    
    Anyway, I'll merge this PR and see if it can fix the flaky test. 




[GitHub] spark issue #17833: [SPARK-20558][CORE] clear InheritableThreadLocal variabl...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/17833
  
    On second thought, for https://issues.apache.org/jira/browse/SPARK-20548 , we can just combine some tests into one test to reduce the number of SparkContexts and REPLs created.




[GitHub] spark issue #17833: [SPARK-20558][CORE] clear InheritableThreadLocal variabl...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the issue:

    https://github.com/apache/spark/pull/17833
  
    LGTM




[GitHub] spark issue #17833: [SPARK-20558][CORE] clear InheritableThreadLocal variabl...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/17833
  
    I think it only cleans `localProperties` in the current thread. `localProperties` overrides `childValue` and always clones a new Properties for child threads.
    
    In addition, I think it doesn't fix the flaky REPL tests. The last time I checked the heap dump, I observed that most of the memory usage came from the REPL and its class loader, which are referenced by `SparkContext`s. And the GC roots of the `SparkContext`s are JVM-internal threads in this thread pool: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/lang/UNIXProcess.java#220 It's a cached thread pool. I could not figure out a fix except adding `Thread.sleep` to wait for these threads to be killed automatically. That's why we only observe the OOM in REPL tests.
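    
    The retention path described here can be reproduced in miniature. This is an illustration only, not the JDK's `UNIXProcess` code: the hypothetical `CachedPoolDemo` uses a `ThreadPoolExecutor` shaped like `Executors.newCachedThreadPool` (core size 0, `SynchronousQueue`) but with an assumed 500 ms keep-alive instead of the default 60 seconds. A new pool worker is usually created by the thread that submits the task, so it inherits that thread's `InheritableThreadLocal` values and keeps them for as long as the worker lives; sleeping past the keep-alive is what lets the idle worker exit.
    
    ```scala
    import java.util.concurrent.{SynchronousQueue, ThreadPoolExecutor, TimeUnit}
    import java.util.concurrent.atomic.AtomicReference
    
    object CachedPoolDemo {
      // Returns (value the pooled worker inherited,
      //          pool size right after the task, pool size after the keep-alive expired).
      def run(): (String, Int, Int) = {
        val local = new InheritableThreadLocal[String]
        local.set("hello") // set in the submitting (parent) thread
    
        // Same shape as Executors.newCachedThreadPool, but with a short keep-alive.
        val pool = new ThreadPoolExecutor(
          0, Int.MaxValue, 500L, TimeUnit.MILLISECONDS, new SynchronousQueue[Runnable]())
        val seen = new AtomicReference[String]()
    
        // No idle worker exists yet, so this call creates one from the current thread,
        // and that worker inherits "hello".
        pool.submit(new Runnable {
          override def run(): Unit = seen.set(local.get())
        }).get()
        local.remove() // clearing the parent does not touch the worker's own entry
    
        val sizeWhileIdle = pool.getPoolSize // the idle worker is still alive
        Thread.sleep(2000) // wait past the keep-alive so the idle worker exits
        val sizeAfterTimeout = pool.getPoolSize
        pool.shutdown()
        (seen.get(), sizeWhileIdle, sizeAfterTimeout)
      }
    }
    ```
    
    With the default 60-second keep-alive, an idle worker can keep whatever it inherited reachable for up to a minute after its last task, which fits the observation that only waiting (for example with `Thread.sleep`) released it.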
    





[GitHub] spark issue #17833: [SPARK-20558][CORE] clear InheritableThreadLocal variabl...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17833
  
    merging to master/2.2/2.1/2.0




[GitHub] spark issue #17833: [SPARK-20558][CORE] clear InheritableThreadLocal variabl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17833
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17833: [SPARK-20558][CORE] clear InheritableThreadLocal variabl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17833
  
    **[Test build #76384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76384/testReport)** for PR 17833 at commit [`310ad75`](https://github.com/apache/spark/commit/310ad753c594e272266689d760049cdc3b53e04c).




[GitHub] spark issue #17833: [SPARK-20558][CORE] clear InheritableThreadLocal variabl...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/17833
  
    By the way, this PR looks good to me since it does reduce the memory footprint a little. We still cannot close https://issues.apache.org/jira/browse/SPARK-20548 with it, though.




[GitHub] spark issue #17833: [SPARK-20558][CORE] clear InheritableThreadLocal variabl...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the issue:

    https://github.com/apache/spark/pull/17833
  
    Are we still investigating the root cause? If not, perhaps try re-enabling the test in 2.2 to see if it still fails?




[GitHub] spark pull request #17833: [SPARK-20558][CORE] clear InheritableThreadLocal ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17833




[GitHub] spark issue #17833: [SPARK-20558][CORE] clear InheritableThreadLocal variabl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17833
  
    **[Test build #76384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76384/testReport)** for PR 17833 at commit [`310ad75`](https://github.com/apache/spark/commit/310ad753c594e272266689d760049cdc3b53e04c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17833: [SPARK-20558][CORE] clear InheritableThreadLocal variabl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17833
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76384/




[GitHub] spark issue #17833: [SPARK-20558][CORE] clear InheritableThreadLocal variabl...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17833
  
    cc @JoshRosen @rxin @zsxwing 

