You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2014/10/02 09:13:02 UTC

[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/2624

    [SPARK-3762] clear reference of SparkEnv after stop

    SparkEnv is cached in ThreadLocal object, so after stop and create a new SparkContext, old SparkEnv is still used by some threads, it will trigger many problems, for example, pyspark will have problem after restart SparkContext, because py4j use thread pool for RPC.
    
    This patch will clear all the references after stop a SparkEnv.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark env

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2624.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2624
    
----
commit 4d0ea8bf5df513d5d1f4250286ca328192018f08
Author: Davies Liu <da...@gmail.com>
Date:   2014-10-02T07:10:38Z

    clear reference of SparkEnv after stop

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-58149042
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21368/Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2624#discussion_r18498596
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -119,30 +118,20 @@ class SparkEnv (
     }
     
     object SparkEnv extends Logging {
    -  private val env = new ThreadLocal[SparkEnv]
    -  @volatile private var lastSetSparkEnv : SparkEnv = _
    +  @volatile private var env: SparkEnv = _
     
       private[spark] val driverActorSystemName = "sparkDriver"
       private[spark] val executorActorSystemName = "sparkExecutor"
     
       def set(e: SparkEnv) {
    -    lastSetSparkEnv = e
    -    env.set(e)
    +    env = e
       }
     
       /**
    -   * Returns the ThreadLocal SparkEnv, if non-null. Else returns the SparkEnv
    -   * previously set in any thread.
    +   * Returns the SparkEnv.
        */
       def get: SparkEnv = {
    -    Option(env.get()).getOrElse(lastSetSparkEnv)
    -  }
    -
    -  /**
    -   * Returns the ThreadLocal SparkEnv.
    -   */
    -  def getThreadLocal: SparkEnv = {
    --- End diff --
    
    Good point.  Let's throw `@deprecated` on it when we put it back, though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2624#discussion_r18498152
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -119,30 +118,20 @@ class SparkEnv (
     }
     
     object SparkEnv extends Logging {
    -  private val env = new ThreadLocal[SparkEnv]
    -  @volatile private var lastSetSparkEnv : SparkEnv = _
    +  @volatile private var env: SparkEnv = _
     
       private[spark] val driverActorSystemName = "sparkDriver"
       private[spark] val executorActorSystemName = "sparkExecutor"
     
       def set(e: SparkEnv) {
    -    lastSetSparkEnv = e
    -    env.set(e)
    +    env = e
       }
     
       /**
    -   * Returns the ThreadLocal SparkEnv, if non-null. Else returns the SparkEnv
    -   * previously set in any thread.
    +   * Returns the SparkEnv.
        */
       def get: SparkEnv = {
    -    Option(env.get()).getOrElse(lastSetSparkEnv)
    -  }
    -
    -  /**
    -   * Returns the ThreadLocal SparkEnv.
    -   */
    -  def getThreadLocal: SparkEnv = {
    --- End diff --
    
    You should leave this in instead of removing it, because some user code might be calling this public method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57729754
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21232/consoleFull) for   PR 2624 at commit [`ee62bb7`](https://github.com/apache/spark/commit/ee62bb7afdd728a9195b1549d3a9f4db5709e9ff).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57730092
  
    We should probably update the docstrings to remove all references to ThreadLocal, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2624#discussion_r18371260
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -130,6 +133,12 @@ object SparkEnv extends Logging {
         env.set(e)
       }
     
    +  // clear all the threadlocal references
    --- End diff --
    
    What about threads that have called `SparkEnv.set(SparkEnv.get)`?  I don't think those ThreadLocals will get cleared.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57758615
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21237/consoleFull) for   PR 2624 at commit [`ba77ca4`](https://github.com/apache/spark/commit/ba77ca4f84f8fefff88714f0c0fa196d0a6f844b).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  protected case class Keyword(str: String)`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57721877
  
    @davies The SparkEnv class is marked `@developerAPI` and has this note in its Scaladoc:
    
    ```
     * NOTE: This is not intended for external use. This is exposed for Shark and may be made private
     *       in a future release.
    ```
    
    Nearly every object that's accessible through SparkEnv's public fields is `private[spark]`.  I'd be _very_ surprised if there was third-party code that was relying on SparkEnv remaining backwards-compatible.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2624


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57733279
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21232/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57718816
  
    It looks like SparkEnv has two "getter" methods: `get`, which returns either the ThreadLocal value or the last SparkEnv set _by any thread_, and `getThreadLocal`, which just reads the current ThreadLocal.
    
    It turns out that there's no code which calls `getThreadLocal`.  We _do_ call `SparkEnv.set()` in a few places, but we seem to always set it using either a new SparkEnv (e.g. when starting a SparkContext or Executor) or using the value obtained from `SparkEnv.get()`.  Therefore, it seems like SparkEnv effectively acts as a global object.  Why not just remove the ThreadLocals and have `get` just return some `@volatile` field in SparkEnv that holds the current instance (a sort of Singleton pattern)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57733269
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21232/consoleFull) for   PR 2624 at commit [`ee62bb7`](https://github.com/apache/spark/commit/ee62bb7afdd728a9195b1549d3a9f4db5709e9ff).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class GetPeers(blockManagerId: BlockManagerId) extends ToBlockManagerMaster`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-58241504
  
    Yup, looks good! I'm going to merge it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-58143704
  
    @mateiz @JoshRosen I had put getThreadLocal() back and deprecated it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2624#discussion_r18498170
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -119,30 +118,20 @@ class SparkEnv (
     }
     
     object SparkEnv extends Logging {
    -  private val env = new ThreadLocal[SparkEnv]
    -  @volatile private var lastSetSparkEnv : SparkEnv = _
    +  @volatile private var env: SparkEnv = _
     
       private[spark] val driverActorSystemName = "sparkDriver"
       private[spark] val executorActorSystemName = "sparkExecutor"
     
       def set(e: SparkEnv) {
    -    lastSetSparkEnv = e
    -    env.set(e)
    +    env = e
       }
     
       /**
    -   * Returns the ThreadLocal SparkEnv, if non-null. Else returns the SparkEnv
    -   * previously set in any thread.
    +   * Returns the SparkEnv.
        */
       def get: SparkEnv = {
    -    Option(env.get()).getOrElse(lastSetSparkEnv)
    -  }
    -
    -  /**
    -   * Returns the ThreadLocal SparkEnv.
    -   */
    -  def getThreadLocal: SparkEnv = {
    --- End diff --
    
    (Even though it's a DeveloperApi we shouldn't break it if we can help it)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57850587
  
    > Either we can do something minimal to just clear the reference, so that repeated sparkContext creation works from pySpark.
    
    I'm not sure that there's an easy, minimal approach that's also correct, though.  The problem is that some threads obtain a SparkEnv by calling `SparkEnv.get`, so these threads are prone to reading old ThreadLocals that haven't been cleaned up.  In order for an approach that clears ThreadLocals to be safe, I think we'd need some way to ensure that _any_ thread that sets the ThreadLocal eventually clears that ThreadLocal before it's re-used.  I suppose that we could audit all calls of `SparkEnv.set()` and add the equivalent of a `try ... finally` to ensure that the ThreadLocal is eventually cleared.  This is starting to get complex, though, and I'm not sure that it's simpler than simply removing the ThreadLocals for now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-58143439
  
    @mateiz Aside from restoring the `getThreadLocal` method in order to preserve API compatibility, is this patch otherwise ready to merge?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57598250
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21184/consoleFull) for   PR 2624 at commit [`4d0ea8b`](https://github.com/apache/spark/commit/4d0ea8bf5df513d5d1f4250286ca328192018f08).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57892709
  
    @JoshRosen the simple fix is to delete the threadlocal variable completely. Then any access to the threadlocal variable from any thread (even threadpool in Py4J) is going to be reset. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57592595
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21184/consoleFull) for   PR 2624 at commit [`4d0ea8b`](https://github.com/apache/spark/commit/4d0ea8bf5df513d5d1f4250286ca328192018f08).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2624#discussion_r18498163
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -119,30 +118,20 @@ class SparkEnv (
     }
     
     object SparkEnv extends Logging {
    -  private val env = new ThreadLocal[SparkEnv]
    -  @volatile private var lastSetSparkEnv : SparkEnv = _
    +  @volatile private var env: SparkEnv = _
     
       private[spark] val driverActorSystemName = "sparkDriver"
       private[spark] val executorActorSystemName = "sparkExecutor"
     
       def set(e: SparkEnv) {
    -    lastSetSparkEnv = e
    -    env.set(e)
    +    env = e
       }
     
       /**
    -   * Returns the ThreadLocal SparkEnv, if non-null. Else returns the SparkEnv
    -   * previously set in any thread.
    +   * Returns the SparkEnv.
        */
       def get: SparkEnv = {
    -    Option(env.get()).getOrElse(lastSetSparkEnv)
    -  }
    -
    -  /**
    -   * Returns the ThreadLocal SparkEnv.
    -   */
    -  def getThreadLocal: SparkEnv = {
    --- End diff --
    
    Actually I'm surprised MIMA didn't catch this..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-58149038
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21368/consoleFull) for   PR 2624 at commit [`a69f30c`](https://github.com/apache/spark/commit/a69f30cdb8e63d526ebee06162d8f1b9f2adb253).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57840767
  
    Okay, this is a pretty significant change to remove the threadlocal object completely. There are two things we can do
    - Either we can do something minimal to just clear the reference, so that repeated sparkContext creation works from pySpark.
    - Or we do this scary refactoring and eliminate the threadlocal completely. I know that there are probably no usecases where the threadlocal object on any thread would be different from that of the global lastSparkEnv, but are we entirely sure?
    
    I think we need more input from others @pwendell @mateiz @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57721319
  
    @JoshRosen I'd like to move in this way, but getThreadLocal() is a public API, I'm afraid to remove it.
    
    In the past, I remembered that we can have two different SparkEnv in the same JVM, a different SparkEnv for executor to hold different TrackerClients. Maybe it's not needed anymore.
    
    Also we can keep the getThreadLocal() and deprecate it, remove the ThreadLocal object, how is it? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57598254
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21184/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57739142
  
    Hmm, strange test failure:
    
    ```
    [info] - block generator throttling *** FAILED ***
    [info]   org.scalatest.exceptions.TestFailedException was thrown. (NetworkReceiverSuite.scala:180)
    ```
    
    This is a known flaky test, though, so let's try running the tests again.
    
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57758618
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21237/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-58143977
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21368/consoleFull) for   PR 2624 at commit [`a69f30c`](https://github.com/apache/spark/commit/a69f30cdb8e63d526ebee06162d8f1b9f2adb253).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57853572
  
    It's worth noting that the ThreadLocals haven't seemed to cause problems in any of the existing uses of Spark / PySpark.  In PySpark Streaming, I think we're running into a scenario that's something like this:
    
    - Java invokes a Python callback through the Py4J callback server.  Internally, the callback server uses some thread pool.
    - The Python callback calls back into Java through Py4J.
    - Somewhere along the line, `SparkEnv.set()` is called, leaking the current SparkEnv into one of the Py4J GatewayServer or CallbackServer pool threads.
    - This thread is re-used when a new Python SparkContext is created using the same GatewayServer.
    
    I thought of another fix that will allow the ThreadLocals to work: add a mutable field to SparkEnv instances that records whether that environment is associated with a SparkContext that's been stopped.  In SparkEnv.get(), we can check this field to determine whether to return the ThreadLocal or return lastSparkEnv.  This approach is more confusing / complex than removing the ThreadLocals, though.
    
    I'm still strongly in favor of doing the work to confirm that SparkEnv is currently used as though it's a global object and then removing the ThreadLocals.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57751658
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21237/consoleFull) for   PR 2624 at commit [`ba77ca4`](https://github.com/apache/spark/commit/ba77ca4f84f8fefff88714f0c0fa196d0a6f844b).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3762] clear reference of SparkEnv after...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2624#issuecomment-57722450
  
    @davies:
    
    > In the past, I remembered that we can have two different SparkEnv in the same JVM, a different SparkEnv for executor to hold different TrackerClients. Maybe it's not needed anymore.
    
    `SparkEnv.create` is only called in two places, SparkContext and Executor's constructors.
    
    In Executor, we only create a new SparkContext if we're not running in local mode:
    
    ```scala
      // Initialize Spark environment (using system properties read above)
      private val env = {
        if (!isLocal) {
          val _env = SparkEnv.create(conf, executorId, slaveHostname, 0,
            isDriver = false, isLocal = false)
          SparkEnv.set(_env)
          _env.metricsSystem.registerSource(executorSource)
          _env
        } else {
          SparkEnv.get
        }
      }
    ```
    
    Since we only have one `Executor` instance per JVM, we only have one SparkEnv per JVM (since we don't currently support concurrently-running SparkContexts in the same JVM).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org