Posted to reviews@spark.apache.org by chtyim <gi...@git.apache.org> on 2016/04/12 04:26:52 UTC

[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

GitHub user chtyim opened a pull request:

    https://github.com/apache/spark/pull/12318

    [SPARK-14513][CORE] Fix threads left behind after stopping SparkContext

    ## What changes were proposed in this pull request?
    
    Shut down the `QueuedThreadPool` used by the Jetty `Server` to avoid leaking threads after the SparkContext is stopped.
    
    Note: If this fix is to be applied to `branch-1.6`, one more patch to the `NettyRpcEnv` class is needed so that `NettyRpcEnv._fileServer.shutdown` is called in the `NettyRpcEnv.cleanup` method. The extra patch is needed only there because the `_fileServer` field has been removed from the `NettyRpcEnv` class in the master branch. Please advise if a second PR is necessary to bring this fix back to `branch-1.6`.
    
    ## How was this patch tested?
    
    Ran `./dev/run-tests` locally.
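
A minimal sketch of the kind of leak described above, using only the JDK so that it runs without Jetty. `ToyServer` and `stopCompletely` are hypothetical names, and a `java.util.concurrent` executor stands in for `QueuedThreadPool`; this illustrates the idea and is not the code in this pull request.

```scala
import java.util.concurrent.{ExecutorService, TimeUnit}

// Hypothetical stand-in for a server that uses a thread pool it was handed.
final class ToyServer(val pool: ExecutorService) {
  @volatile private var running = false

  // Marks the server as running and hands one no-op task to the pool,
  // which makes the pool create a worker thread.
  def start(): Unit = {
    running = true
    pool.execute(new Runnable { def run(): Unit = () })
  }

  // Stops the server itself but leaves the pool (and its threads) running.
  def stop(): Unit = { running = false }

  def isRunning: Boolean = running
}

object ThreadLeakSketch {
  // Stops the server, then shuts the pool down and waits for its workers to exit.
  // Returns true if the pool terminated within the timeout.
  def stopCompletely(server: ToyServer): Boolean = {
    server.stop()
    server.pool.shutdown()
    server.pool.awaitTermination(5, TimeUnit.SECONDS)
  }
}
```

In the sketch, calling only `stop()` leaves the pool running, while `stopCompletely` also shuts the pool down, which is the same shape of fix the description proposes for the Jetty thread pool.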


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chtyim/spark fixes/SPARK-14513-thread-leak

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12318.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12318
    
----
commit cf7b296e9b56951a6c114007101e38ca79a49d13
Author: Terence Yim <te...@cask.co>
Date:   2016-04-11T22:10:51Z

    [SPARK-14513][CORE] Fix threads left behind after stopping SparkContext

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12318


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12318#issuecomment-209084165
  
    **[Test build #2779 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2779/consoleFull)** for PR 12318 at commit [`0232a7a`](https://github.com/apache/spark/commit/0232a7a353797a1f0b3d2c4d2441ce44689d687b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by chtyim <gi...@git.apache.org>.
Github user chtyim commented on the pull request:

    https://github.com/apache/spark/pull/12318#issuecomment-208718393
  
    Addressed comment. Please have a look again. Thanks.


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12318#discussion_r59335396
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
    @@ -155,6 +156,7 @@ private[spark] class HttpServer(
           throw new ServerStateException("Server is already stopped")
         } else {
           server.stop()
    +      Option(server.getThreadPool).collect { case x: LifeCycle => x }.foreach(_.stop())
    --- End diff --
    
    Interesting opportunity to talk about style -- yes, I would have written it like @rxin suggests. It reads naturally: "if the pool is not null and is a LifeCycle, then call stop on it". The `Option` way splits the difference, and is OK by me, but is a little more obscure: "take an optional pool, collect it to itself as a LifeCycle, and then call stop on whatever is left". I also tend to think Scala syntax is overused.


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12318#issuecomment-209095251
  
    Thanks - merging in master.



[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12318#discussion_r59318015
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
    @@ -155,6 +158,7 @@ private[spark] class HttpServer(
           throw new ServerStateException("Server is already stopped")
         } else {
           server.stop()
    +      condOpt(server.getThreadPool) { case x: LifeCycle => x }.foreach(_.stop())
    --- End diff --
    
    yea let's not use condOpt. It's too cryptic.
    
    Do something more obvious, e.g.
    
    ```scala
    if (server.getThreadPool != null) {
      server.getThreadPool match {
        case l: LifeCycle => l.stop()
        case _ =>  // Do nothing
      }
    }
    ```


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by chtyim <gi...@git.apache.org>.
Github user chtyim commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12318#discussion_r59331847
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
    @@ -155,6 +156,7 @@ private[spark] class HttpServer(
           throw new ServerStateException("Server is already stopped")
         } else {
           server.stop()
    +      Option(server.getThreadPool).collect { case x: LifeCycle => x }.foreach(_.stop())
    --- End diff --
    
    I agree that when several not-so-obvious function or partial-function applications are chained together, Scala can become unreadable. In this case, however, I do think that using `Option`, `collect`, and `foreach` is quite straightforward and easy to read (and compared to someone chaining multiple RDD operations, this is very readable).


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by chtyim <gi...@git.apache.org>.
Github user chtyim commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12318#discussion_r59320494
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
    @@ -155,6 +158,7 @@ private[spark] class HttpServer(
           throw new ServerStateException("Server is already stopped")
         } else {
           server.stop()
    +      condOpt(server.getThreadPool) { case x: LifeCycle => x }.foreach(_.stop())
    --- End diff --
    
    That's a bit non-idiomatic in Scala. I'll go with what @srowen suggested.


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12318#discussion_r59324981
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
    @@ -155,6 +158,7 @@ private[spark] class HttpServer(
           throw new ServerStateException("Server is already stopped")
         } else {
           server.stop()
    +      condOpt(server.getThreadPool) { case x: LifeCycle => x }.foreach(_.stop())
    --- End diff --
    
    I don't think we are going for idiomatic Scala in Spark. Actually in most cases we go for readability.


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12318#discussion_r59325045
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
    @@ -155,6 +156,7 @@ private[spark] class HttpServer(
           throw new ServerStateException("Server is already stopped")
         } else {
           server.stop()
    +      Option(server.getThreadPool).collect { case x: LifeCycle => x }.foreach(_.stop())
    --- End diff --
    
    As I said, I don't think we are going for idiomatic Scala in Spark. Actually in most cases we go for readability.



[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12318#issuecomment-208671355
  
    Can one of the admins verify this patch?


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12318#discussion_r59314056
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
    @@ -155,6 +158,7 @@ private[spark] class HttpServer(
           throw new ServerStateException("Server is already stopped")
         } else {
           server.stop()
    +      condOpt(server.getThreadPool) { case x: LifeCycle => x }.foreach(_.stop())
    --- End diff --
    
    I had to look up `condOpt` here... Is this just trying to operate on it only if it is non-null and a `LifeCycle`? I personally think it's more idiomatic to write:
    
    `Option(server.getThreadPool).collect { case l : LifeCycle => l }.foreach(_.stop())`
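
As a small, self-contained check of the equivalence discussed above, the sketch below compares the two spellings. `ThreadPool`, `LifeCycle`, `StoppablePool`, and `PlainPool` are hypothetical stand-ins defined locally, not the Jetty types; only `scala.PartialFunction.condOpt` and `Option.collect` are real library calls.

```scala
import scala.PartialFunction.condOpt

// Hypothetical stand-ins for Jetty's ThreadPool and LifeCycle types.
trait ThreadPool
trait LifeCycle { def stop(): Unit }

// A pool that can be stopped, and one that cannot.
final class StoppablePool extends ThreadPool with LifeCycle {
  var stopped = false
  def stop(): Unit = stopped = true
}
final class PlainPool extends ThreadPool

object NarrowingSketch {
  // condOpt applies the partial function only where it is defined.
  def viaCondOpt(pool: ThreadPool): Option[LifeCycle] =
    condOpt(pool) { case l: LifeCycle => l }

  // Option(...) maps null to None; collect keeps only LifeCycle instances.
  def viaOption(pool: ThreadPool): Option[LifeCycle] =
    Option(pool).collect { case l: LifeCycle => l }
}
```

Both helpers return `None` for a `null` pool and for a pool that is not a `LifeCycle`, because a type pattern does not match `null` and `Option(null)` is `None`. Either result can then be followed by `.foreach(_.stop())`.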


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12318#discussion_r59325455
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
    @@ -155,6 +156,7 @@ private[spark] class HttpServer(
           throw new ServerStateException("Server is already stopped")
         } else {
           server.stop()
    +      Option(server.getThreadPool).collect { case x: LifeCycle => x }.foreach(_.stop())
    --- End diff --
    
    btw, can you also explain in an inline comment when the thread pool is a `LifeCycle`?



[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12318#discussion_r59328586
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
    @@ -155,6 +156,7 @@ private[spark] class HttpServer(
           throw new ServerStateException("Server is already stopped")
         } else {
           server.stop()
    +      Option(server.getThreadPool).collect { case x: LifeCycle => x }.foreach(_.stop())
    --- End diff --
    
    btw in this case the option, collect, foreach isn't that bad. but i often find people learn from these examples and go overboard with chaining. i constantly need to remind contributors to simplify their use of scala features. most recently:
    
    https://github.com/apache/spark/pull/12256/files#r58988046



[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by chtyim <gi...@git.apache.org>.
Github user chtyim commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12318#discussion_r59317619
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
    @@ -155,6 +158,7 @@ private[spark] class HttpServer(
           throw new ServerStateException("Server is already stopped")
         } else {
           server.stop()
    +      condOpt(server.getThreadPool) { case x: LifeCycle => x }.foreach(_.stop())
    --- End diff --
    
    Yes, it is effectively doing the same thing, but using `condOpt` from `PartialFunction` instead. I can change it to your version if that is preferable.


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by chtyim <gi...@git.apache.org>.
Github user chtyim commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12318#discussion_r59338307
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
    @@ -155,6 +156,7 @@ private[spark] class HttpServer(
           throw new ServerStateException("Server is already stopped")
         } else {
           server.stop()
    +      Option(server.getThreadPool).collect { case x: LifeCycle => x }.foreach(_.stop())
    --- End diff --
    
    I think I'll use the `if` with `isInstanceOf`, as it makes the intention most obvious and doesn't need an empty catch-all case (which is a bit of overkill here, since there is only one condition to match).
    
    Just for the sake of discussing style, I think it depends on whether you read it the imperative way, "if some condition then do something", or the functional way, "create option -> filter by condition -> apply operation", which can be a never-ending debate :) I myself lean toward the functional way for simple tasks (like the one-liner here), as it is more concise, and toward the imperative way when I need to implement a more complicated logical flow.


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12318#discussion_r59318036
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
    @@ -350,4 +352,10 @@ private[spark] object JettyUtils extends Logging {
     private[spark] case class ServerInfo(
         server: Server,
         boundPort: Int,
    -    rootHandler: ContextHandlerCollection)
    +    rootHandler: ContextHandlerCollection) {
    +
    +  def stop(): Unit = {
    +    server.stop()
    +    condOpt(server.getThreadPool) { case x: LifeCycle => x }.foreach(_.stop())
    --- End diff --
    
    same here


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by chtyim <gi...@git.apache.org>.
Github user chtyim commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12318#discussion_r59326211
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
    @@ -155,6 +156,7 @@ private[spark] class HttpServer(
           throw new ServerStateException("Server is already stopped")
         } else {
           server.stop()
    +      Option(server.getThreadPool).collect { case x: LifeCycle => x }.foreach(_.stop())
    --- End diff --
    
    Just checking: should I use `isInstanceOf` instead of `case`? It avoids creating a partial function and is the most readable for both Scala and Java developers. It also takes fewer lines of code and avoids the rather ugly `case _ => // Do nothing`.
    
    ```scala
        val threadPool = server.getThreadPool
        if (threadPool != null && threadPool.isInstanceOf[LifeCycle]) {
          threadPool.asInstanceOf[LifeCycle].stop()
        }
    ```
    
    Besides, if we have to check for `null` as you suggested, why not use `Option`, which is one of the most common constructs in Scala?
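
For comparison, here is a hedged sketch of the two imperative forms raised in this thread, written against local stand-in types so it runs on its own. `ThreadPool`, `LifeCycle`, `CountingPool`, and the two helper names are hypothetical and are not taken from Spark or Jetty. The `Boolean` return value exists only to make the behavior easy to observe.

```scala
// Hypothetical stand-ins for Jetty's ThreadPool and LifeCycle types.
trait ThreadPool
trait LifeCycle { def stop(): Unit }

// Counts how often stop() is called so the effect can be checked.
final class CountingPool extends ThreadPool with LifeCycle {
  var stops = 0
  def stop(): Unit = stops += 1
}

object ImperativeStopSketch {
  // Pattern-match form: a type pattern never matches null, so a null pool
  // falls through to the wildcard case.
  def stopWithMatch(pool: ThreadPool): Boolean = pool match {
    case l: LifeCycle =>
      l.stop()
      true
    case _ =>
      false
  }

  // isInstanceOf form: isInstanceOf is false for null, so a null pool
  // takes the else branch.
  def stopWithInstanceOf(pool: ThreadPool): Boolean =
    if (pool.isInstanceOf[LifeCycle]) {
      pool.asInstanceOf[LifeCycle].stop()
      true
    } else {
      false
    }
}
```

In this sketch neither form needs a separate `null` check to be safe, although keeping one, as in the snippets above, may make the intent more obvious to readers.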


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12318#issuecomment-209029211
  
    **[Test build #2779 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2779/consoleFull)** for PR 12318 at commit [`0232a7a`](https://github.com/apache/spark/commit/0232a7a353797a1f0b3d2c4d2441ce44689d687b).


[GitHub] spark pull request: [SPARK-14513][CORE] Fix threads left behind af...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12318#discussion_r59328490
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala ---
    @@ -155,6 +156,7 @@ private[spark] class HttpServer(
           throw new ServerStateException("Server is already stopped")
         } else {
           server.stop()
    +      Option(server.getThreadPool).collect { case x: LifeCycle => x }.foreach(_.stop())
    --- End diff --
    
    yea both work for me. i just wanted to avoid the chained Option, collect, and foreach.


