You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2015/11/11 19:26:59 UTC

[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/9633

    [SPARK-11655] [core] Fix deadlock in handling of launcher stop().

    The stop() callback was trying to close the launcher connection in the
    same thread that handles connection data, which ended up causing a
    deadlock. So avoid that by dispatching the stop() request in its own
    thread.
    
    On top of that, add some exception safety to a few parts of the code,
    and use "destroyForcibly" from Java 8 if it's available, to force
    kill the child process. The flip side is that "kill()" may not actually
    work if running Java 7.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-11655

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9633.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9633
    
----
commit 9fcd201a24a1d5631ddfe971c6761cc511dd1a54
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2015-11-11T18:14:03Z

    [SPARK-11655] [core] Fix deadlock in handling of launcher stop().
    
    The stop() callback was trying to close the launcher connection in the
    same thread that handles connection data, which ended up causing a
    deadlock. So avoid that by dispatching the stop() request in its own
    thread.
    
    On top of that, add some exception safety to a few parts of the code,
    and use "destroyForcibly" from Java 8 if it's available, to force
    kill the child process. The flip side is that "kill()" may not actually
    work if running Java 7.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-156306783
  
    Sorry for the late / flaky review replies; I've been home sick with strep throat and spent most of the day asleep. This seems fine to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-156523502
  
    `LauncherBackend.close()` waits for the communication thread to finish execution, so it can't be called from that thread or it will deadlock. (It's a little weird that you're even allowed to do that, but go figure.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-155868353
  
    @JoshRosen 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-156230999
  
    @JoshRosen do you have any extra feedback here? I'll push the change otherwise.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-155966594
  
    I can confirm that this seems to fix the problem when running locally.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-155907343
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-155869442
  
    **[Test build #45656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45656/consoleFull)** for PR 9633 at commit [`9fcd201`](https://github.com/apache/spark/commit/9fcd201a24a1d5631ddfe971c6761cc511dd1a54).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-155868405
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-155868363
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-155907346
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45656/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-155907183
  
    **[Test build #45656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45656/consoleFull)** for PR 9633 at commit [`9fcd201`](https://github.com/apache/spark/commit/9fcd201a24a1d5631ddfe971c6761cc511dd1a54).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-155973129
  
    Based on http://bugs.java.com/view_bug.do?bug_id=4073195, it sounds like many *nix implementations of `Process.destroy()` work by sending `SIGTERM` to the child process. I suppose that anything that caused SIGTERM to be swallowed / ignored by one of the child processes could keep this from working on Java 7. PySpark used to be vulnerable to similar problems, so it includes a test case which specifically checks the `SIGTERM`-handling behavior: https://github.com/apache/spark/blob/b8ff6888e76b437287d7d6bf2d4b9c759710a195/python/pyspark/tests.py#L1580
    
    I commented out the `handle.stop()` call and verified that the child process stops almost immediately under Java 7, so it appears that this has fixed the issue. I suppose that we could try adding regression tests, but I'd also be fine doing that as a followup; I'd like to try to get this fix in sooner rather than later given the impact that it will have on Jenkins performance.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-156257491
  
    Merging to master / 1.6, we can do post-review later if needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9633


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9633#discussion_r44614090
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
    @@ -102,8 +103,20 @@ public synchronized void kill() {
           disconnect();
    --- End diff --
    
    I was initially worried that this needs to be in a `try` block but it doesn't look like `disconnect()` is capable of throwing any exceptions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-156309940
  
    Maybe I'm overlooking something really obvious, but I think it's pretty hard to spot the circular wait condition which led to the deadlock. For posterity, could you post a brief description of the participants in that cycle?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9633#issuecomment-155977153
  
    Note that the fix is *NOT* about whether `destroy` or `destroyForcibly` is used. The fix was a real deadlock in the code; that was made worse by the `destroy` call not actually killing the child process, which caused the process leak.
    
    With the deadlock out of the way, calling `destroy` shouldn't really be needed since the child process will exit properly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org