Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2015/11/11 19:26:59 UTC
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/9633
[SPARK-11655] [core] Fix deadlock in handling of launcher stop().
The stop() callback tried to close the launcher connection on the
same thread that handles connection data, which caused a deadlock.
Avoid that by dispatching the stop() request on its own thread.
On top of that, add some exception safety to a few parts of the code,
and use "destroyForcibly" from Java 8, if it's available, to forcibly
kill the child process. The flip side is that "kill()" may not actually
work when running on Java 7.
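As a rough illustration of the "use destroyForcibly if it's available" idea, the sketch below looks the method up reflectively and falls back to destroy() when it is missing. This is not the code from the patch; the class name ForceKillSketch, the kill() helper and the StubProcess stand-in are all hypothetical and exist only so the example runs without spawning a real child process.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.lang.reflect.Method;

final class ForceKillSketch {

    /** Kills the child and reports which Process method was used. */
    static String kill(Process child) {
        try {
            // Present on Java 8+; looked up reflectively so the same source
            // also compiles and runs against Java 7.
            Method destroyForcibly = Process.class.getMethod("destroyForcibly");
            destroyForcibly.invoke(child);
            return "destroyForcibly";
        } catch (ReflectiveOperationException e) {
            // Java 7 fallback: destroy() may be ignored by the child on some platforms.
            child.destroy();
            return "destroy";
        }
    }

    /** Hypothetical stub so the sketch runs without a real child process. */
    static final class StubProcess extends Process {
        boolean destroyed = false;
        @Override public OutputStream getOutputStream() { return null; }
        @Override public InputStream getInputStream() { return null; }
        @Override public InputStream getErrorStream() { return null; }
        @Override public int waitFor() { return 0; }
        @Override public int exitValue() { return 0; }
        @Override public void destroy() { destroyed = true; }
    }

    public static void main(String[] args) {
        StubProcess child = new StubProcess();
        System.out.println("killed via " + kill(child) + ", destroyed=" + child.destroyed);
    }
}
```

On Java 8 or later the lookup should succeed and the forced variant is used; on Java 7 the fallback branch runs, which is where the caveat about kill() comes from.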
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark SPARK-11655
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9633.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9633
----
commit 9fcd201a24a1d5631ddfe971c6761cc511dd1a54
Author: Marcelo Vanzin <va...@cloudera.com>
Date: 2015-11-11T18:14:03Z
[SPARK-11655] [core] Fix deadlock in handling of launcher stop().
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-156306783
Sorry for the late / flaky review replies; I've been home sick with strep throat and spent most of the day asleep. This seems fine to me.
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-156523502
`LauncherBackend.close()` waits for the communication thread to finish execution, so it can't be called from that thread or it will deadlock. (It's a little weird that you're even allowed to do that, but go figure.)
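To make the cycle concrete, here is a minimal sketch of the same shape of problem, assuming a connection whose close() joins its own reader thread. The names StopDispatchSketch, Connection and closesCleanly are hypothetical and this is not the Spark code. The join uses a timeout only so the demo terminates; with an unbounded join the first case would hang forever.

```java
import java.util.concurrent.CountDownLatch;

final class StopDispatchSketch {

    /** Hypothetical connection whose close() waits for its reader thread. */
    static final class Connection {
        private final CountDownLatch closeFinished = new CountDownLatch(1);
        private final Thread reader;
        private boolean readerExited; // written before countDown(), read after await()

        Connection(final boolean dispatchStop) {
            reader = new Thread(() -> {
                // Pretend a "stop" message just arrived on the connection.
                if (dispatchStop) {
                    // The fix: hand the stop request to its own thread.
                    new Thread(this::close, "stop-dispatcher").start();
                } else {
                    // The bug: the reader thread ends up waiting for itself.
                    close();
                }
            }, "connection-reader");
        }

        void close() {
            try {
                // Real code would wait without a timeout; the bound keeps this demo finite.
                reader.join(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            readerExited = !reader.isAlive();
            closeFinished.countDown();
        }
    }

    /** Returns true if close() saw the reader thread exit, false if the wait timed out. */
    static boolean closesCleanly(boolean dispatchStop) {
        Connection c = new Connection(dispatchStop);
        c.reader.start();
        try {
            c.closeFinished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return c.readerExited;
    }

    public static void main(String[] args) {
        System.out.println("stop handled on reader thread: " + closesCleanly(false));
        System.out.println("stop dispatched to own thread: " + closesCleanly(true));
    }
}
```

When close() runs on the reader thread, the thread it is waiting for can never finish because it is the caller. Dispatching the request lets the reader return from its handler and exit, so the join completes.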
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-155868353
@JoshRosen
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-156230999
@JoshRosen do you have any extra feedback here? I'll push the change otherwise.
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-155966594
I can confirm that this seems to fix the problem when running locally.
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-155907343
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-155869442
**[Test build #45656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45656/consoleFull)** for PR 9633 at commit [`9fcd201`](https://github.com/apache/spark/commit/9fcd201a24a1d5631ddfe971c6761cc511dd1a54).
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-155868405
Merged build started.
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-155868363
Merged build triggered.
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-155907346
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45656/
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-155907183
**[Test build #45656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45656/consoleFull)** for PR 9633 at commit [`9fcd201`](https://github.com/apache/spark/commit/9fcd201a24a1d5631ddfe971c6761cc511dd1a54).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-155973129
Based on http://bugs.java.com/view_bug.do?bug_id=4073195, it sounds like many *nix implementations of `Process.destroy()` work by sending `SIGTERM` to the child process. I suppose that anything that caused SIGTERM to be swallowed / ignored by one of the child processes could keep this from working on Java 7. PySpark used to be vulnerable to similar problems, so it includes a test case which specifically checks the `SIGTERM`-handling behavior: https://github.com/apache/spark/blob/b8ff6888e76b437287d7d6bf2d4b9c759710a195/python/pyspark/tests.py#L1580
I commented out the `handle.stop()` call and verified that the child process stops almost immediately under Java 7, so it appears that this has fixed the issue. I suppose that we could try adding regression tests, but I'd also be fine doing that as a followup; I'd like to try to get this fix in sooner rather than later given the impact that it will have on Jenkins performance.
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-156257491
Merging to master / 1.6, we can do post-review later if needed.
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/9633
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/9633#discussion_r44614090
--- Diff: launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
@@ -102,8 +103,20 @@ public synchronized void kill() {
disconnect();
--- End diff --
I was initially worried that this needs to be in a `try` block but it doesn't look like `disconnect()` is capable of throwing any exceptions.
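For reference, if `disconnect()` could throw, the usual guard would be a `try`/`finally` so the child is still destroyed. The sketch below is only an illustration of that pattern; KillSafetySketch and its Runnable parameters are hypothetical stand-ins, not the methods of ChildProcAppHandle.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class KillSafetySketch {

    /** Runs disconnect, then always destroys the child, even if disconnect throws. */
    static void kill(Runnable disconnect, Runnable destroyChild) {
        try {
            disconnect.run();
        } finally {
            destroyChild.run();
        }
    }

    /** Simulates a failing disconnect and reports whether the child was still destroyed. */
    static boolean destroyedDespiteFailure() {
        final AtomicBoolean destroyed = new AtomicBoolean(false);
        try {
            kill(() -> { throw new IllegalStateException("disconnect failed"); },
                 () -> destroyed.set(true));
        } catch (IllegalStateException expected) {
            // The failure still propagates to the caller after cleanup has run.
        }
        return destroyed.get();
    }

    public static void main(String[] args) {
        System.out.println("child destroyed: " + destroyedDespiteFailure());
    }
}
```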
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-156309940
Maybe I'm overlooking something really obvious, but I think it's pretty hard to spot the circular wait condition which led to the deadlock. For posterity, could you post a brief description of the participants in that cycle?
[GitHub] spark pull request: [SPARK-11655] [core] Fix deadlock in handling ...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/9633#issuecomment-155977153
Note that the fix is *NOT* about whether `destroy` or `destroyForcibly` is used. The fix addresses a real deadlock in the code, which was made worse by the `destroy` call not actually killing the child process; that is what caused the process leak.
With the deadlock out of the way, calling `destroy` shouldn't really be needed since the child process will exit properly.