Posted to reviews@spark.apache.org by mattf <gi...@git.apache.org> on 2014/08/28 16:53:40 UTC
[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark
GitHub user mattf opened a pull request:
https://github.com/apache/spark/pull/2183
[SPARK-2435] Add shutdown hook to pyspark
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mattf/spark SPARK-2435
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2183.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2183
----
commit ee0ee9994846862aa5b69d67c860c2f0b1c231b4
Author: Matthew Farrellee <ma...@redhat.com>
Date: 2014-08-28T14:25:12Z
[SPARK-2435] Add shutdown hook to pyspark
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Posted by mattf <gi...@git.apache.org>.
Github user mattf commented on the pull request:
https://github.com/apache/spark/pull/2183#issuecomment-54062440
> Is it better to put atexit.register() in context.py? So all the pyspark jobs can have this.
i think it's a question of who owns the context. the owner is whoever constructed it. they should be responsible for stopping it, and given the option not to stop it if they so desire.
in the pyspark shell, it's the shell.py that creates it and should therefore stop it.
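The ownership argument above can be sketched with plain `atexit`. Note that `DemoContext` is a hypothetical stand-in for pyspark's real `SparkContext`, not the actual patch:

```python
import atexit

class DemoContext:
    """Hypothetical stand-in for pyspark's SparkContext."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

# The shell owns the context it constructs, so the shell (not the
# context itself) registers the cleanup hook; a caller who prefers to
# leave the context running simply doesn't register one.
sc = DemoContext()
atexit.register(sc.stop)  # invoked when the interpreter exits normally
```

This mirrors what shell.py would do on the owner's behalf, leaving library users of `SparkContext` free to manage the lifecycle themselves.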
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2183#issuecomment-53758559
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19407/consoleFull) for PR 2183 at commit [`ee0ee99`](https://github.com/apache/spark/commit/ee0ee9994846862aa5b69d67c860c2f0b1c231b4).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2183#issuecomment-53837582
Is it better to put atexit.register() in context.py? So all the pyspark jobs can have this.
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2183#issuecomment-53746821
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19407/consoleFull) for PR 2183 at commit [`ee0ee99`](https://github.com/apache/spark/commit/ee0ee9994846862aa5b69d67c860c2f0b1c231b4).
* This patch merges cleanly.
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2183#issuecomment-53902169
It looks like Spark core automatically registers JVM shutdown hooks for several different components, so it might be okay to have similar global logic here.
One (perhaps minor) concern with the global hook is that the functions we register will contain strong references to the SparkContext, which might lead to resource leaks in a long-running Python process that creates and destroys many contexts.
PySpark does not currently support running multiple SparkContexts at the same time, so one option would be to define a single shutdown hook that stops `SparkContext._active_spark_context`. There's currently a lock (`SparkContext._lock`) guarding that field and I'm not sure whether it's safe to attempt to acquire it during a shutdown hook (it's fine for shutdown hooks to throw exceptions, but they shouldn't block). To guard against this, maybe we can attempt to acquire the lock and just throw an exception after a short timeout. This is a super-rare edge case, though, and I'd be shocked if anyone ran into it, since it requires a separate thread attempting to start or stop a SparkContext while the Python interpreter is exiting.
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/2183
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2183#issuecomment-53809863
I think the idea is to perform a more graceful shutdown of SparkContext. Do we have similar shutdown hooks in Java / Scala?
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2183#issuecomment-54396749
This seems like a good fix, if only for consistency with bin/spark-shell, so I'm going to merge this into master. Thanks!
Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/2183#issuecomment-53820703
In the Scala shell if you do `Ctrl+D` it stops the SparkContext. In Python I think there is no way to stop the SparkContext unless you call `sc.stop()`.
Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2183#issuecomment-53791070
What's the problem without this patch? I remember that the JVM will shut itself down after the shell exits.
Posted by shaneknapp <gi...@git.apache.org>.
Github user shaneknapp commented on the pull request:
https://github.com/apache/spark/pull/2183#issuecomment-53752758
retest this please
Posted by mattf <gi...@git.apache.org>.
Github user mattf commented on the pull request:
https://github.com/apache/spark/pull/2183#issuecomment-54061965
> What's the problem without this patch? I remember that the JVM will shut itself down after the shell exits.
davies, i went back and tried to reproduce the shell issues (while scripting the shell) and couldn't reproduce w/ or w/o this patch. so i don't have a bug to justify the patch. apologies.