Posted to reviews@spark.apache.org by mattf <gi...@git.apache.org> on 2014/08/28 16:53:40 UTC

[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

GitHub user mattf opened a pull request:

    https://github.com/apache/spark/pull/2183

    [SPARK-2435] Add shutdown hook to pyspark

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mattf/spark SPARK-2435

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2183.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2183
    
----
commit ee0ee9994846862aa5b69d67c860c2f0b1c231b4
Author: Matthew Farrellee <ma...@redhat.com>
Date:   2014-08-28T14:25:12Z

    [SPARK-2435] Add shutdown hook to pyspark

----



[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

Posted by mattf <gi...@git.apache.org>.
Github user mattf commented on the pull request:

    https://github.com/apache/spark/pull/2183#issuecomment-54062440
  
    > Is it better to put atexit.register() in context.py, so that all pyspark jobs can have this?
    
    i think it's a question of who owns the context. the owner is whoever constructed it. they should be responsible for stopping it, and given the option not to stop it if they so desire.
    
    in the pyspark shell, it's shell.py that creates it and should therefore stop it.
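
For reference, the hook being discussed is small. A minimal sketch, assuming
shell.py constructs the interactive context as `sc` (the appName is
illustrative, not necessarily what shell.py uses):

    import atexit

    from pyspark.context import SparkContext

    # shell.py creates the context, so shell.py registers its cleanup.
    sc = SparkContext(appName="PySparkShell")
    atexit.register(lambda: sc.stop())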



[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2183#issuecomment-53758559
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19407/consoleFull) for PR 2183 at commit [`ee0ee99`](https://github.com/apache/spark/commit/ee0ee9994846862aa5b69d67c860c2f0b1c231b4).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/2183#issuecomment-53837582
  
    Is it better to put atexit.register() in context.py, so that all pyspark jobs can have this?
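
For illustration, a self-contained sketch of that alternative (a stand-in class,
not the real context.py), where the constructor registers its own stop():

    import atexit

    class SparkContext(object):
        def __init__(self):
            # ... existing initialization ...
            # Register this instance's stop() at construction time, so every
            # pyspark job gets cleanup without shell.py doing anything.
            atexit.register(self.stop)

        def stop(self):
            print("stopping context")

Note that atexit.register(self.stop) holds a strong reference to the instance,
which is exactly the caveat JoshRosen raises below.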



[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2183#issuecomment-53746821
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19407/consoleFull) for PR 2183 at commit [`ee0ee99`](https://github.com/apache/spark/commit/ee0ee9994846862aa5b69d67c860c2f0b1c231b4).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2183#issuecomment-53902169
  
    It looks like Spark core automatically registers JVM shutdown hooks for several different components, so it might be okay to have similar global logic here.
    
    One (perhaps minor) concern with the global hook is that the functions we register will contain strong references to the SparkContext, which might lead to resource leaks in a long-running Python process that creates and destroys many contexts.
    
    PySpark does not currently support running multiple SparkContexts at the same time, so one option would be to define a single shutdown hook that stops `SparkContext._active_spark_context`.  There's currently a lock (`SparkContext._lock`) guarding that field and I'm not sure whether it's safe to attempt to acquire it during a shutdown hook (it's fine for shutdown hooks to throw exceptions, but they shouldn't block).  To guard against this, maybe we can attempt to acquire the lock and just throw an exception after a short timeout.  This is a super-rare edge case, though, and I'd be shocked if anyone ran into it, since it requires a separate thread attempting to start or stop a SparkContext while the Python interpreter is exiting.
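
A hedged sketch of that single-hook idea (the class is a stand-in mirroring only
the two fields named above, and the timeout form of Lock.acquire assumes
Python 3; this is not PySpark's actual code):

    import atexit
    import threading

    class SparkContext(object):
        _active_spark_context = None  # PySpark allows at most one live context
        _lock = threading.Lock()

        def stop(self):
            with SparkContext._lock:
                if SparkContext._active_spark_context is self:
                    SparkContext._active_spark_context = None
            # ... tear down the gateway, temp files, etc. ...

    def _stop_active_context():
        # A shutdown hook may raise, but it must not block interpreter exit,
        # so give up on the lock after a short timeout instead of waiting.
        if not SparkContext._lock.acquire(timeout=1.0):
            raise RuntimeError("SparkContext._lock busy during shutdown")
        try:
            ctx = SparkContext._active_spark_context
        finally:
            SparkContext._lock.release()
        # Call stop() outside the lock so stop() can take the lock itself.
        if ctx is not None:
            ctx.stop()

    atexit.register(_stop_active_context)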



[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2183



[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2183#issuecomment-53809863
  
    I think the idea is to perform a more graceful shutdown of SparkContext.  Do we have similar shutdown hooks in Java / Scala?



[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2183#issuecomment-54396749
  
    This seems like a good fix, if only for consistency with bin/spark-shell, so I'm going to merge this into master.  Thanks!



[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2183#issuecomment-53820703
  
    In the Scala shell, `Ctrl+D` stops the SparkContext. In Python, I think there is no way to stop the SparkContext unless you call `sc.stop()` explicitly.
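
(As a concrete illustration of the current behavior: in the pyspark shell the
context only goes away cleanly if the user stops it by hand before exiting.)

    >>> sc.stop()   # must be called explicitly
    >>> exit()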



[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/2183#issuecomment-53791070
  
    What's the problem without this patch? I remember that the JVM will shut itself down after the shell exits.



[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

Posted by shaneknapp <gi...@git.apache.org>.
Github user shaneknapp commented on the pull request:

    https://github.com/apache/spark/pull/2183#issuecomment-53752758
  
    retest this please



[GitHub] spark pull request: [SPARK-2435] Add shutdown hook to pyspark

Posted by mattf <gi...@git.apache.org>.
Github user mattf commented on the pull request:

    https://github.com/apache/spark/pull/2183#issuecomment-54061965
  
    > What's the problem without this patch? I remember that the JVM will shut itself down after the shell exits.
    
    davies, i went back and tried to reproduce the shell issues (while scripting the shell) and couldn't reproduce them with or without this patch. so i don't have a bug to justify the patch. apologies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org