Posted to reviews@spark.apache.org by aarondav <gi...@git.apache.org> on 2014/05/16 02:40:20 UTC

[GitHub] spark pull request: SPARK-1860: Do not cleanup application work/ d...

GitHub user aarondav opened a pull request:

    https://github.com/apache/spark/pull/800

    SPARK-1860: Do not cleanup application work/ directories by default

    Cleaning up these directories by default causes an unrecoverable error for
    applications that run for longer than 7 days and have jars added to the
    SparkContext, as the jars are cleaned up even though the application is
    still running.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aarondav/spark shitty-defaults

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/800.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #800
    
----
commit a573fbbbc51f3fb061536fd06f29d38977935459
Author: Aaron Davidson <aa...@databricks.com>
Date:   2014-05-16T00:35:26Z

    SPARK-1860: Do not cleanup application work/ directories by default
    
    This causes an unrecoverable error for applications that are running for longer
    than 7 days that have jars added to the SparkContext, as the jars are cleaned up
    even though the application is still running.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1860: Do not cleanup application work/ d...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/800#issuecomment-43284777
  
    Sure - might be good to have it off by default. /cc @velvia.



[GitHub] spark pull request: SPARK-1860: Do not cleanup application work/ d...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/800#issuecomment-43354675
  
    Just a little miffed because it took some time to figure out why our executors suddenly started failing with jar-not-found errors :)
    
    I'd prefer a full solution; last modified time runs into an issue if the executor lies dormant for a week. You might say, "that's unlikely", but I'd say, "it'll happen to someone, and they'll be a little miffed." The worker should have enough state to figure out which executors are currently active, though I'm not sure if the problem is made more difficult by multi-worker scenarios.
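
The approach described in this comment can be sketched as follows: the worker consults its own record of active applications before looking at modification times, so a dormant but still-running application is never deleted. This is a minimal sketch in Python, not Spark's Worker code; `stale_app_dirs`, `active_app_ids`, and `SEVEN_DAYS` are hypothetical names, and it assumes each application has one directory under the work directory named after its application ID.

```python
import os
import time

# Assumed TTL for illustration, matching the 7 days mentioned in the PR description.
SEVEN_DAYS = 7 * 24 * 3600


def stale_app_dirs(work_dir, active_app_ids, ttl_seconds, now=None):
    """Return application directories under work_dir that look safe to delete.

    A directory is skipped outright if its name is in active_app_ids
    (hypothetical worker-side state), no matter how old it is.
    """
    now = time.time() if now is None else now
    stale = []
    for name in sorted(os.listdir(work_dir)):
        path = os.path.join(work_dir, name)
        if not os.path.isdir(path):
            continue
        # Never touch a running application, even one dormant for a week.
        if name in active_app_ids:
            continue
        # Fall back to last modified time only for applications not known to be running.
        if now - os.path.getmtime(path) > ttl_seconds:
            stale.append(path)
    return stale
```

With a modification-time check alone, the first `continue` on active IDs would be missing, and a running application whose directory had not been written to for longer than the TTL would be listed for deletion.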



[GitHub] spark pull request: SPARK-1860: Do not cleanup application work/ d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/800#issuecomment-43282996
  
    Merged build triggered.



[GitHub] spark pull request: SPARK-1860: Do not cleanup application work/ d...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/800



[GitHub] spark pull request: SPARK-1860: Do not cleanup application work/ d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/800#issuecomment-43285803
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15035/



[GitHub] spark pull request: SPARK-1860: Do not cleanup application work/ d...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/800#issuecomment-43286675
  
    This patch is intended as a hotfix in the hopes that it can make it into the 1.0 release. Avoiding cleaning up running applications seems like the better solution in general, but is out of scope of this PR.



[GitHub] spark pull request: SPARK-1860: Do not cleanup application work/ d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/800#issuecomment-43283009
  
    Merged build started.



[GitHub] spark pull request: SPARK-1860: Do not cleanup application work/ d...

Posted by velvia <gi...@git.apache.org>.
Github user velvia commented on the pull request:

    https://github.com/apache/spark/pull/800#issuecomment-43302348
  
    @aarondav @pwendell I agree we don't want to clean up currently running apps, but also that this should default to on when it's fixed. Maybe it's as simple as checking the last modified time of the directory.



[GitHub] spark pull request: SPARK-1860: Do not cleanup application work/ d...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/800#issuecomment-43296302
  
    btw - regarding the branch name - I don't think this was _too_ shitty of a default. I'd actually like to have this on by default if we can get it into working order, because otherwise users will only find out once it's too late that they are out of disk space :P



[GitHub] spark pull request: SPARK-1860: Do not cleanup application work/ d...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/800#issuecomment-43289901
  
    LGTM



[GitHub] spark pull request: SPARK-1860: Do not cleanup application work/ d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/800#issuecomment-43285802
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: SPARK-1860: Do not cleanup application work/ d...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/800#issuecomment-43285316
  
    @aarondav what about just not cleaning up the data if the app is still running? In the future we should probably assess the TTL based on the finish time of the app, not the start time.
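
The difference between a TTL measured from start time and one measured from finish time can be illustrated with a short sketch. This is Python for illustration only, not Spark code; `AppRecord`, `is_expired`, and `SEVEN_DAYS` are hypothetical names, and it assumes the worker can record when each application finishes.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed TTL for illustration, matching the 7 days mentioned in the PR description.
SEVEN_DAYS = 7 * 24 * 3600


@dataclass
class AppRecord:
    app_id: str
    start_time: float                    # seconds since the epoch
    finish_time: Optional[float] = None  # None while the app is still running


def is_expired(app: AppRecord, ttl_seconds: float, now: float) -> bool:
    """Decide whether an application's work directory may be cleaned up."""
    # A running application is never expired, however long ago it started.
    if app.finish_time is None:
        return False
    # The TTL clock starts when the application finishes, not when it starts.
    return now - app.finish_time > ttl_seconds
```

Under this rule an application that started ten days ago and is still running keeps its jars, while one that finished more than seven days ago becomes eligible for cleanup.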



[GitHub] spark pull request: SPARK-1860: Do not cleanup application work/ d...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/800#issuecomment-43287410
  
    I agree, okay let's just pull in this fix and we can hopefully patch the bigger issue later.

