Posted to reviews@spark.apache.org by velvia <gi...@git.apache.org> on 2014/04/01 22:37:07 UTC

[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

GitHub user velvia opened a pull request:

    https://github.com/apache/spark/pull/288

    SPARK-1154: Clean up app folders in worker nodes

    This is a fix for [SPARK-1154](https://issues.apache.org/jira/browse/SPARK-1154).   The issue is that worker nodes fill up with a huge number of app-* folders after some time.  This change adds a periodic cleanup task which asynchronously deletes app directories older than a configurable TTL.
    
    Two new configuration parameters have been introduced:
      spark.worker.cleanup_interval
      spark.worker.app_data_ttl
    
    This change does not include moving the downloads of application jars to a location outside of the work directory.  We will address that if we have time, but that potentially involves caching so it will come either as part of this PR or a separate PR.
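    The TTL scan described above can be sketched roughly as follows. This is a minimal sketch using plain `java.io.File`; `findOldDirs` and the inline `deleteRecursively` are illustrative stand-ins for the PR's `Utils.findOldFiles` / `Utils.deleteRecursively`, not the actual Spark API:

```scala
import java.io.File

object WorkDirCleanupSketch {
  // List the direct children of `dir` whose last-modified time is more than
  // `ttlSecs` seconds in the past. Illustrative stand-in for the PR's
  // Utils.findOldFiles; the real implementation may differ.
  def findOldDirs(dir: File, ttlSecs: Long): Seq[File] = {
    val cutoff = System.currentTimeMillis() - ttlSecs * 1000L
    Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(_.lastModified() < cutoff)
      .toSeq
  }

  // Delete a file or a whole directory tree; stand-in for the PR's
  // Utils.deleteRecursively.
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) {
      Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    f.delete()
  }
}
```

    A periodic task would then call `findOldDirs(workDir, ttl).foreach(deleteRecursively)` on each tick.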

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/velvia/spark SPARK-1154-cleanup-app-folders

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/288.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #288
    
----
commit 6a9eed14e28af301966566c7e2566cca7808cfd9
Author: Evan Chan <ev...@ooyala.com>
Date:   2014-04-01T18:04:33Z

    SPARK-1154: Add a periodic task to clean up app directories
    
    This adds two config params:
      spark.worker.cleanup_interval
      spark.worker.app_data_ttl

commit 574bc82743a5a8fcf3c0b3185c1c9166c54b8655
Author: Evan Chan <ev...@ooyala.com>
Date:   2014-04-01T18:31:29Z

    Document the two new settings

commit 31961673a4f3c3474e8bb1594153d4908e7f8d11
Author: Evan Chan <ev...@ooyala.com>
Date:   2014-04-01T20:30:54Z

    Don't recompute current time with every new file

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39362889
  
     Merged build triggered. 



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39628136
  
     Build triggered. 



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11275005
  
    --- Diff: docs/configuration.md ---
    @@ -586,6 +586,30 @@ Apart from these, the following properties are also available, and may be useful
         Number of cores to allocate for each task.
       </td>
     </tr>
    +<tr>
    +  <td>spark.worker.cleanup.enabled</td>
    +  <td>true</td>
    +  <td>
    +    Enable periodic cleanup of worker / application directories
    +  </td>
    +</tr>
    +<tr>
    +  <td>spark.worker.cleanup.interval</td>
    +  <td>1800 (30 minutes)</td>
    +  <td>
    +    Controls the interval, in seconds, at which the worker cleans up old application work dirs
    --- End diff --
    
    should mention in these docs that it is only for standalone mode, since this won't help YARN. Also perhaps move these properties up to spark.worker.timeout, which is the only other worker config in this list. (Really, we shouldn't have standalone-only config on this page, but there's no place right now for it in the spark-standalone page.)



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11222849
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -179,12 +184,26 @@ private[spark] class Worker(
           registered = true
           changeMaster(masterUrl, masterWebUiUrl)
           context.system.scheduler.schedule(0 millis, HEARTBEAT_MILLIS millis, self, SendHeartbeat)
    +      context.system.scheduler.schedule(CLEANUP_INTERVAL_MILLIS millis,
    +                                        CLEANUP_INTERVAL_MILLIS millis, self, Worker.AppDirCleanup)
    --- End diff --
    
    This is called the "WorkDir" within the Worker itself, perhaps we should rename all instances of "AppDir" to that



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39265407
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13644/



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39490098
  
    Really, there should be no need for that.  Just do a 'pull --rebase' in
    your branch and everything that is not already part of master will be moved
    to after master.  Then 'push -f' or 'push +refSpec' and your PR should be
    in good shape.
    
    
    On Thu, Apr 3, 2014 at 11:49 AM, Evan Chan <no...@github.com> wrote:
    
    > Mark, thanks. Yeah, I realize instead of doing a merge I should just do a
    > pull now, but it's probably too late for this PR. Instead I'll close it
    > and reopen and squash it.
    >
    >
    > On Thu, Apr 3, 2014 at 11:15 AM, Mark Hamstra <notifications@github.com
    > >wrote:
    >
    > > Rebasing is no big deal if the commits that you are rebasing are only in
    > a
    > > private repo. For example, even though
    > > https://github.com/markhamstra/spark is technically a public repo and
    > > people could have forked it and be depending on the history in my repo,
    > as
    > > a practical matter nobody is going to be using it that way, but instead
    > > will be relying upon the canonical upstream repo,
    > > https://github.com/apache/spark. That means that if I am working on a PR
    > > in a feature branch Foo, then the easiest and cleanest way to keep my Foo
    > > work mergeable with the master branch of Spark is to only 'pull --rebase'
    > > the upstream master, not merge pull, thereby avoiding merge commit
    > clutter
    > > and the interleaving of my work with other commits to master. Instead, my
    > > PR commits will appear last and be applied only after the current state
    > of
    > > master.
    > >
    > > That means that the SHAs of my commits will be changing in my repo (but
    > > not those of any commits already in master), but nobody should be relying
    > > upon those until after they are in master anyway. Each rebase on an open
    > PR
    > > will also cause Jenkins/Travis to re-run the tests, but that's no big
    > deal
    > > unless you are rebasing really frequently and unnecessarily.
    > >
    > > --
    > > Reply to this email directly or view it on GitHub<
    > https://github.com/apache/spark/pull/288#issuecomment-39485564>
    > > .
    > >
    >
    >
    >
    > --
    > The fruit of silence is prayer;
    > the fruit of prayer is faith;
    > the fruit of faith is love;
    > the fruit of love is service;
    > the fruit of service is peace. -- Mother Teresa
    >
    > --
    > Reply to this email directly or view it on GitHub<https://github.com/apache/spark/pull/288#issuecomment-39489767>
    > .
    >



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39362915
  
    Merged build started. 



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39511005
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by velvia <gi...@git.apache.org>.
Github user velvia commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39691647
  
    Thanks Patrick!
    
    -Evan
    To be free is not merely to cast off one's chains, but to live in a way that respects & enhances the freedom of others. (#NelsonMandela)
    
    > On Apr 6, 2014, at 7:22 PM, Patrick Wendell <no...@github.com> wrote:
    > 
    > I just went ahead and fixed the conflict and merged this.
    > 
    > —
    > Reply to this email directly or view it on GitHub.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39511006
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13734/



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39308221
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39402284
  
    Mind doing a rebase to simplify the diff? It'll be squashed anyhow when we merge it into master.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39691483
  
    I just went ahead and fixed the conflict and merged this.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11274466
  
    --- Diff: docs/configuration.md ---
    @@ -586,6 +586,30 @@ Apart from these, the following properties are also available, and may be useful
         Number of cores to allocate for each task.
       </td>
     </tr>
    +<tr>
    --- End diff --
    
    nit: indent in line with other `<tr>`s



Re: [GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Clone a new copy of your fork just to be safe; checkout your SPARK-1154
branch;  git pull --rebase git@github.com:apache/spark.git master; now your
and Kelvin's changes should appear last, after everything that is already
in master; push origin +SPARK-1154-cleanup-app-folders -- and you're golden.


On Thu, Apr 3, 2014 at 12:50 AM, velvia <gi...@git.apache.org> wrote:

> Github user velvia commented on the pull request:
>
>     https://github.com/apache/spark/pull/288#issuecomment-39421885
>
>     Sorry, but how do I do a rebase safely, now that I've pushed?    I've
> never
>     had good luck with rebases (other than rebase -i before I push....
>  sorry
>     should have done that.)
>
>
>     On Wed, Apr 2, 2014 at 6:00 PM, Aaron Davidson <
> notifications@github.com>wrote:
>
>     > Mind doing a rebase to simplify the diff? It'll be squashed anyhow
> when we
>     > merge it into master.
>     >
>     > --
>     > Reply to this email directly or view it on GitHub<
> https://github.com/apache/spark/pull/288#issuecomment-39402284>
>     > .
>     >
>
>
>
>     --
>     The fruit of silence is prayer;
>     the fruit of prayer is faith;
>     the fruit of faith is love;
>     the fruit of love is service;
>     the fruit of service is peace.  -- Mother Teresa
>
>
>

[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by velvia <gi...@git.apache.org>.
Github user velvia commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39421885
  
    Sorry, but how do I do a rebase safely, now that I've pushed?    I've never
    had good luck with rebases (other than rebase -i before I push....  sorry
    should have done that.)
    
    
    On Wed, Apr 2, 2014 at 6:00 PM, Aaron Davidson <no...@github.com>wrote:
    
    > Mind doing a rebase to simplify the diff? It'll be squashed anyhow when we
    > merge it into master.
    >
    > --
    > Reply to this email directly or view it on GitHub<https://github.com/apache/spark/pull/288#issuecomment-39402284>
    > .
    >
    
    
    
    -- 
    The fruit of silence is prayer;
    the fruit of prayer is faith;
    the fruit of faith is love;
    the fruit of love is service;
    the fruit of service is peace.  -- Mother Teresa



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by velvia <gi...@git.apache.org>.
Github user velvia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11231197
  
    --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
    @@ -154,5 +154,18 @@ class UtilsSuite extends FunSuite {
         val iterator = Iterator.range(0, 5)
         assert(Utils.getIteratorSize(iterator) === 5L)
       }
    +
    +  test("findOldFiles") {
    +    // create some temporary directories and files
    +    val parent: File = Utils.createTempDir()
    +    val child1: File = Utils.createTempDir(parent.getCanonicalPath) // The parent directory has two child directories
    +    val child2: File = Utils.createTempDir(parent.getCanonicalPath)
    +    // set the last modified time of child1 to 10 secs old
    +    child1.setLastModified(System.currentTimeMillis() - (1000 * 10))
    +
    +    val result = Utils.findOldFiles(parent, 5) // find files older than 5 secs
    +    assert(result.size.equals(1))
    +    assert(result(0).getCanonicalPath.equals(child1.getCanonicalPath))
    +  }
    --- End diff --
    
    That's true.  We could add a unit test for deleteRecursively.
    
    Actually, we noticed there isn't a test for Worker in general.
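    The suggested `deleteRecursively` unit test could take roughly this shape. This is a hedged sketch; the inline `deleteRecursively` mirrors the contract of `Utils.deleteRecursively` and is not the actual Spark code:

```scala
import java.io.File
import java.nio.file.Files

object DeleteRecursivelyTestSketch {
  // Stand-in with the same contract as Utils.deleteRecursively:
  // remove a file or an entire directory tree.
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) {
      Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    f.delete()
  }

  // The shape a UtilsSuite test for it could take: build a small tree,
  // delete it, and assert nothing is left behind.
  def testDeleteRecursively(): Unit = {
    val root = Files.createTempDirectory("del-test").toFile
    val sub  = Files.createTempDirectory(root.toPath, "sub").toFile
    val file = File.createTempFile("data", ".tmp", sub)
    deleteRecursively(root)
    assert(!file.exists() && !sub.exists() && !root.exists())
  }
}
```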



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39518393
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39505559
  
    Merged build started. 



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39518396
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13739/



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39481008
  
    @velvia what happens if you just close and re-open the pull request? Will that work?



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39398980
  
    I think something weird happened with the diff... seems to be showing a lot of changes.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39398928
  
    Build started. 



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11222213
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -179,12 +184,26 @@ private[spark] class Worker(
           registered = true
           changeMaster(masterUrl, masterWebUiUrl)
           context.system.scheduler.schedule(0 millis, HEARTBEAT_MILLIS millis, self, SendHeartbeat)
    +      context.system.scheduler.schedule(CLEANUP_INTERVAL_MILLIS millis,
    +                                        CLEANUP_INTERVAL_MILLIS millis, self, Worker.AppDirCleanup)
     
         case SendHeartbeat =>
           masterLock.synchronized {
             if (connected) { master ! Heartbeat(workerId) }
           }
     
    +    case Worker.AppDirCleanup =>
    +      // Spin up a separate thread (in a future) to do the dir cleanup; don't tie up worker actor
    +      val cleanupFuture = concurrent.future {
    +        logInfo("Cleaning up oldest application directories in " + workDir + " ...")
    +        Utils.findOldFiles(workDir, APP_DATA_RETENTION_SECS)
    +             .foreach(Utils.deleteRecursively(_))
    --- End diff --
    
    nit: I would just do `.foreach(Utils.deleteRecursively)`
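    The schedule-plus-offload pattern in the diff above (a fixed-interval tick that runs the cleanup off the Worker actor's thread) can be sketched without Akka using plain `java.util.concurrent`; `PeriodicCleanupSketch` and its `schedule` helper are illustrative names, not part of the PR:

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}

object PeriodicCleanupSketch {
  // Run `task` every `intervalMillis` ms on a dedicated single-threaded
  // scheduler, so the caller's thread (the Worker actor, in the PR's case)
  // is never tied up by a slow directory scan.
  def schedule(intervalMillis: Long)(task: () => Unit): ScheduledExecutorService = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(
      new Runnable { def run(): Unit = task() },
      intervalMillis, intervalMillis, TimeUnit.MILLISECONDS)
    scheduler
  }
}
```

    In the actual change the tick is an Akka scheduler message and the cleanup body runs inside a `concurrent.future`, but the threading intent is the same.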



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by velvia <gi...@git.apache.org>.
Github user velvia commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39623424
  
    Hey guys - think everything is addressed.  Jenkins claims the build passes, but for some reason there is a Failed message at the bottom of the PR.  ?



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39402194
  
    Build finished. All automated tests passed.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11222786
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -331,6 +350,7 @@ private[spark] class Worker(
     }
     
     private[spark] object Worker {
    +  case object AppDirCleanup      // Sent to Worker actor periodically for cleaning up app folders
    --- End diff --
    
    Looks a little out of place here. Maybe just move this to `DeployMessages.scala`?



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39372537
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13688/



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39482892
  
    rebasing and then force pushing isn't kind to Git, but GitHub does handle it well.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39628968
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13793/



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11222159
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -64,6 +64,11 @@ private[spark] class Worker(
       val REGISTRATION_TIMEOUT = 20.seconds
       val REGISTRATION_RETRIES = 3
     
    +  // How often worker will clean up old app folders
    +  val CLEANUP_INTERVAL_MILLIS = conf.getLong("spark.worker.cleanup_interval", 60 * 30) * 1000
    +  // TTL for app folders/data;  after TTL expires it will be cleaned up
    +  val APP_DATA_RETENTION_SECS = conf.getLong("spark.worker.app_data_ttl", 7 * 24 * 3600)
    +
    --- End diff --
    
    For Spark configuration we use camel case: see http://spark.apache.org/docs/0.9.0/configuration.html



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39308222
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13678/



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39398920
  
     Build triggered. 



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39372535
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11222683
  
    --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
    @@ -154,5 +154,18 @@ class UtilsSuite extends FunSuite {
         val iterator = Iterator.range(0, 5)
         assert(Utils.getIteratorSize(iterator) === 5L)
       }
    +
    +  test("findOldFiles") {
    +    // create some temporary directories and files
    +    val parent: File = Utils.createTempDir()
    +    val child1: File = Utils.createTempDir(parent.getCanonicalPath) // The parent directory has two child directories
    +    val child2: File = Utils.createTempDir(parent.getCanonicalPath)
    +    // set the last modified time of child1 to 10 secs old
    +    child1.setLastModified(System.currentTimeMillis() - (1000 * 10))
    +
    +    val result = Utils.findOldFiles(parent, 5) // find files older than 5 secs
    +    assert(result.size.equals(1))
    +    assert(result(0).getCanonicalPath.equals(child1.getCanonicalPath))
    +  }
    --- End diff --
    
    It would be good to also test the actual "delete recursively" part. Right now if the deleting part fails for some reason this test doesn't catch it.
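A self-contained sketch of the extra coverage suggested above: after finding the stale directory, actually delete it and assert that it is gone while the fresh sibling survives. It uses plain `java.io`/`java.nio` stand-ins for Spark's `Utils.findOldFiles`/`Utils.deleteRecursively` so it runs on its own; the helper names and bodies are illustrative, not the PR's actual code.

```scala
import java.io.File
import java.nio.file.Files

object DeleteOldDirsSketch {
  // Illustrative stand-in for Utils.findOldFiles: files in dir whose
  // last modified time is older than cutoffSecs seconds.
  def findOldFiles(dir: File, cutoffSecs: Long): Seq[File] = {
    val now = System.currentTimeMillis
    dir.listFiles.toSeq.filter(_.lastModified < now - cutoffSecs * 1000)
  }

  // Illustrative stand-in for Utils.deleteRecursively.
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) f.listFiles.foreach(deleteRecursively)
    f.delete()
  }

  def main(args: Array[String]): Unit = {
    val parent = Files.createTempDirectory("parent").toFile
    val child1 = Files.createTempDirectory(parent.toPath, "child1").toFile
    val child2 = Files.createTempDirectory(parent.toPath, "child2").toFile
    // Make child1 look 10 seconds old, as in the test above.
    child1.setLastModified(System.currentTimeMillis - 10 * 1000)

    val old = findOldFiles(parent, 5) // find dirs older than 5 secs
    old.foreach(deleteRecursively)

    // The part the review asks to cover: deletion actually happened.
    assert(!child1.exists(), "old directory should be deleted")
    assert(child2.exists(), "fresh directory should survive")
    println("ok")
  }
}
```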



Re: [GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by Mark Hamstra <ma...@clearstorydata.com>.
You're probably better off with a rebase.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by velvia <gi...@git.apache.org>.
Github user velvia commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39399074
  
    @pwendell I did a `git merge master` and for some reason it didn't display as a merge :( ... git must have gotten confused by my multiple remotes.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by velvia <gi...@git.apache.org>.
Github user velvia commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39398967
  
    Most comments addressed.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39485564
  
    Rebasing is no big deal if the commits that you are rebasing are only in a private repo.  For example, even though https://github.com/markhamstra/spark is technically a public repo and people could have forked it and be depending on the history in my repo, as a practical matter nobody is going to be using it that way, but instead will be relying upon the canonical upstream repo, https://github.com/apache/spark.  That means that if I am working on a PR in a feature branch Foo, then the easiest and cleanest way to keep my Foo work mergeable with the master branch of Spark is to only 'pull --rebase' the upstream master, not merge pull, thereby avoiding merge commit clutter and the interleaving of my work with other commits to master.  Instead, my PR commits will appear last and be applied only after the current state of master.
    
    That means that the SHAs of my commits will be changing in my repo (but not those of any commits already in master), but nobody should be relying upon those until after they are in master anyway.  Each rebase on an open PR will also cause Jenkins/Travis to re-run the tests, but that's no big deal unless you are rebasing really frequently and unnecessarily.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39628967
  
    Build finished. All automated tests passed.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39255525
  
     Merged build triggered. 



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39514444
  
     Merged build triggered. 



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39255550
  
    Merged build started. 



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39301788
  
    Merged build started. 



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11222791
  
    --- Diff: docs/configuration.md ---
    @@ -586,6 +586,23 @@ Apart from these, the following properties are also available, and may be useful
         Number of cores to allocate for each task.
       </td>
     </tr>
    +<tr>
    +  <td>spark.worker.cleanup_interval</td>
    --- End diff --
    
    It would be good to have an enable/disable option for this. Maybe it could be something like:
    ```
    spark.worker.cleanup.enabled
    spark.worker.cleanup.interval
    spark.worker.cleanup.appDataTtl
    ```
    
    I guess having it enabled by default is okay as long as the default TTL is pretty long (which it is).
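For illustration, a self-contained sketch of how a worker might read the three settings proposed above, with the `enabled` flag guarding whether the cleanup task gets scheduled at all. The `Conf` class below is a minimal stand-in for `SparkConf` so the snippet runs on its own; the key names are the suggestion from this comment, not a merged API, and the defaults (30 min interval, 7 day TTL) follow the values discussed in this thread.

```scala
// Minimal stand-in for SparkConf, for illustration only.
class Conf(settings: Map[String, String] = Map.empty) {
  def getBoolean(key: String, default: Boolean): Boolean =
    settings.get(key).map(_.toBoolean).getOrElse(default)
  def getLong(key: String, default: Long): Long =
    settings.get(key).map(_.toLong).getOrElse(default)
}

object CleanupConfigSketch {
  // Returns (enabled, cleanup interval in millis, app data TTL in secs).
  def readSettings(conf: Conf): (Boolean, Long, Long) = {
    val enabled = conf.getBoolean("spark.worker.cleanup.enabled", true)
    val intervalMillis =
      conf.getLong("spark.worker.cleanup.interval", 60 * 30) * 1000
    val appDataTtlSecs =
      conf.getLong("spark.worker.cleanup.appDataTtl", 7 * 24 * 3600)
    (enabled, intervalMillis, appDataTtlSecs)
  }

  def main(args: Array[String]): Unit = {
    val (enabled, interval, ttl) = readSettings(new Conf())
    // A worker would schedule the periodic cleanup task only if `enabled`.
    println(s"enabled=$enabled intervalMillis=$interval ttlSecs=$ttl")
  }
}
```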



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by velvia <gi...@git.apache.org>.
Github user velvia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11233834
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -331,6 +350,7 @@ private[spark] class Worker(
     }
     
     private[spark] object Worker {
    +  case object AppDirCleanup      // Sent to Worker actor periodically for cleaning up app folders
    --- End diff --
    
    Sure.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39505379
  
    This looks good to me. Any more comments @aarondav or @andrewor14?



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/288



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39402195
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13706/



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39628076
  
    Jenkins, test this please. (Just re-running on Jenkins; I think the Travis one is a false negative.)



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39629852
  
    @aarondav look good to you?



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39301776
  
     Merged build triggered. 



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39505544
  
     Merged build triggered. 



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39628139
  
    Build started. 



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by velvia <gi...@git.apache.org>.
Github user velvia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11231160
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -537,6 +537,21 @@ private[spark] object Utils extends Logging {
       }
     
       /**
    +   * Finds all the files in a directory whose last modified time is older than cutoff seconds.
    +   * @param dir  must be the path to a directory, or IllegalArgumentException is thrown
    +   * @param cutoff measured in seconds. Files older than this are returned.
    +   */
    +  def findOldFiles(dir: File, cutoff: Long): Seq[File] = {
    +    val currentTimeMillis = System.currentTimeMillis
    +    if (dir.isDirectory) {
    +      val files = listFilesSafely(dir)
    +      files.filter { file => file.lastModified < (currentTimeMillis - cutoff * 1000) }
    +    } else {
    +      throw new IllegalArgumentException(dir + " is not a directory!")
    +    }
    +  }
    --- End diff --
    
    Easier to test for sure.  I just prefer smaller more functional pieces of code in general for easier testing and to limit side effects.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39514459
  
    Merged build started. 



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by velvia <gi...@git.apache.org>.
Github user velvia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11230586
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -64,6 +64,11 @@ private[spark] class Worker(
       val REGISTRATION_TIMEOUT = 20.seconds
       val REGISTRATION_RETRIES = 3
     
    +  // How often worker will clean up old app folders
    +  val CLEANUP_INTERVAL_MILLIS = conf.getLong("spark.worker.cleanup_interval", 60 * 30) * 1000
    +  // TTL for app folders/data;  after TTL expires it will be cleaned up
    +  val APP_DATA_RETENTION_SECS = conf.getLong("spark.worker.app_data_ttl", 7 * 24 * 3600)
    +
    --- End diff --
    
    Thanks, will fix.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11319796
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -537,6 +537,21 @@ private[spark] object Utils extends Logging {
       }
     
       /**
    +   * Finds all the files in a directory whose last modified time is older than cutoff seconds.
    +   * @param dir  must be the path to a directory, or IllegalArgumentException is thrown
    +   * @param cutoff measured in seconds. Files older than this are returned.
    --- End diff --
    
    Don't need to rev this again just for this - but is there supposed to be one space or two spaces after the parameter name? It's inconsistent here.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39265405
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by velvia <gi...@git.apache.org>.
Github user velvia commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39489767
  
    Mark, thanks.  Yeah, I realize instead of doing a merge I should just do a
    pull now, but it's probably too late for this PR.  Instead I'll close it
    and reopen and squash it.
    
    



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11222616
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -537,6 +537,21 @@ private[spark] object Utils extends Logging {
       }
     
       /**
    +   * Finds all the files in a directory whose last modified time is older than cutoff seconds.
    +   * @param dir  must be the path to a directory, or IllegalArgumentException is thrown
    +   * @param cutoff measured in seconds. Files older than this are returned.
    +   */
    +  def findOldFiles(dir: File, cutoff: Long): Seq[File] = {
    +    val currentTimeMillis = System.currentTimeMillis
    +    if (dir.isDirectory) {
    +      val files = listFilesSafely(dir)
    +      files.filter { file => file.lastModified < (currentTimeMillis - cutoff * 1000) }
    +    } else {
    +      throw new IllegalArgumentException(dir + " is not a directory!")
    +    }
    +  }
    --- End diff --
    
    Is there a reason for not making this `deleteOldFiles` directly? I don't see another use case for getting old files other than deleting them. Is this because we need it for testing?



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by velvia <gi...@git.apache.org>.
Github user velvia commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39505488
  
    @markhamstra Thanks for the tips.  The `git push -f` worked.



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by velvia <gi...@git.apache.org>.
Github user velvia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11231217
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -179,12 +184,26 @@ private[spark] class Worker(
           registered = true
           changeMaster(masterUrl, masterWebUiUrl)
           context.system.scheduler.schedule(0 millis, HEARTBEAT_MILLIS millis, self, SendHeartbeat)
    +      context.system.scheduler.schedule(CLEANUP_INTERVAL_MILLIS millis,
    +                                        CLEANUP_INTERVAL_MILLIS millis, self, Worker.AppDirCleanup)
     
         case SendHeartbeat =>
           masterLock.synchronized {
             if (connected) { master ! Heartbeat(workerId) }
           }
     
    +    case Worker.AppDirCleanup =>
    +      // Spin up a separate thread (in a future) to do the dir cleanup; don't tie up worker actor
    +      val cleanupFuture = concurrent.future {
    --- End diff --
    
    There is a default one for each actor, yes -- from its actorSystem.
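To make the ExecutionContext question concrete, a standalone sketch of the future-based cleanup pattern under discussion: the cleanup runs in a `Future` so the worker actor's receive loop is not blocked. Inside an Akka actor, `import context.dispatcher` would supply the implicit ExecutionContext from the actor system, as noted above; the explicit thread pool here is only so the snippet runs on its own, and the cleanup body is a placeholder.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object CleanupFutureSketch {
  // Offload the (placeholder) cleanup work to whatever ExecutionContext
  // is in scope; in the Worker actor that would be the actor system's
  // dispatcher via `import context.dispatcher`.
  def runCleanup()(implicit ec: ExecutionContext): Future[String] =
    Future {
      // ... find old app dirs and delete them recursively here ...
      "cleaned"
    }

  def main(args: Array[String]): Unit = {
    val pool = Executors.newSingleThreadExecutor()
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      // Block only for this demo; the actor itself would not await.
      println(Await.result(runCleanup(), 5.seconds))
    } finally {
      pool.shutdown()
    }
  }
}
```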



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/288#discussion_r11222765
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -179,12 +184,26 @@ private[spark] class Worker(
           registered = true
           changeMaster(masterUrl, masterWebUiUrl)
           context.system.scheduler.schedule(0 millis, HEARTBEAT_MILLIS millis, self, SendHeartbeat)
    +      context.system.scheduler.schedule(CLEANUP_INTERVAL_MILLIS millis,
    +                                        CLEANUP_INTERVAL_MILLIS millis, self, Worker.AppDirCleanup)
     
         case SendHeartbeat =>
           masterLock.synchronized {
             if (connected) { master ! Heartbeat(workerId) }
           }
     
    +    case Worker.AppDirCleanup =>
    +      // Spin up a separate thread (in a future) to do the dir cleanup; don't tie up worker actor
    +      val cleanupFuture = concurrent.future {
    --- End diff --
    
    What ExecutionContext is this using? Is there a default one created for actors?



[GitHub] spark pull request: SPARK-1154: Clean up app folders in worker nod...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/288#issuecomment-39655659
  
    LGTM. Would you mind bringing this up to date with master? (Looks like a pretty trivial conflict.)

