Posted to reviews@spark.apache.org by mccheah <gi...@git.apache.org> on 2014/10/01 03:59:33 UTC

[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

GitHub user mccheah opened a pull request:

    https://github.com/apache/spark/pull/2609

    [SPARK-1860] More conservative app directory cleanup.

    This is my first contribution to the project, so I apologize for any significant errors.
    
    This PR addresses [SPARK-1860]. The application directories are now cleaned up in a more conservative manner.
    
    Previously, an app-* directory was cleaned up whenever the directory's own timestamp was older than a given cutoff. However, a directory's timestamp does not reflect the modification times of the files it contains, so app-* directories were wiped out even when the files inside them had been created recently and were possibly still in use by executor tasks.
    
    The solution is to change the cleanup logic to inspect all files within the app-* directory and only eliminate the app-* directory if all files in the directory are stale.
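    
    In rough form, the new check is the following (a minimal sketch of the idea, not the exact patch; the actual implementation walks the tree with commons-io):
    
    ```scala
    import java.io.File
    
    // Sketch: an app-* directory is a cleanup candidate only if the directory
    // itself and every file beneath it are older than the cutoff.
    def isDirectoryStale(dir: File, cutoffSeconds: Long): Boolean = {
      val cutoffTimeMillis = System.currentTimeMillis - cutoffSeconds * 1000
      def allOld(f: File): Boolean =
        f.lastModified < cutoffTimeMillis &&
          Option(f.listFiles).getOrElse(Array.empty[File]).forall(allOld)
      allOld(dir)
    }
    ```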


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mccheah/spark worker-better-app-dir-cleanup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2609.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2609
    
----
commit 620e4a51219552b6faa61a3f07df8fba3928e940
Author: mcheah <mc...@palantir.com>
Date:   2014-09-30T18:23:39Z

    [SPARK-1860] More conservative app directory cleanup.
    
    Before, the app-* directory was cleaned up whenever its timestamp was
    older than a given time. However, the timestamp on a directory may be
    older than the timestamps of the files the directory contains. This
    change only cleans up app-* directories if all of the directory's
    contents are old.

----




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18303980
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -23,6 +23,7 @@ import java.util.Date
     
     import scala.collection.mutable.HashMap
     import scala.concurrent.duration._
    +import scala.collection.JavaConversions._
    --- End diff --
    
    Please reorganize imports according to our style: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57567579
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21167/consoleFull) for PR 2609 at commit [`a045620`](https://github.com/apache/spark/commit/a04562069843603725cd4320afce9e5b19abe53b).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18360895
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -703,17 +705,20 @@ private[spark] object Utils extends Logging {
       }
     
       /**
    -   * Finds all the files in a directory whose last modified time is older than cutoff seconds.
    -   * @param dir  must be the path to a directory, or IllegalArgumentException is thrown
    -   * @param cutoff measured in seconds. Files older than this are returned.
    +   * Determines if a directory contains any files newer than cutoff seconds.
    +   * 
    +   * @param dir must be the path to a directory, or IllegalArgumentException is thrown
    +   * @param cutoff measured in seconds. Returns true if there are any files in dir newer than this.
        */
    -  def findOldFiles(dir: File, cutoff: Long): Seq[File] = {
    +  def doesDirectoryContainAnyNewFiles(dir: File, cutoff: Long) : Boolean = {
         val currentTimeMillis = System.currentTimeMillis
    -    if (dir.isDirectory) {
    -      val files = listFilesSafely(dir)
    -      files.filter { file => file.lastModified < (currentTimeMillis - cutoff * 1000) }
    +    if (!dir.isDirectory) {
    +      throw new IllegalArgumentException (dir + " is not a directory!")
         } else {
    -      throw new IllegalArgumentException(dir + " is not a directory!")
    +      val files = FileUtils.listFilesAndDirs(dir, TrueFileFilter.TRUE, TrueFileFilter.TRUE)
    +      val cutoffTimeInMillis = (currentTimeMillis - (cutoff * 1000))
    +      val newFiles = files.filter { file => file.lastModified > cutoffTimeInMillis }
    +      (dir.lastModified > cutoffTimeInMillis) || (!newFiles.isEmpty)
    --- End diff --
    
    nit: newFiles.nonEmpty
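    
    Putting this together with the sibling nits in this thread, the method might end up reading roughly like this (a sketch, assuming commons-io's `listFilesAndDirs` is available and `JavaConversions` is imported as in the patch):
    
    ```scala
    import java.io.File
    
    import scala.collection.JavaConversions._
    
    import org.apache.commons.io.FileUtils
    import org.apache.commons.io.filefilter.TrueFileFilter
    
    def doesDirectoryContainAnyNewFiles(dir: File, cutoff: Long): Boolean = {
      if (!dir.isDirectory) {
        throw new IllegalArgumentException(dir + " is not a directory!")
      }
      val cutoffTimeInMillis = System.currentTimeMillis - cutoff * 1000
      // listFilesAndDirs includes the top-level dir itself, so its own
      // modification time needs no separate check.
      val files = FileUtils.listFilesAndDirs(dir, TrueFileFilter.TRUE, TrueFileFilter.TRUE)
      files.filter(_.lastModified > cutoffTimeInMillis).nonEmpty
    }
    ```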




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18316820
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -22,6 +22,7 @@ import java.text.SimpleDateFormat
     import java.util.Date
     
     import scala.collection.mutable.HashMap
    +import scala.collection.JavaConversions._
    --- End diff --
    
    Are there settings in the Eclipse Scala IDE I can use to enforce this? I would have thought that sbt/sbt eclipse would have generated a project that conforms to the style guidelines automatically.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57561449
  
    Jenkins, ok to test.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18365510
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala ---
    @@ -174,7 +168,7 @@ private[spark] class ExecutorRunner(
             killProcess(None)
           }
           case e: Exception => {
    -        logError("Error running executor", e)
    +        logError(e.toString, e)
    --- End diff --
    
    That link takes me to the diff view, so it is not exactly clear which line you're referring to. Can you give me a line number in the file at the head of my branch?




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57409825
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18362643
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -202,9 +205,20 @@ private[spark] class Worker(
           // Spin up a separate thread (in a future) to do the dir cleanup; don't tie up worker actor
           val cleanupFuture = concurrent.future {
             logInfo("Cleaning up oldest application directories in " + workDir + " ...")
    --- End diff --
    
    Let's get rid of this unconditional periodic log message.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57840806
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21255/consoleFull) for PR 2609 at commit [`87b5d03`](https://github.com/apache/spark/commit/87b5d036439ee51df0c95ab48f6ec664ed8a66e2).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18362788
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -202,9 +205,20 @@ private[spark] class Worker(
           // Spin up a separate thread (in a future) to do the dir cleanup; don't tie up worker actor
           val cleanupFuture = concurrent.future {
             logInfo("Cleaning up oldest application directories in " + workDir + " ...")
    --- End diff --
    
    Also, perhaps add a logInfo on line 196 above, when CLEANUP_ENABLED is set, saying something like "Worker cleanup is enabled, so old application directories will be deleted." I think this message on startup, plus the message on actual directory deletion, is ideal.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18360191
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala ---
    @@ -174,7 +168,7 @@ private[spark] class ExecutorRunner(
             killProcess(None)
           }
           case e: Exception => {
    -        logError("Error running executor", e)
    +        logError(e.toString, e)
    --- End diff --
    
    I think this is fine as it was -- I was referring to this line:
    https://github.com/apache/spark/pull/2609/files#diff-916ca56b663f178f302c265b7ef38499R271




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57688203
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21211/




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18360953
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -703,17 +705,20 @@ private[spark] object Utils extends Logging {
       }
     
       /**
    -   * Finds all the files in a directory whose last modified time is older than cutoff seconds.
    -   * @param dir  must be the path to a directory, or IllegalArgumentException is thrown
    -   * @param cutoff measured in seconds. Files older than this are returned.
    +   * Determines if a directory contains any files newer than cutoff seconds.
    +   * 
    +   * @param dir must be the path to a directory, or IllegalArgumentException is thrown
    +   * @param cutoff measured in seconds. Returns true if there are any files in dir newer than this.
        */
    -  def findOldFiles(dir: File, cutoff: Long): Seq[File] = {
    +  def doesDirectoryContainAnyNewFiles(dir: File, cutoff: Long) : Boolean = {
         val currentTimeMillis = System.currentTimeMillis
    -    if (dir.isDirectory) {
    -      val files = listFilesSafely(dir)
    -      files.filter { file => file.lastModified < (currentTimeMillis - cutoff * 1000) }
    +    if (!dir.isDirectory) {
    +      throw new IllegalArgumentException (dir + " is not a directory!")
         } else {
    -      throw new IllegalArgumentException(dir + " is not a directory!")
    +      val files = FileUtils.listFilesAndDirs(dir, TrueFileFilter.TRUE, TrueFileFilter.TRUE)
    +      val cutoffTimeInMillis = (currentTimeMillis - (cutoff * 1000))
    +      val newFiles = files.filter { file => file.lastModified > cutoffTimeInMillis }
    --- End diff --
    
    nit: could be simplified to
    `files.filter(_.lastModified > cutoffTimeInMillis)`




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57717841
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21224/consoleFull) for PR 2609 at commit [`802473e`](https://github.com/apache/spark/commit/802473e56b99f3382d2cf4737d3c9e7e86281398).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57831124
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21255/consoleFull) for PR 2609 at commit [`87b5d03`](https://github.com/apache/spark/commit/87b5d036439ee51df0c95ab48f6ec664ed8a66e2).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18316319
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -233,8 +244,15 @@ private[spark] class Worker(
           } else {
             try {
               logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
    +
    +          // Create the executor's working directory
    +          val executorDir = new File(workDir, appId + "/" + execId)
    +          if (!executorDir.mkdirs()) {
    +            throw new IOException("Failed to create directory " + executorDir)
    --- End diff --
    
    I just realized that the ExecutorStateChanged here does not give any indication of what happened. Would you mind setting the message to `Some(e.toString)`? This will include the exception's class and message, but not the stack trace, which seems reasonable.
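    
    That is, the catch clause could end with something like this (a fragment sketch; the identifiers come from the surrounding Worker code):
    
    ```scala
    case e: Exception =>
      logError("Failed to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
      // Some(e.toString) carries the exception's class and message (but no
      // stack trace) back to the master.
      master ! ExecutorStateChanged(appId, execId, ExecutorState.FAILED, Some(e.toString), None)
    ```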




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57725130
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21224/




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18360930
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -703,17 +705,20 @@ private[spark] object Utils extends Logging {
       }
     
       /**
    -   * Finds all the files in a directory whose last modified time is older than cutoff seconds.
    -   * @param dir  must be the path to a directory, or IllegalArgumentException is thrown
    -   * @param cutoff measured in seconds. Files older than this are returned.
    +   * Determines if a directory contains any files newer than cutoff seconds.
    +   * 
    +   * @param dir must be the path to a directory, or IllegalArgumentException is thrown
    +   * @param cutoff measured in seconds. Returns true if there are any files in dir newer than this.
        */
    -  def findOldFiles(dir: File, cutoff: Long): Seq[File] = {
    +  def doesDirectoryContainAnyNewFiles(dir: File, cutoff: Long) : Boolean = {
         val currentTimeMillis = System.currentTimeMillis
    -    if (dir.isDirectory) {
    -      val files = listFilesSafely(dir)
    -      files.filter { file => file.lastModified < (currentTimeMillis - cutoff * 1000) }
    +    if (!dir.isDirectory) {
    +      throw new IllegalArgumentException (dir + " is not a directory!")
         } else {
    -      throw new IllegalArgumentException(dir + " is not a directory!")
    +      val files = FileUtils.listFilesAndDirs(dir, TrueFileFilter.TRUE, TrueFileFilter.TRUE)
    +      val cutoffTimeInMillis = (currentTimeMillis - (cutoff * 1000))
    +      val newFiles = files.filter { file => file.lastModified > cutoffTimeInMillis }
    +      (dir.lastModified > cutoffTimeInMillis) || (!newFiles.isEmpty)
    --- End diff --
    
    FileUtils.listFilesAndDirs() appears to include the top-level directory as well, so I don't think we need to special-case it.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18407971
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -191,6 +194,8 @@ private[spark] class Worker(
           changeMaster(masterUrl, masterWebUiUrl)
           context.system.scheduler.schedule(0 millis, HEARTBEAT_MILLIS millis, self, SendHeartbeat)
           if (CLEANUP_ENABLED) {
    +        logInfo("Worker cleanup is enabled, so old application directories will be deleted"
    --- End diff --
    
    Except that the character count does not account for the indentation ;)




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57573135
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21169/consoleFull) for PR 2609 at commit [`77a9de0`](https://github.com/apache/spark/commit/77a9de0adb733c406440e0f498f888939461f831).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18360897
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -703,17 +705,20 @@ private[spark] object Utils extends Logging {
       }
     
       /**
    -   * Finds all the files in a directory whose last modified time is older than cutoff seconds.
    -   * @param dir  must be the path to a directory, or IllegalArgumentException is thrown
    -   * @param cutoff measured in seconds. Files older than this are returned.
    +   * Determines if a directory contains any files newer than cutoff seconds.
    +   * 
    +   * @param dir must be the path to a directory, or IllegalArgumentException is thrown
    +   * @param cutoff measured in seconds. Returns true if there are any files in dir newer than this.
        */
    -  def findOldFiles(dir: File, cutoff: Long): Seq[File] = {
    +  def doesDirectoryContainAnyNewFiles(dir: File, cutoff: Long) : Boolean = {
    --- End diff --
    
    nit: style is to not put a space before the colon




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18316404
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -233,8 +244,15 @@ private[spark] class Worker(
           } else {
             try {
               logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
    +
    +          // Create the executor's working directory
    +          val executorDir = new File(workDir, appId + "/" + execId)
    +          if (!executorDir.mkdirs()) {
    +            throw new IOException("Failed to create directory " + executorDir)
    --- End diff --
    
    To be clear, I'm referring to the `catch` clause of this try.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18365112
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -233,8 +244,15 @@ private[spark] class Worker(
           } else {
             try {
               logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
    +
    +          // Create the executor's working directory
    +          val executorDir = new File(workDir, appId + "/" + execId)
    +          if (!executorDir.mkdirs()) {
    +            throw new IOException("Failed to create directory " + executorDir)
    --- End diff --
    
    Can you point me to a specific line number? There are a few places where logging is done in catch blocks.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18373175
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -242,7 +267,8 @@ private[spark] class Worker(
               master ! ExecutorStateChanged(appId, execId, manager.state, None, None)
             } catch {
               case e: Exception => {
    -            logError("Failed to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
    +            logError("Failed to launch executor %s/%d for %s. Caused by exception: %s"
    --- End diff --
    
    The second parameter to log* is an exception, which is printed with its full stack trace, so we should use that instead. Additionally, this code was written when Spark was on Scala 2.8 or 2.9; now we can just use string interpolation:
    ```scala
    logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
    ```
    (The initial "s" activates the string interpolation.)
    
    Additionally, looking at line 276, let's change it to
    ```scala
    master ! ExecutorStateChanged(appId, execId, ExecutorState.FAILED, Some(e.toString), None)
    ```




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57962393
  
    FYI, this broke the build for some versions of Hadoop:
    
    ```
    [INFO] Compiling 395 Scala sources and 29 Java sources to <https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=2.0.0-mr1-cdh4.1.2,label=centos/ws/core/target/scala-2.10/classes...>
    [ERROR] <https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=2.0.0-mr1-cdh4.1.2,label=centos/ws/core/src/main/scala/org/apache/spark/util/Utils.scala>:720: value listFilesAndDirs is not a member of object org.apache.commons.io.FileUtils
    [ERROR]       val files = FileUtils.listFilesAndDirs(dir, TrueFileFilter.TRUE, TrueFileFilter.TRUE)
    [ERROR]                             ^
    [ERROR] one error found
    ```
    
    This is being addressed by #2662.
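    
    For reference, `FileUtils.listFilesAndDirs` only exists in newer commons-io releases, and the pre-YARN Hadoop builds pull in an older one. A dependency-free walk along these lines would sidestep that (a sketch only; not necessarily the approach taken in #2662):
    
    ```scala
    import java.io.File
    
    // Enumerate a directory tree, including the root, without commons-io.
    def listFilesAndDirsCompat(dir: File): Seq[File] = {
      val children = Option(dir.listFiles).getOrElse(Array.empty[File])
      dir +: children.toSeq.flatMap { f =>
        if (f.isDirectory) listFilesAndDirsCompat(f) else Seq(f)
      }
    }
    ```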




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18316775
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -22,6 +22,7 @@ import java.text.SimpleDateFormat
     import java.util.Date
     
     import scala.collection.mutable.HashMap
    +import scala.collection.JavaConversions._
    --- End diff --
    
    Imports still need to be reorganized -- this one should go at the top, and the bottom two imports need to be moved into their right positions :)
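    
    For reference, the style guide's grouping is: java imports, then scala, then third-party libraries, then org.apache.spark, with each group alphabetized and separated by a blank line. A hypothetical subset of this file's imports, shown only to illustrate the grouping:
    
    ```scala
    import java.io.{File, IOException}
    import java.text.SimpleDateFormat
    import java.util.Date
    
    import scala.collection.JavaConversions._
    import scala.collection.mutable.HashMap
    import scala.concurrent.duration._
    
    import akka.actor._
    
    import org.apache.spark.Logging
    ```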




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57567979
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21168/consoleFull) for PR 2609 at commit [`7b7cae4`](https://github.com/apache/spark/commit/7b7cae481661b7b2a58ad96e08291573a0e043e7).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57567664
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21167/




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18373242
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -191,6 +194,8 @@ private[spark] class Worker(
           changeMaster(masterUrl, masterWebUiUrl)
           context.system.scheduler.schedule(0 millis, HEARTBEAT_MILLIS millis, self, SendHeartbeat)
           if (CLEANUP_ENABLED) {
    +        logInfo("Worker cleanup is enabled, so old application directories will be deleted"
    --- End diff --
    
    nit: I noticed that if you make the log message less grammatically correct and use string interpolation, you can fit it in exactly 100 characters.
    ```scala
    logInfo(s"Worker cleanup enabled; old application directories will be deleted in: $workDir")
    ```




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57568072
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21168/




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18372979
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -22,6 +22,7 @@ import java.text.SimpleDateFormat
     import java.util.Date
     
     import scala.collection.mutable.HashMap
    +import scala.collection.JavaConversions._
    --- End diff --
    
    Unfortunately I'm an IntelliJ user -- IntelliJ (recently) got the ability to organize these by default, but I'm not sure if the Eclipse version has it.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18322230
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -202,9 +205,20 @@ private[spark] class Worker(
           // Spin up a separate thread (in a future) to do the dir cleanup; don't tie up worker actor
           val cleanupFuture = concurrent.future {
             logInfo("Cleaning up oldest application directories in " + workDir + " ...")
    -        Utils.findOldFiles(workDir, APP_DATA_RETENTION_SECS)
    -          .foreach(Utils.deleteRecursively)
    +        val appDirs = workDir.listFiles()
    +        if (appDirs == null) {
    +          throw new IOException("ERROR: Failed to list files in " + appDirs)
    +        }
    +        appDirs.filter { dir => {
    --- End diff --
    
    You do not need the extra brace after the `dir =>`; the enclosing braces already delimit the function body.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57688191
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21211/consoleFull) for PR 2609 at commit [`e0a1f2e`](https://github.com/apache/spark/commit/e0a1f2e874cc852ce0cc942130a7765a7deb8133).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57567662
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21167/consoleFull) for PR 2609 at commit [`a045620`](https://github.com/apache/spark/commit/a04562069843603725cd4320afce9e5b19abe53b).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57568070
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21168/consoleFull) for PR 2609 at commit [`7b7cae4`](https://github.com/apache/spark/commit/7b7cae481661b7b2a58ad96e08291573a0e043e7).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57716110
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21223/consoleFull) for PR 2609 at commit [`620f52f`](https://github.com/apache/spark/commit/620f52f32ae24719b95d14e79bb0fc004b8e5770).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2609




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57716270
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21223/consoleFull) for PR 2609 at commit [`620f52f`](https://github.com/apache/spark/commit/620f52f32ae24719b95d14e79bb0fc004b8e5770).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class GetPeers(blockManagerId: BlockManagerId) extends ToBlockManagerMaster`





[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18316745
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -43,6 +43,7 @@ import tachyon.client.{TachyonFile,TachyonFS}
     import org.apache.spark._
     import org.apache.spark.executor.ExecutorUncaughtExceptionHandler
     import org.apache.spark.serializer.{DeserializationStream, SerializationStream, SerializerInstance}
    +import org.apache.commons.io.FileUtils
    --- End diff --
    
    Reorganize imports




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57568876
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21169/consoleFull) for PR 2609 at commit [`77a9de0`](https://github.com/apache/spark/commit/77a9de0adb733c406440e0f498f888939461f831).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57725119
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21224/consoleFull) for PR 2609 at commit [`802473e`](https://github.com/apache/spark/commit/802473e56b99f3382d2cf4737d3c9e7e86281398).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class GetPeers(blockManagerId: BlockManagerId) extends ToBlockManagerMaster`





[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57573144
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21169/




[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57864056
  
    LGTM, merging into master. Thanks!


[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57716274
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21223/


[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18362661
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -202,9 +205,20 @@ private[spark] class Worker(
           // Spin up a separate thread (in a future) to do the dir cleanup; don't tie up worker actor
           val cleanupFuture = concurrent.future {
             logInfo("Cleaning up oldest application directories in " + workDir + " ...")
    -        Utils.findOldFiles(workDir, APP_DATA_RETENTION_SECS)
    -          .foreach(Utils.deleteRecursively)
    +        val appDirs = workDir.listFiles()
    +        if (appDirs == null) {
    +          throw new IOException("ERROR: Failed to list files in " + appDirs)
    +        }
    +        appDirs.filter { dir =>
    +          // the directory is used by an application - check that the application is not running
    +          // when cleaning up
    +          val appIdFromDir = dir.getName
    +          val isAppStillRunning = executors.values.map(_.appId).contains(appIdFromDir)
    +          dir.isDirectory && !isAppStillRunning &&
    +          !Utils.doesDirectoryContainAnyNewFiles(dir, APP_DATA_RETENTION_SECS)
    +        }.foreach(Utils.deleteRecursively)
    --- End diff --
    
    Let's add a logInfo here that we're cleaning up a particular directory right before the deleteRecursively.
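
    A minimal sketch of what that could look like, reusing the names from the diff above (`shouldCleanUp` is a hypothetical extraction for brevity, not code from this PR):
    
    ```scala
    def shouldCleanUp(dir: File): Boolean = {
      val isAppStillRunning = executors.values.map(_.appId).contains(dir.getName)
      dir.isDirectory && !isAppStillRunning &&
        !Utils.doesDirectoryContainAnyNewFiles(dir, APP_DATA_RETENTION_SECS)
    }
    
    appDirs.filter(shouldCleanUp).foreach { dir =>
      // Log each directory right before deleting it, so the worker log shows
      // exactly what the cleaner removed.
      logInfo("Removing directory: " + dir.getPath)
      Utils.deleteRecursively(dir)
    }
    ```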


[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18303912
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -202,9 +205,16 @@ private[spark] class Worker(
           // Spin up a separate thread (in a future) to do the dir cleanup; don't tie up worker actor
           val cleanupFuture = concurrent.future {
             logInfo("Cleaning up oldest application directories in " + workDir + " ...")
    -        Utils.findOldFiles(workDir, APP_DATA_RETENTION_SECS)
    -          .foreach(Utils.deleteRecursively)
    +        val appDirs = workDir.listFiles()
    +        if (appDirs == null) {
    +          throw new IOException("ERROR: Failed to list files in " + appDirs)
    +        }
    +        appDirs.filter {
    --- End diff --
    
    The typical style here is to put the parameter on the this line, i.e.
    
    ```scala
    appDirs.filter { dir =>
      dir.isDirectory && !Utils.doesDirectoryContainAnyNewFiles(dir, APP_DATA_RETENTION_SECS)
    }.foreach(Utils.deleteRecursively)
    ```


[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57529239
  
    As mentioned in the JIRA, I think it would be very good to also check the appId to make sure the Executors are indeed terminated. It does not seem unreasonable to me that some Spark clusters might remain idle for a couple days before someone comes back to them, with the expectation that they still work.
    
    I think we can achieve this in a pretty type-safe manner by changing the ExecutorRunner to take in the "executorWorkDir" instead of "workDir", and thus making Worker have control over the fact that app dirs are named with the app's ID.
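
    A rough sketch of the shape of that change (the helper name and directory layout here are illustrative assumptions, not the actual ExecutorRunner API), assuming `java.io.{File, IOException}` are in scope as they already are in Worker.scala:
    
    ```scala
    // Hypothetical helper inside Worker: the Worker owns the convention that app
    // dirs are named by appId, so it hands each ExecutorRunner a fully resolved
    // per-executor directory rather than the raw workDir.
    def createExecutorWorkDir(workDir: File, appId: String, execId: Int): File = {
      val executorWorkDir = new File(new File(workDir, appId), execId.toString)
      if (!executorWorkDir.exists() && !executorWorkDir.mkdirs()) {
        throw new IOException("Failed to create directory " + executorWorkDir)
      }
      executorWorkDir
    }
    ```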


[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57694773
  
    This change looks good to me. I tested it locally with a small cluster, and it behaves as expected. My main remaining comments are about the logging, as it was pretty opaque as to when the feature was turned on and when it was actually deleting things.


[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57840815
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21255/


[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57671921
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21211/consoleFull) for PR 2609 at commit [`e0a1f2e`](https://github.com/apache/spark/commit/e0a1f2e874cc852ce0cc942130a7765a7deb8133).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2609#issuecomment-57562399
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21164/


[GitHub] spark pull request: [SPARK-1860] More conservative app directory c...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2609#discussion_r18316727
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
    @@ -202,9 +205,17 @@ private[spark] class Worker(
           // Spin up a separate thread (in a future) to do the dir cleanup; don't tie up worker actor
           val cleanupFuture = concurrent.future {
             logInfo("Cleaning up oldest application directories in " + workDir + " ...")
    -        Utils.findOldFiles(workDir, APP_DATA_RETENTION_SECS)
    -          .foreach(Utils.deleteRecursively)
    +        val appDirs = workDir.listFiles()
    +        if (appDirs == null) {
    +          throw new IOException("ERROR: Failed to list files in " + appDirs)
    +        }
    +        appDirs.filter {
    +          dir => 
    +            dir.isDirectory && executors.keys.filter(key => dir.getPath.contains(key)).isEmpty &&
    --- End diff --
    
    I think the key into the `executors` map is appId/execId, which is not the same as the dir's name, which contains only the appId. Additionally, I would clarify this logic, either by extracting it out to a variable or adding a comment, or both. Something like
    
    ```scala
    val appId = dir.getName
    // Attempt to filter out running applications based on the existence of executors for that appId.
    val isAppStillRunning = executors.values.map(_.appId).contains(appId)
    ```
    
    I would also add a higher level comment that describes the overall logic (i.e., only delete directories for apps that are not running anymore and whose files are sufficiently old).
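
    For example (a sketch assembled from the names in this diff, not necessarily the exact final code):
    
    ```scala
    // Only delete directories of applications that are no longer running on this
    // worker and whose contents are all older than the retention threshold.
    appDirs.filter { dir =>
      val appIdFromDir = dir.getName
      val isAppStillRunning = executors.values.map(_.appId).contains(appIdFromDir)
      dir.isDirectory && !isAppStillRunning &&
        !Utils.doesDirectoryContainAnyNewFiles(dir, APP_DATA_RETENTION_SECS)
    }.foreach(Utils.deleteRecursively)
    ```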

