Posted to reviews@spark.apache.org by Sephiroth-Lin <gi...@git.apache.org> on 2015/02/06 02:42:38 UTC

[GitHub] spark pull request: Delete tmp dir when sc is stopped

GitHub user Sephiroth-Lin opened a pull request:

    https://github.com/apache/spark/pull/4412

    Delete tmp dir when sc is stopped

    When we run the driver as a service and, for each job, only call sc.stop, the tmp dirs created by HttpFileServer and SparkEnv are not deleted; they are only removed when the service process exits. So we need to delete these tmp dirs directly when the SparkContext is stopped.
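
    As a sketch of the scenario above (the app name and master URL are
    illustrative, not part of this patch), the service pattern looks roughly like:

        import org.apache.spark.{SparkConf, SparkContext}

        object JobService {
          // Run one job with a fresh SparkContext, then stop it. Before this
          // patch, each iteration left behind the tmp dirs created by
          // HttpFileServer and SparkEnv until the whole JVM exited.
          def runOneJob(): Unit = {
            val sc = new SparkContext(
              new SparkConf().setAppName("job").setMaster("local[*]"))
            try {
              sc.parallelize(1 to 100).count()
            } finally {
              sc.stop()  // before this patch, tmp dirs survived this call
            }
          }

          def main(args: Array[String]): Unit = {
            while (true) runOneJob()  // long-lived service; process never exits
          }
        }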

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Sephiroth-Lin/spark bug-fix-master-01

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4412.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4412
    
----
commit c0d5b283ecbf3332d5a68280b2e617325d5d9b6b
Author: Sephiroth-Lin <li...@gmail.com>
Date:   2015-02-06T01:33:12Z

    Delete tmp dir when sc is stopped
    
    When we run the driver as a service and, for each job, only call sc.stop, the tmp dirs created by HttpFileServer and SparkEnv are not deleted; they are only removed when the service process exits.

commit aeac51861e5c5bb185b0738dba2d25e57752038c
Author: Sephiroth-Lin <li...@gmail.com>
Date:   2015-02-06T01:35:33Z

    Delete tmp dir when sc is stopped
    
    When we run the driver as a service and, for each job, only call sc.stop, the tmp dirs created by HttpFileServer and SparkEnv are not deleted; they are only removed when the service process exits.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4412#discussion_r24305906
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -93,6 +93,19 @@ class SparkEnv (
         // actorSystem.awaitTermination()
     
         // Note that blockTransferService is stopped by BlockManager since it is started by it.
    +    
    +    // If we only stop sc but the driver process still runs as a service, then we need to
    +    // delete the tmp dir; otherwise it will create too many tmp dirs.
    +    // We only need to delete the tmp dir created by the driver, because sparkFilesDir points
    +    // to the current working dir on executors, which we must not delete.
    +    if (SparkContext.DRIVER_IDENTIFIER == executorId) {
    --- End diff --
    
    I mean that this is not exactly the same condition that decided whether to make a temp dir, although it looks like it will end up being the same. Since the consequence of accidentally getting this wrong, here or through a later change, is significant, I suggest binding the two much more intimately. For example: save a reference to the temp dir if it exists; that is the one you know you can delete.
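
    A minimal sketch of that suggestion (the field name is illustrative; assumes
    java.io.File and the existing Utils helpers):

        // Remember the exact dir we created; delete only that in stop().
        private[spark] var driverTmpDir: Option[File] = None

        // Where the tmp dir is created (driver only):
        //   driverTmpDir = Some(Utils.createTempDir())

        // In stop(); None on executors, so nothing is deleted there:
        driverTmpDir.foreach { dir =>
          try {
            Utils.deleteRecursively(dir)
          } catch {
            case e: Exception =>
              logWarning(s"Exception while deleting Spark temp dir: ${dir.getAbsolutePath}", e)
          }
        }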


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by Sephiroth-Lin <gi...@git.apache.org>.
Github user Sephiroth-Lin commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73682144
  
    @srowen thank you, please take another look.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4412


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by Sephiroth-Lin <gi...@git.apache.org>.
Github user Sephiroth-Lin commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73655978
  
    @srowen thank you. I have now added a member that stores a reference to the tmp dir if one was created; please take another look.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73681631
  
      [Test build #27199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27199/consoleFull) for PR 4412 at commit [`b968e14`](https://github.com/apache/spark/commit/b968e148655f108b7501573bc4958369c7e17238).
     * This patch merges cleanly.


[GitHub] spark pull request: Delete tmp dir when sc is stopped

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73188296
  
    You'll need a JIRA for this. I think it's probably a harmless change, but a process contains one SparkContext and generally isn't doing anything other than Spark, so what problem does this cause in practice?


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by Sephiroth-Lin <gi...@git.apache.org>.
Github user Sephiroth-Lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4412#discussion_r24305832
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -93,6 +93,19 @@ class SparkEnv (
         // actorSystem.awaitTermination()
     
         // Note that blockTransferService is stopped by BlockManager since it is started by it.
    +    
    +    // If we only stop sc but the driver process still runs as a service, then we need to
    +    // delete the tmp dir; otherwise it will create too many tmp dirs.
    +    // We only need to delete the tmp dir created by the driver, because sparkFilesDir points
    +    // to the current working dir on executors, which we must not delete.
    +    if (SparkContext.DRIVER_IDENTIFIER == executorId) {
    --- End diff --
    
    @srowen sorry, I'm not quite clear on this. Do you mean we cannot use the executorId to distinguish the driver from an executor?


[GitHub] spark pull request: Delete tmp dir when sc is stopped

Posted by Sephiroth-Lin <gi...@git.apache.org>.
Github user Sephiroth-Lin commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73190453
  
    @srowen we run a process as a service, and that process does not stop. In this service process we create a SparkContext, run a job, and then stop the context. Because we only call sc.stop and never exit the service process, the tmp dirs created by HttpFileServer and SparkEnv are not deleted after the SparkContext is stopped, which leads to too many tmp dirs when we create many SparkContexts to run jobs in this service process.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73501847
  
      [Test build #27097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27097/consoleFull) for PR 4412 at commit [`b2018a5`](https://github.com/apache/spark/commit/b2018a5996ad10ef5938bab1dfd9d0dfbc178a56).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73493147
  
      [Test build #27097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27097/consoleFull) for PR 4412 at commit [`b2018a5`](https://github.com/apache/spark/commit/b2018a5996ad10ef5938bab1dfd9d0dfbc178a56).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by Sephiroth-Lin <gi...@git.apache.org>.
Github user Sephiroth-Lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4412#discussion_r24306573
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -93,6 +93,19 @@ class SparkEnv (
         // actorSystem.awaitTermination()
     
         // Note that blockTransferService is stopped by BlockManager since it is started by it.
    +    
    +    // If we only stop sc but the driver process still runs as a service, then we need to
    +    // delete the tmp dir; otherwise it will create too many tmp dirs.
    +    // We only need to delete the tmp dir created by the driver, because sparkFilesDir points
    +    // to the current working dir on executors, which we must not delete.
    +    if (SparkContext.DRIVER_IDENTIFIER == executorId) {
    --- End diff --
    
    @srowen Thank you. If we want to make this more intimately bound, maybe we can check sparkFilesDir directly; otherwise, we would need to add a parameter to the SparkEnv class.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4412#discussion_r24399263
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -367,6 +385,13 @@ object SparkEnv extends Logging {
           metricsSystem,
           shuffleMemoryManager,
           conf)
    +      
    +    // Add a reference to tmp dir created by driver
    +    if (isDriver) {
    --- End diff --
    
    It might be worth calling out why this is linked to the `isDriver` logic above, for future readers. @JoshRosen thoughts now?
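
    For example (the comment wording and the `env` receiver are illustrative,
    not from the patch):

        // Created only in the isDriver branch above: executors resolve
        // sparkFilesDir to their current working directory, which must never
        // be deleted, so only the driver records a dir to clean up in stop().
        if (isDriver) {
          env.driverTmpDirToDelete = Some(sparkFilesDir)
        }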


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4412#discussion_r24399179
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpFileServer.scala ---
    @@ -50,6 +50,15 @@ private[spark] class HttpFileServer(
     
       def stop() {
         httpServer.stop()
    +    
    +    // If we only stop sc but the driver process still runs as a service, then we need to
    +    // delete the tmp dir; otherwise it will create too many tmp dirs
    +    try {
    +      Utils.deleteRecursively(baseDir)
    +    } catch {
    +      case e: Exception =>
    +        logWarning("Exception while deleting Spark temp dir: " + baseDir.getAbsolutePath, e)
    --- End diff --
    
    Nit: consider interpolation here
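
    For reference, the interpolated form would be something like:

        logWarning(s"Exception while deleting Spark temp dir: ${baseDir.getAbsolutePath}", e)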


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4412#discussion_r24289455
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -93,6 +93,19 @@ class SparkEnv (
         // actorSystem.awaitTermination()
     
         // Note that blockTransferService is stopped by BlockManager since it is started by it.
    +    
    +    // If we only stop sc but the driver process still runs as a service, then we need to
    +    // delete the tmp dir; otherwise it will create too many tmp dirs.
    +    // We only need to delete the tmp dir created by the driver, because sparkFilesDir points
    +    // to the current working dir on executors, which we must not delete.
    +    if (SparkContext.DRIVER_IDENTIFIER == executorId) {
    --- End diff --
    
    I see that this condition should indeed match the logic which controls whether this directory is a temp dir, but since the consequence of getting this wrong is pretty significant, I wonder if we should be more conservative. For example, save off the `isDriver` flag and use it in both places to be sure that these are exactly in sync? Or maintain a separate reference to the temp dir that was created, which is `None` if no temp dir was created, so you can be sure you can delete it whenever it's not `None`?


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73663224
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27186/
    Test PASSed.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73690234
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27199/
    Test PASSed.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73701313
  
      [Test build #27200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27200/consoleFull) for PR 4412 at commit [`fbbc785`](https://github.com/apache/spark/commit/fbbc7857816aae5ce0c5d900783485f938f28447).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by Sephiroth-Lin <gi...@git.apache.org>.
Github user Sephiroth-Lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4412#discussion_r24404265
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpFileServer.scala ---
    @@ -50,6 +50,15 @@ private[spark] class HttpFileServer(
     
       def stop() {
         httpServer.stop()
    +    
    +    // If we only stop sc but the driver process still runs as a service, then we need to
    +    // delete the tmp dir; otherwise it will create too many tmp dirs
    +    try {
    +      Utils.deleteRecursively(baseDir)
    +    } catch {
    +      case e: Exception =>
    +        logWarning("Exception while deleting Spark temp dir: " + baseDir.getAbsolutePath, e)
    --- End diff --
    
    OK.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73663218
  
      [Test build #27186 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27186/consoleFull) for PR 4412 at commit [`1339c96`](https://github.com/apache/spark/commit/1339c969af36f94eb7c07e96ee6459373b7678f9).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73701317
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27200/
    Test PASSed.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4412#discussion_r24246142
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -93,6 +93,14 @@ class SparkEnv (
         // actorSystem.awaitTermination()
     
         // Note that blockTransferService is stopped by BlockManager since it is started by it.
    +    
    +    // If we only stop sc but the driver process still runs as a service, then we need to
    +    // delete the tmp dir; otherwise it will create too many tmp dirs
    +    try {
    +      Utils.deleteRecursively(new File(sparkFilesDir))
    --- End diff --
    
    I'm not sure this one is safe. Looking at the code, `sparkFilesDir` will be the current working directory on executors. Is that really safe to delete recursively?


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73690222
  
      [Test build #27199 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27199/consoleFull) for PR 4412 at commit [`b968e14`](https://github.com/apache/spark/commit/b968e148655f108b7501573bc4958369c7e17238).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: Delete tmp dir when sc is stopped

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73167179
  
    Can one of the admins verify this patch?


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by Sephiroth-Lin <gi...@git.apache.org>.
Github user Sephiroth-Lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4412#discussion_r24285284
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -93,6 +93,14 @@ class SparkEnv (
         // actorSystem.awaitTermination()
     
         // Note that blockTransferService is stopped by BlockManager since it is started by it.
    +    
    +    // If we only stop sc but the driver process still runs as a service, then we need to
    +    // delete the tmp dir; otherwise it will create too many tmp dirs
    +    try {
    +      Utils.deleteRecursively(new File(sparkFilesDir))
    --- End diff --
    
    @srowen @JoshRosen you are right; since we only need to delete the tmp dir created by the driver, we now check sparkFilesDir first to confirm it was created by the driver. Thank you.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73655674
  
      [Test build #27186 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27186/consoleFull) for PR 4412 at commit [`1339c96`](https://github.com/apache/spark/commit/1339c969af36f94eb7c07e96ee6459373b7678f9).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73492548
  
    ok to test


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73501855
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27097/
    Test PASSed.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4412#discussion_r24399228
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -76,6 +76,8 @@ class SparkEnv (
       // (e.g., HadoopFileRDD uses this to cache JobConfs and InputFormats).
       private[spark] val hadoopJobMetadata = new MapMaker().softValues().makeMap[String, Any]()
     
    +  private var tmpFilesDir: Option[String] = None
    --- End diff --
    
    Yeah this looks good. Maybe call it `driverTmpDirToDelete` or something to be extra clear?
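
    With that name, the member and its use in stop() would look roughly like
    (a sketch; assumes java.io.File is imported):

        private[spark] var driverTmpDirToDelete: Option[String] = None

        // In stop(); this stays None on executors, so nothing is deleted there.
        driverTmpDirToDelete.foreach { path =>
          try {
            Utils.deleteRecursively(new File(path))
          } catch {
            case e: Exception =>
              logWarning(s"Exception while deleting Spark temp dir: $path", e)
          }
        }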


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73738745
  
    OK, that's looking good to me. I'll pause shortly to let others review it too.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4412#discussion_r24316626
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -93,6 +93,19 @@ class SparkEnv (
         // actorSystem.awaitTermination()
     
         // Note that blockTransferService is stopped by BlockManager since it is started by it.
    +    
    +    // If we only stop sc but the driver process still runs as a service, then we need to
    +    // delete the tmp dir; otherwise it will create too many tmp dirs.
    +    // We only need to delete the tmp dir created by the driver, because sparkFilesDir points
    +    // to the current working dir on executors, which we must not delete.
    +    if (SparkContext.DRIVER_IDENTIFIER == executorId) {
    --- End diff --
    
    I think you just need to store one more member, which is a reference to the tmp dir if it was created. See above. That's pretty much fool-proof and doesn't require any new params anywhere.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4412#discussion_r24245999
  
    --- Diff: core/src/main/scala/org/apache/spark/HttpFileServer.scala ---
    @@ -50,6 +50,15 @@ private[spark] class HttpFileServer(
     
       def stop() {
         httpServer.stop()
    +    
    +    // If we only stop sc but the driver process still runs as a service, then we need to
    +    // delete the tmp dir; otherwise it will create too many tmp dirs
    +    try {
    +      Utils.deleteRecursively(baseDir)
    --- End diff --
    
    This seems OK to me. I would make the error log a warning though as it's non-fatal. Consider using an interpolated string here as you do below.


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4412#discussion_r24267365
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
    @@ -93,6 +93,14 @@ class SparkEnv (
         // actorSystem.awaitTermination()
     
         // Note that blockTransferService is stopped by BlockManager since it is started by it.
    +    
    +    // If we only stop sc but the driver process still runs as a service, then we need to
    +    // delete the tmp dir; otherwise it will create too many tmp dirs
    +    try {
    +      Utils.deleteRecursively(new File(sparkFilesDir))
    --- End diff --
    
    I agree; this seems unsafe. It would be a disaster if we accidentally deleted directories that we didn't create, so we can't delete any path that could point to the CWD. Instead, we might either ensure that the CWD is a subfolder of a Spark local directory (so it will be cleaned up as part of our baseDir cleanup) or change `sparkFilesDir` to not download files to the CWD (e.g. create a temporary directory on both the driver and executors).
    
    Old versions of the `addFile` API contract said that files would be downloaded to the CWD, but we haven't made that promise since Spark 0.7-ish, I think; we only technically guarantee that `SparkFiles.get` will return the file paths.
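
    One conservative guard along these lines would be (a sketch, not the merged
    fix; the helper name is illustrative):

        import java.io.File

        // Refuse to recursively delete anything that resolves to the
        // process's current working directory.
        def safeDeleteTmpDir(dir: File): Unit = {
          val cwd = new File(System.getProperty("user.dir")).getCanonicalFile
          if (dir.getCanonicalFile == cwd) {
            logWarning(s"Refusing to delete current working directory: $cwd")
          } else {
            Utils.deleteRecursively(dir)
          }
        }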


[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4412#issuecomment-73691995
  
      [Test build #27200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27200/consoleFull) for PR 4412 at commit [`fbbc785`](https://github.com/apache/spark/commit/fbbc7857816aae5ce0c5d900783485f938f28447).
     * This patch merges cleanly.

