Posted to issues@spark.apache.org by "Zach Fry (JIRA)" <ji...@apache.org> on 2015/05/28 18:12:25 UTC

[jira] [Commented] (SPARK-4834) Spark fails to clean up cache / lock files in local dirs

    [ https://issues.apache.org/jira/browse/SPARK-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563183#comment-14563183 ] 

Zach Fry commented on SPARK-4834:
---------------------------------

[~joshrosen], 

We are seeing behavior on Spark 1.3.0 where the _files_ in the {{spark.local.dir}} directories are getting cleaned up, but not the _directories_ themselves. 

It's a pretty simple repro: 
Run a job that does some shuffling, wait for the shuffle files to get cleaned up, then look on disk at {{spark.local.dir}}. The directories are still there, but there are no files in them. 
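
To check the on-disk state after such a run, something like the following could be used. It is only a minimal sketch in plain Python, not part of Spark: the function name {{dirs_without_files}} is made up for illustration, and it makes no assumption about how Spark lays out its subdirectories. It simply lists every directory under a given root whose subtree contains no files.

```python
import os


def dirs_without_files(root):
    """Return the directories under root whose whole subtree holds no files."""
    file_counts = {}
    empty = []
    # Walk bottom-up so each child's count is known before its parent's.
    for path, subdirs, files in os.walk(root, topdown=False):
        total = len(files) + sum(
            file_counts.get(os.path.join(path, d), 0) for d in subdirs
        )
        file_counts[path] = total
        # The root itself is never reported, only directories beneath it.
        if total == 0 and path != root:
            empty.append(path)
    return sorted(empty)
```

Calling {{dirs_without_files(path)}} with the value configured for {{spark.local.dir}} (or "/tmp" if nothing is configured) should print nothing interesting before the job and, if the behavior described above is present, a list of leftover empty directories after the shuffle files have been cleaned up.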

Should we open a new ticket for this, or can we reopen this one? 



> Spark fails to clean up cache / lock files in local dirs
> --------------------------------------------------------
>
>                 Key: SPARK-4834
>                 URL: https://issues.apache.org/jira/browse/SPARK-4834
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>             Fix For: 1.2.1, 1.3.0
>
>
> This issue was caused by https://github.com/apache/spark/commit/7aacb7bfa.
> That change shares downloaded jar / files among multiple executors running on the same host by using a lock file and a cache file for each file the executor needs to download. The problem is that these lock and cache files are never deleted.
> On Yarn, the app's dir is automatically deleted when the app ends, so no files are left behind. But on standalone, there's no such thing as "the app's dir"; files will end up in "/tmp" or in whatever place the user configures in "SPARK_LOCAL_DIRS", and will eventually start to fill that volume.
> We should add a way to clean up these files. It's not as simple as "hey, just call File.deleteOnExit()!" because we're talking about multiple processes accessing these files, so to maintain the efficiency gains of the original change, the files should only be deleted when the application is finished.


