Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2014/12/13 01:13:13 UTC

[jira] [Commented] (SPARK-4834) Spark fails to clean up cache / lock files in local dirs

    [ https://issues.apache.org/jira/browse/SPARK-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245025#comment-14245025 ] 

Marcelo Vanzin commented on SPARK-4834:
---------------------------------------

For example, after running a job that ends up uploading three files (the job's jar plus two files added via "sc.addJar()"), I end up with these leftovers:

{noformat}
/tmp/8760687361418428261220_lock
/tmp/-15888563551418428259097_lock
/tmp/16304071011418428261208_lock
/tmp/8760687361418428261220_cache
/tmp/-15888563551418428259097_cache
/tmp/16304071011418428261208_cache
{noformat}

My first idea was to somehow delete these files when the executors go down. But that would require keeping a reference count somewhere, which would make the locking considerably more complicated than it is now.

So my second solution would change the way local dirs are assigned. Basically, each Worker would create one app-specific dir under each directory in SPARK_LOCAL_DIRS, and set SPARK_LOCAL_DIRS to those app-specific dirs when starting the executor. After the app is done, the Worker would clean up those directories. I haven't looked at whether this would require protocol changes, but it should be reasonably simple to do.
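A minimal sketch of that idea (this is not Spark's actual Worker code; the class and method names are hypothetical): the Worker creates one per-application subdirectory under each configured local dir, hands those to the executor as its SPARK_LOCAL_DIRS, and deletes them recursively once the application ends.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch of the proposed Worker-side scheme.
public class AppLocalDirs {

    // Create one app-specific dir under each entry of SPARK_LOCAL_DIRS.
    // The returned paths would be joined with File.pathSeparator and
    // exported as SPARK_LOCAL_DIRS for the executor being launched.
    public static List<Path> createAppDirs(List<Path> localDirs, String appId)
            throws IOException {
        List<Path> appDirs = new ArrayList<>();
        for (Path dir : localDirs) {
            Path appDir = dir.resolve("spark-app-" + appId);
            Files.createDirectories(appDir);
            appDirs.add(appDir);
        }
        return appDirs;
    }

    // Called when the Worker learns the application has finished:
    // recursively delete each app dir, leaving nothing behind.
    public static void cleanupAppDirs(List<Path> appDirs) throws IOException {
        for (Path appDir : appDirs) {
            try (Stream<Path> walk = Files.walk(appDir)) {
                // Sort deepest-first so children are deleted before parents.
                walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        // Best effort: a stray undeletable file shouldn't
                        // fail the whole cleanup.
                    }
                });
            }
        }
    }
}
```

Because every lock and cache file the executor creates would land inside these per-app dirs, the existing lock-file logic wouldn't need to change at all; cleanup becomes a single recursive delete at app exit.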

I'll start looking at the above solution, but feel free to suggest different approaches in the meantime.

> Spark fails to clean up cache / lock files in local dirs
> --------------------------------------------------------
>
>                 Key: SPARK-4834
>                 URL: https://issues.apache.org/jira/browse/SPARK-4834
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Marcelo Vanzin
>
> This issue was caused by https://github.com/apache/spark/commit/7aacb7bfa.
> That change shares downloaded jar / files among multiple executors running on the same host by using a lock file and a cache file for each file the executor needs to download. The problem is that these lock and cache files are never deleted.
> On Yarn, the app's dir is automatically deleted when the app ends, so no files are left behind. But on standalone, there's no such thing as "the app's dir"; files will end up in "/tmp" or in whatever place the user configures in "SPARK_LOCAL_DIRS", and will eventually start to fill that volume.
> We should add a way to clean up these files. It's not as simple as "hey, just call File.deleteOnExit()!" because we're talking about multiple processes accessing these files, so to maintain the efficiency gains of the original change, the files should only be deleted when the application is finished.
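To make the sharing mechanism concrete, here is a simplified sketch of the lock/cache scheme the description refers to (the real logic lives in Spark's Utils.fetchFile; the class and method names below are hypothetical). Whichever executor process acquires the exclusive lock first downloads into the cache file; the others block on the lock and then simply copy the already-downloaded bytes. Note that neither the lock file nor the cache file is ever deleted, which is exactly the leak this issue describes.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the multi-process download cache.
public class SharedDownloadCache {

    // Stand-in for the actual fetch of the jar/file from the driver.
    public interface Downloader {
        void download(File target) throws IOException;
    }

    public static void fetch(File cacheFile, File lockFile, File dest,
                             Downloader downloader) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
             FileLock lock = raf.getChannel().lock()) { // blocks until exclusive
            // Only the first process to win the lock pays the download cost.
            if (!cacheFile.exists()) {
                downloader.download(cacheFile);
            }
        } // lock released here; lockFile and cacheFile are left on disk
        Files.copy(cacheFile.toPath(), dest.toPath(),
                   StandardCopyOption.REPLACE_EXISTING);
    }
}
```

This illustrates why File.deleteOnExit() is not enough: the cache file must outlive any single executor process so that later executors on the same host can reuse it, so deletion can only safely happen once the whole application is finished.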



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
