Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2014/12/13 00:57:13 UTC
[jira] [Created] (SPARK-4834) Spark fails to clean up cache / lock files in local dirs
Marcelo Vanzin created SPARK-4834:
-------------------------------------
Summary: Spark fails to clean up cache / lock files in local dirs
Key: SPARK-4834
URL: https://issues.apache.org/jira/browse/SPARK-4834
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.2.0
Reporter: Marcelo Vanzin
This issue was caused by https://github.com/apache/spark/commit/7aacb7bfa.
That change shares downloaded jars / files among multiple executors running on the same host, using a lock file and a cache file for each file an executor needs to download. The problem is that these lock and cache files are never deleted.
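To make the failure mode concrete, here is a minimal sketch of that sharing scheme (the function and file-naming convention are illustrative, not Spark's actual code, which lives in Scala): the first executor to grab the per-file lock downloads into the cache file, later ones just copy from it — and neither the ".lock" nor the ".cache" file is ever removed.

```python
import fcntl
import os
import shutil

def fetch_shared(url, cache_dir, dest_dir, download_fn):
    # Hypothetical sketch of the per-file lock + cache protocol.
    name = os.path.basename(url)
    lock_path = os.path.join(cache_dir, name + ".lock")
    cache_path = os.path.join(cache_dir, name + ".cache")

    # The lock file is created here and never deleted afterwards --
    # exactly the leak this issue describes.
    lock_fd = os.open(lock_path, os.O_RDWR | os.O_CREAT)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX)   # blocks until we hold the lock
        if not os.path.exists(cache_path):
            download_fn(url, cache_path)      # only the first executor downloads
    finally:
        fcntl.flock(lock_fd, fcntl.LOCK_UN)
        os.close(lock_fd)

    # Every executor copies from the shared cache into its own work dir;
    # the cache file itself also stays behind.
    dest = os.path.join(dest_dir, name)
    shutil.copyfile(cache_path, dest)
    return dest
```

Running this twice for the same URL performs a single download but leaves both "app.jar.lock" and "app.jar.cache" sitting in the cache dir afterwards.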
On YARN, the app's dir is automatically deleted when the app ends, so no files are left behind. But on standalone there's no such thing as "the app's dir"; files will end up in "/tmp" or wherever the user configures via "SPARK_LOCAL_DIRS", and will eventually start to fill that volume.
We should add a way to clean up these files. It's not as simple as calling File.deleteOnExit(), because multiple processes access these files; to keep the efficiency gains of the original change, the files should only be deleted once the application is finished.
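One possible shape for such a cleanup pass (purely an illustration, not the actual fix; it assumes the lock/cache filenames carry the application id, which is one way to make per-app deletion safe across processes): once the app is known to be finished, sweep its leftover ".lock" and ".cache" files out of the local dir.

```python
import os

def cleanup_app_files(cache_dir, app_id):
    # Hypothetical per-app sweep: remove only this application's
    # leftover lock and cache files, leaving other apps' files alone.
    removed = []
    for name in os.listdir(cache_dir):
        if name.startswith(app_id) and name.endswith((".lock", ".cache")):
            os.remove(os.path.join(cache_dir, name))
            removed.append(name)
    return removed
```

A standalone Worker could run something like this when it learns an application has terminated, which is when deleting the shared files no longer defeats the caching.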
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)