Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2014/12/13 00:57:13 UTC

[jira] [Created] (SPARK-4834) Spark fails to clean up cache / lock files in local dirs

Marcelo Vanzin created SPARK-4834:
-------------------------------------

             Summary: Spark fails to clean up cache / lock files in local dirs
                 Key: SPARK-4834
                 URL: https://issues.apache.org/jira/browse/SPARK-4834
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.2.0
            Reporter: Marcelo Vanzin


This issue was caused by https://github.com/apache/spark/commit/7aacb7bfa.

That change shares downloaded jar / files among multiple executors running on the same host by using a lock file and a cache file for each file the executor needs to download. The problem is that these lock and cache files are never deleted.
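A minimal sketch of that sharing pattern (not the actual Spark code; the file names and the `downloadTo` helper are hypothetical): each executor takes an exclusive lock on a per-file lock file, the first one populates the cache file, and later executors just copy from it. Note that neither the lock file nor the cache file is ever deleted, which is the bug.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.file.Files;

public class SharedDownload {
    // Fetch a remote file into `dest`, downloading it at most once per host.
    // Concurrent executors serialize on `lockFile`; whoever gets the lock
    // first populates `cacheFile`, everyone else copies from it.
    static void fetchCached(File lockFile, File cacheFile, File dest)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
             FileLock lock = raf.getChannel().lock()) {
            if (!cacheFile.exists()) {
                downloadTo(cacheFile); // first executor fills the cache
            }
            Files.copy(cacheFile.toPath(), dest.toPath());
        }
        // lockFile and cacheFile are left behind -- nothing deletes them.
    }

    // Stand-in for the real download (hypothetical helper).
    static void downloadTo(File f) throws IOException {
        Files.write(f.toPath(), "payload".getBytes());
    }
}
```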

On Yarn, the app's dir is automatically deleted when the app ends, so no files are left behind. But on standalone, there's no such thing as "the app's dir"; files will end up in "/tmp" or in whatever place the user configures in "SPARK_LOCAL_DIRS", and will eventually start to fill that volume.

We should add a way to clean up these files. It's not as simple as calling File.deleteOnExit(), because multiple processes access these files; to preserve the efficiency gains of the original change, the files should only be deleted once the application has finished.
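One possible shape for that cleanup, sketched below under the assumption that the lock/cache file names carry the application id as a prefix (the naming scheme and the trigger are assumptions, not Spark's actual scheme): when an application finishes, scan the local dir and remove only that app's files, leaving other running apps' files alone.

```java
import java.io.File;

public class CacheCleaner {
    // Hypothetical sketch: on application end, remove that app's
    // *.lock / *.cache files from a local dir. Assumes file names are
    // prefixed with the application id; returns the number deleted.
    static int cleanUp(File localDir, String appId) {
        int removed = 0;
        File[] files = localDir.listFiles();
        if (files == null) {
            return 0; // not a directory, or I/O error
        }
        for (File f : files) {
            String name = f.getName();
            if (name.startsWith(appId)
                    && (name.endsWith(".lock") || name.endsWith(".cache"))) {
                if (f.delete()) {
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

Deleting per-app rather than globally is what keeps this safe to run while other applications on the same host still hold their own cache files.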



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org