Posted to dev@hive.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2014/12/13 00:02:13 UTC
[jira] [Commented] (HIVE-9017) Clean up temp files of RSC [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244944#comment-14244944 ]
Marcelo Vanzin commented on HIVE-9017:
--------------------------------------
These files are created by Spark when it downloads resources for the app (e.g. application jars). In standalone mode, by default, these files end up in /tmp (java.io.tmpdir). The problem is that the app doesn't clean up these files; in fact, it can't, because they are meant to be shared in case multiple executors run on the same host - so one executor cannot unilaterally decide to delete them.
(That's not entirely true; I guess it could, but then other executors would have to re-download the file when they need it, adding overhead.)
This is not a problem in Yarn mode, since the temp dir is under a Yarn-managed directory that is deleted when the app shuts down.
So, while I think of a clean way to fix this in Spark, the following can be done on the Hive side:
- create an app-specific temp directory before launching the Spark app
- set {{spark.local.dir}} to that location
- delete the directory when the client shuts down
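A minimal sketch of those three steps on the Hive side might look like the following. This is illustrative only: the class and method names are hypothetical, and the {{spark.local.dir}} setting is shown as a spark-submit conf argument, whereas the real client would put it into whatever SparkConf/launcher it builds.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;

// Hypothetical helper: creates an app-specific temp dir, registers a
// shutdown hook to delete it, and returns the conf argument to pass
// to spark-submit.
public class RscLocalDir {

    public static String prepareLocalDir() throws IOException {
        // 1. Create an app-specific temp directory before launching Spark.
        final Path localDir = Files.createTempDirectory("hive-rsc-");

        // 3. Delete the directory recursively when the client shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                Files.walk(localDir)
                     .sorted(Comparator.reverseOrder()) // children before parents
                     .forEach(p -> p.toFile().delete());
            } catch (IOException e) {
                // Best effort; the dir is under java.io.tmpdir anyway.
            }
        }));

        // 2. Point spark.local.dir at that location.
        return "--conf spark.local.dir=" + localDir;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(prepareLocalDir());
    }
}
```

With this, everything Spark downloads for the app lands under the per-app directory instead of directly in /tmp, and the shutdown hook removes it when the client exits.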
> Clean up temp files of RSC [Spark Branch]
> -----------------------------------------
>
> Key: HIVE-9017
> URL: https://issues.apache.org/jira/browse/HIVE-9017
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Rui Li
>
> Currently RSC will leave a lot of temp files in {{/tmp}}, including {{*_lock}}, {{*_cache}}, {{spark-submit.*.properties}}, etc.
> We should clean up these files, or they will eventually exhaust disk space.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)