Posted to commits@spark.apache.org by pw...@apache.org on 2014/07/15 08:55:44 UTC

git commit: [SPARK-2390] Files in staging directory cannot be deleted and wastes the space of HDFS

Repository: spark
Updated Branches:
  refs/heads/master a2aa7beba -> c6d75745d


[SPARK-2390] Files in staging directory cannot be deleted and wastes the space of HDFS

When running jobs in YARN cluster mode with the HistoryServer, the files in the staging directory (~/.sparkStaging on HDFS) cannot be deleted.
The HistoryServer uses the directory where the event log is written, and that directory is represented by an instance of o.a.h.f.FileSystem created with FileSystem.get.

On the other hand, ApplicationMaster has an instance named fs, which is also created with FileSystem.get.

FileSystem.get returns the same cached instance when the URI passed to it refers to the same file system and the method is called by the same user.
Because of this behavior, when the event log directory is on HDFS, fs in ApplicationMaster and fileSystem in FileLogger are the same instance.
When ApplicationMaster shuts down, fileSystem.close is called in FileLogger#stop, which is invoked indirectly by SparkContext#stop.
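The caching behavior described above can be illustrated with a small, self-contained Scala model. It does not use Hadoop: ToyFileSystem and ToyFileSystemCache are hypothetical names, and the (scheme://authority, user) key is a simplification of the key Hadoop's cache actually uses. The point is only that two callers asking for the same file system as the same user receive one shared object.

```scala
import java.net.URI
import scala.collection.mutable

// Hypothetical stand-in for a file system client; not the Hadoop API.
final class ToyFileSystem(val fsKey: String, val user: String)

// Simplified model of FileSystem.get caching: one shared instance per
// (scheme://authority, user) pair. The real Hadoop cache key has more fields.
object ToyFileSystemCache {
  private val cache = mutable.Map.empty[(String, String), ToyFileSystem]

  def get(uri: URI, user: String): ToyFileSystem = {
    val key = (s"${uri.getScheme}://${uri.getAuthority}", user)
    // Reuse the existing instance if this key was seen before.
    cache.getOrElseUpdate(key, new ToyFileSystem(key._1, user))
  }
}
```

In this model, a staging path such as hdfs://nn:8020/user/alice/.sparkStaging and an event log path such as hdfs://nn:8020/spark-events, both requested by the same user, map to the same object, which mirrors why fs and fileSystem end up being one instance.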

ApplicationMaster#cleanupStagingDir is also called by a JVM shutdown hook, and this method invokes fs.delete(stagingDirPath).
Because fs.delete in ApplicationMaster is called after fileSystem.close in FileLogger, fs.delete fails and the files in the staging directory are not deleted.
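The shutdown ordering can be sketched with the same kind of model. This is a simplified illustration under stated assumptions: ToyFs and ShutdownOrdering are hypothetical, and a closed ToyFs throws an IOException with the message "Filesystem closed" to stand in for a closed HDFS client. Because both components hold one shared object, a close in the first step breaks the delete in the second.

```scala
import java.io.IOException
import scala.collection.mutable

// Hypothetical file system model that rejects operations after close().
final class ToyFs {
  private var open = true
  val paths: mutable.Set[String] = mutable.Set.empty

  def close(): Unit = { open = false }

  def delete(path: String): Boolean = {
    if (!open) throw new IOException("Filesystem closed")
    paths.remove(path) // true if the path existed and was removed
  }
}

object ShutdownOrdering {
  val stagingDir = "/user/alice/.sparkStaging/app_1" // hypothetical path

  // Returns whether the staging directory was deleted.
  def run(loggerClosesFs: Boolean): Boolean = {
    val shared = new ToyFs // the single cached instance
    shared.paths += stagingDir
    val loggerFs = shared  // plays the role of FileLogger's fileSystem
    val amFs = shared      // plays the role of ApplicationMaster's fs

    // Step 1: SparkContext#stop -> FileLogger#stop
    if (loggerClosesFs) loggerFs.close()

    // Step 2: JVM shutdown hook -> cleanupStagingDir
    try amFs.delete(stagingDir)
    catch { case _: IOException => false } // model reports failure as false
  }
}
```

With loggerClosesFs = true (the behavior before this commit) the delete fails; with false (the behavior after it) the staging directory is removed.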

I think calling fileSystem.close in FileLogger#stop is not needed.

Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>

Closes #1326 from sarutak/SPARK-2390 and squashes the following commits:

10e1a88 [Kousuke Saruta] Removed fileSystem.close from FileLogger.scala so as not to prevent other FileSystem operations


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c6d75745
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c6d75745
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c6d75745

Branch: refs/heads/master
Commit: c6d75745de58ff1445912bf72a58b6ad2b3f863c
Parents: a2aa7be
Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
Authored: Mon Jul 14 23:55:39 2014 -0700
Committer: Patrick Wendell <pw...@gmail.com>
Committed: Mon Jul 14 23:55:39 2014 -0700

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/util/FileLogger.scala | 1 -
 1 file changed, 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/c6d75745/core/src/main/scala/org/apache/spark/util/FileLogger.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/util/FileLogger.scala b/core/src/main/scala/org/apache/spark/util/FileLogger.scala
index 6a95dc0..9dcdafd 100644
--- a/core/src/main/scala/org/apache/spark/util/FileLogger.scala
+++ b/core/src/main/scala/org/apache/spark/util/FileLogger.scala
@@ -196,6 +196,5 @@ private[spark] class FileLogger(
   def stop() {
     hadoopDataStream.foreach(_.close())
     writer.foreach(_.close())
-    fileSystem.close()
   }
 }
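The patch amounts to an ownership rule: FileLogger#stop closes the streams it opened and leaves the shared FileSystem open. A hypothetical, simplified sketch of that rule follows; SketchFileLogger and SharedFs are illustrative names, and an in-memory ByteArrayOutputStream stands in for the HDFS output stream.

```scala
import java.io.{ByteArrayOutputStream, PrintWriter}

// Stand-in for the cached FileSystem that this logger does NOT own.
final class SharedFs {
  @volatile var closed = false
  def close(): Unit = { closed = true }
}

final class SketchFileLogger(fs: SharedFs) {
  // Streams this logger created itself, held as Options like the real fields.
  val hadoopDataStream: Option[ByteArrayOutputStream] = Some(new ByteArrayOutputStream())
  val writer: Option[PrintWriter] = hadoopDataStream.map(out => new PrintWriter(out))

  def logLine(line: String): Unit = writer.foreach(_.println(line))

  // Close only what this object created. Closing the writer first flushes
  // buffered text into the underlying stream.
  def stop(): Unit = {
    writer.foreach(_.close())
    hadoopDataStream.foreach(_.close())
    // fs.close() intentionally omitted: other components may share the instance.
  }

  def contents: String = hadoopDataStream.map(_.toString("UTF-8")).getOrElse("")
}
```

After stop(), the logged text is flushed and the shared file system is still usable by other components, such as the staging directory cleanup.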