Posted to issues@spark.apache.org by "Philipp Hanslovsky (JIRA)" <ji...@apache.org> on 2016/04/14 23:08:25 UTC

[jira] [Created] (SPARK-14641) Specify worker log dir separately from scratch space dir

Philipp Hanslovsky created SPARK-14641:
------------------------------------------

             Summary: Specify worker log dir separately from scratch space dir
                 Key: SPARK-14641
                 URL: https://issues.apache.org/jira/browse/SPARK-14641
             Project: Spark
          Issue Type: Wish
         Environment: Spark standalone on Univa Grid Engine
            Reporter: Philipp Hanslovsky


According to the standalone mode documentation

http://spark.apache.org/docs/latest/spark-standalone.html#monitoring-and-logging

SPARK_WORKER_DIR is described as: "Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work)."

Spark scratch space and log files therefore share the same directory. In our Univa Grid Engine cluster configuration, we set SPARK_WORKER_DIR=/scratch/spark/work (a local drive on each slave) and clean up SPARK_WORKER_DIR on tear-down of the job to make sure there will be enough space on the drive for subsequent Spark jobs, i.e. all files are removed regardless of whether the job succeeded or failed.

For debugging, I would like to access the slave log files after tear-down. Writing the log files to a location different from scratch space, e.g. an NFS-mounted $HOME, would allow me to keep the log files after tear-down while scratch space could still be cleared.

Is it possible to specify the log dir separately from the scratch space dir? If no such option exists yet, I could imagine something like:

SPARK_WORKER_LOG_DIR  - directory for slave logs (default: SPARK_WORKER_DIR)

A (temporary) workaround would be to set SPARK_WORKER_DIR=$HOME, which in this case would be on a network file system instead of locally on the slaves. Do you think performance would suffer from having non-local scratch space?
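
Another interim option that keeps scratch space local would be to copy the log files out of SPARK_WORKER_DIR during tear-down, just before the directory is deleted. The following is only a minimal sketch of that idea, not an existing Spark feature: the function name preserve_logs_then_clean is hypothetical, and it assumes that the files worth keeping are named stdout and stderr somewhere below the worker directory.

```python
import shutil
from pathlib import Path


def preserve_logs_then_clean(worker_dir, archive_dir, log_names=("stdout", "stderr")):
    """Copy log files from worker_dir to archive_dir, then delete worker_dir.

    Hypothetical helper for a tear-down hook. Assumption: the files to keep
    are identified purely by file name (log_names); everything else below
    worker_dir is treated as scratch data and discarded.
    Returns the list of archived file paths.
    """
    worker_dir = Path(worker_dir)
    archive_dir = Path(archive_dir)
    archived = []

    # Nothing to do if the worker directory was never created.
    if not worker_dir.is_dir():
        return archived

    for path in sorted(worker_dir.rglob("*")):
        if path.is_file() and path.name in log_names:
            # Keep the relative layout so logs from different
            # subdirectories do not overwrite each other.
            target = archive_dir / path.relative_to(worker_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            archived.append(target)

    # Free the local scratch space only after the logs have been copied.
    shutil.rmtree(worker_dir)
    return archived
```

In the setup described above, the tear-down step could call something like preserve_logs_then_clean("/scratch/spark/work", Path.home() / "spark-logs"), where the spark-logs directory name is just a placeholder. This would not replace a proper SPARK_WORKER_LOG_DIR setting, since logs are lost if the tear-down hook itself never runs.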


