Posted to dev@systemml.apache.org by Mike Dusenberry <du...@gmail.com> on 2016/03/31 19:58:32 UTC

Remove "Scratch Space" In Favor Of Temp Folder

Hi all,

Currently, SystemML makes use of a "scratch space" folder for temporary
files during execution.  By default, this is a relative
"scratch_space" directory that is created under the execution
path (local mode) or in the user's directory on HDFS.  This works okay in
some cases, although it can cause confusion as to why the folder exists.
In other cases, such as on Databricks Cloud, a relative path for HDFS is
not allowed, and thus the user must change this "scratch space" folder to
an absolute path, or else a strange error message will occur.

Since this "scratch space" folder is just for temporary files during
execution, might it be better to simply query HDFS (which falls back to
local FS if needed) for a temporary folder, and just use that?  If so, this
would remove the need to adjust this setting, thus making it easier to use
SystemML.

Thoughts?


- Mike

--

Michael W. Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Re: Remove "Scratch Space" In Favor Of Temp Folder

Posted by Frederick R Reiss <fr...@us.ibm.com>.
Back when I was new to the system, the scratch_space folder that kept
mysteriously appearing and disappearing in random places was a source of
puzzlement. The way that I figured out what that folder is for was when I
deleted it and my SystemML process crashed. I think it would be good to put
those temp files someplace more private, or to make the default name
something that makes it clearer the directory belongs to SystemML.

Fred

Matthias Boehm/Almaden/IBM@IBMUS wrote on 04/02/2016 08:32:08 PM:

> From: Matthias Boehm/Almaden/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Date: 04/02/2016 08:32 PM
> Subject: Re: Remove "Scratch Space" In Favor Of Temp Folder
>
> Just to clarify, the configuration 'scratch' (remote tmp working
> directory) is a user-defined setting read from SystemML-config.xml,
> with an internal default of ./scratch_space if not specified. It is
> always accessed as dfs (which, depending on your hadoop configuration,
> might use different file system implementations, e.g., hdfs, gpfs, fs).
>
> From my perspective, we should definitely keep the ability to
> specify a path for both local and remote tmp working directories
> because it really simplifies debugging. This is especially true if
> driver/client and executors/tasks run under different users (e.g.,
> with LinuxTaskController, LinuxContainerExecutor, or Spark's
> yarn-client). Btw, these scenarios are indeed good use cases for absolute
> paths because a relative path (if not handled correctly) actually
> refers to different locations for driver/executors.
>
> I would be fine with renaming this configuration to something like
> 'remotetmpdir' (consistent with our 'localtmpdir') and automatically
> obtaining temp working directories from hadoop if not specified.
>
> Regards,
> Matthias

Re: Remove "Scratch Space" In Favor Of Temp Folder

Posted by Matthias Boehm <mb...@us.ibm.com>.
Just to clarify, the configuration 'scratch' (remote tmp working directory)
is a user-defined setting read from SystemML-config.xml, with an internal
default of ./scratch_space if not specified. It is always accessed as dfs
(which, depending on your hadoop configuration, might use different file
system implementations, e.g., hdfs, gpfs, fs).

From my perspective, we should definitely keep the ability to specify a
path for both local and remote tmp working directories because it really
simplifies debugging. This is especially true if driver/client and
executors/tasks run under different users (e.g., with LinuxTaskController,
LinuxContainerExecutor, or Spark's yarn-client). Btw, these scenarios are
indeed good use cases for absolute paths because a relative path (if not
handled correctly) actually refers to different locations for
driver/executors.
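
The point about relative paths can be illustrated with a minimal sketch. It
uses only java.nio on a Unix-like system as a stand-in for the distributed
file system; the class name, method name, and both working directories are
hypothetical and not part of SystemML.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ScratchPathDemo {
    // Resolve a configured scratch path against a process's working directory.
    // An absolute path is returned unchanged; a relative one depends on workingDir.
    public static Path resolve(String workingDir, String configured) {
        return Paths.get(workingDir).resolve(configured).normalize();
    }

    public static void main(String[] args) {
        // Hypothetical working directories for a driver and an executor
        // that run under different users.
        String driverCwd = "/home/alice";
        String executorCwd = "/var/yarn/container_01";

        // Relative setting: each process ends up with its own location.
        System.out.println(resolve(driverCwd, "./scratch_space"));
        System.out.println(resolve(executorCwd, "./scratch_space"));

        // Absolute setting: both processes agree on one location.
        System.out.println(resolve(driverCwd, "/tmp/systemml/scratch"));
        System.out.println(resolve(executorCwd, "/tmp/systemml/scratch"));
    }
}
```

With the relative setting the two processes resolve to different directories,
while the absolute setting gives the same directory for both.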

I would be fine with renaming this configuration to something like
'remotetmpdir' (consistent with our 'localtmpdir') and automatically
obtaining temp working directories from hadoop if not specified.
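
A rough sketch of that fallback, under stated assumptions: the local file
system and Files.createTempDirectory stand in for whatever Hadoop would
provide, and the class name, method name, and "systemml-" prefix are
hypothetical. It is not the SystemML implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TmpDirResolver {
    // Use the configured directory when one is given; otherwise ask the
    // platform for a fresh temporary directory (assumed stand-in for Hadoop).
    public static Path resolveTmpDir(String configured) {
        try {
            if (configured != null && !configured.trim().isEmpty()) {
                // Make the path absolute so every caller sees the same location.
                Path dir = Paths.get(configured.trim()).toAbsolutePath().normalize();
                return Files.createDirectories(dir);
            }
            return Files.createTempDirectory("systemml-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass a directory as the first argument to simulate a user setting;
        // with no argument the fallback branch is taken.
        String configured = args.length > 0 ? args[0] : null;
        System.out.println(resolveTmpDir(configured));
    }
}
```

Keeping the explicit setting as the first choice preserves the debugging
benefit described above, and the fallback only applies when nothing is
specified.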

Regards,
Matthias


