Posted to common-issues@hadoop.apache.org by "Robert Justice (JIRA)" <ji...@apache.org> on 2012/10/24 14:24:13 UTC

[jira] [Created] (HADOOP-8970) Need a different environment variable or configuration that states where local temporary files are stored than hadoop.tmp.dir

Robert Justice created HADOOP-8970:
--------------------------------------

             Summary: Need a different environment variable or configuration that states where local temporary files are stored than hadoop.tmp.dir
                 Key: HADOOP-8970
                 URL: https://issues.apache.org/jira/browse/HADOOP-8970
             Project: Hadoop Common
          Issue Type: Improvement
          Components: conf
            Reporter: Robert Justice


I'm finding that hadoop.tmp.dir is used as the base directory when configuring working directories for many other Hadoop subcomponents (mapred, hdfs, hue, etc.), and that it also determines where the Hadoop client stores some local temporary files, as well as temporary files on HDFS.
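
To illustrate the coupling described above, here is a minimal, self-contained sketch of how `${hadoop.tmp.dir}` references in other properties resolve. It does not use Hadoop's own Configuration class; the `expand` and `defaults` helpers are hypothetical, and the property defaults listed are assumptions based on the 1.x-era `*-default.xml` files, so check them against your own version.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TmpDirExpansion {
    // Matches ${property.name} references inside a value.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Simplified stand-in for configuration variable expansion (hypothetical helper).
    // The loop is bounded so a cyclic reference cannot spin forever.
    public static String expand(String value, Map<String, String> props) {
        String current = value;
        for (int depth = 0; depth < 20; depth++) {
            Matcher m = VAR.matcher(current);
            if (!m.find()) {
                return current;
            }
            String replacement = props.get(m.group(1));
            if (replacement == null) {
                return current; // unknown variable: leave the reference as written
            }
            current = current.substring(0, m.start()) + replacement + current.substring(m.end());
        }
        return current;
    }

    // Assumed 1.x-style defaults that hang off hadoop.tmp.dir (verify for your release).
    public static Map<String, String> defaults(String hadoopTmpDir) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("hadoop.tmp.dir", hadoopTmpDir);
        p.put("dfs.name.dir", "${hadoop.tmp.dir}/dfs/name");
        p.put("dfs.data.dir", "${hadoop.tmp.dir}/dfs/data");
        p.put("mapred.local.dir", "${hadoop.tmp.dir}/mapred/local");
        p.put("mapred.system.dir", "${hadoop.tmp.dir}/mapred/system"); // resolved on the default filesystem
        return p;
    }

    public static void main(String[] args) {
        // Changing this one value moves every directory derived from it.
        Map<String, String> props = defaults("/data/hadoop-tmp");
        for (Map.Entry<String, String> e : props.entrySet()) {
            System.out.println(e.getKey() + " = " + expand(e.getValue(), props));
        }
    }
}
```

Under these assumptions, a single change to hadoop.tmp.dir relocates daemon storage, local job files, and an HDFS-side path all at once, which is why there is no way to move only the client's local temporary files.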

Users may be dealing with tight space in /tmp.  To move where job setup files, hive and hue files, etc. are stored locally, they have to create a new directory on HDFS (e.g. /temp) plus local directories on another filesystem, and make sure permissions are set up properly both in HDFS and on the local filesystem on every node in the cluster.

I'm wondering if it would be better to have a hadoop.local.tmp.dir, configurable at the client level, that says where local files are kept, and to break that out from hadoop.tmp.dir.  I know this is a major change, but thought I would suggest it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8970) Need a different environment variable or configuration that states where local temporary files are stored than hadoop.tmp.dir

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484119#comment-13484119 ] 

Allen Wittenauer commented on HADOOP-8970:
------------------------------------------

A bit of history...

hadoop.tmp.dir defaults to /tmp to make it easier to run QA tests and to get something up quickly.  On "real" systems, this should be one of the first things changed.  

On the user side...

I think one of the fundamental problems is that end users see 'hadoop.tmp.dir' and think "Hey, I have some temporary files and I'm using Hadoop!  This must be the place!"  

I've been thinking more and more about changing hadoop.tmp.dir during task execution to have the same value as mapred.child.tmp, which is what users are supposed to use.  The other thing is that hadoop.tmp.dir should just be replaced with hadoop-daemon.tmp.dir so that the intent of this variable is perfectly clear.
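
As a rough sketch of what "use mapred.child.tmp" can look like from task code: as far as the 1.x documentation describes it, Java tasks are launched with -Djava.io.tmpdir set to the resolved mapred.child.tmp directory, so code that goes through the standard JVM temp-file mechanism picks it up without reading hadoop.tmp.dir at all. The `TaskScratch` class and its method names below are hypothetical, and the java.io.tmpdir behaviour is an assumption to verify against your Hadoop version.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TaskScratch {
    // The JVM's temp directory; inside a task this is assumed to point at
    // the resolved mapred.child.tmp rather than at hadoop.tmp.dir.
    public static Path scratchDir() {
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    // Create a uniquely named scratch file under the JVM temp directory.
    public static Path newScratchFile(String prefix) throws IOException {
        Path dir = scratchDir();
        Files.createDirectories(dir); // no-op if the directory already exists
        return Files.createTempFile(dir, prefix, ".tmp");
    }

    public static void main(String[] args) throws IOException {
        Path f = newScratchFile("myjob-");
        System.out.println("scratch file: " + f);
        Files.deleteIfExists(f); // clean up after ourselves
    }
}
```

Written this way, user code never needs to know the value of hadoop.tmp.dir, which leaves that property free to mean "daemon storage" only.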

                
