Posted to common-issues@hadoop.apache.org by "Chad Metcalf (JIRA)" <ji...@apache.org> on 2010/03/02 02:44:06 UTC

[jira] Created: (HADOOP-6606) Change the default HADOOP_PID_DIR to $HADOOP_HOME/pids

Change the default HADOOP_PID_DIR to $HADOOP_HOME/pids
------------------------------------------------------

                 Key: HADOOP-6606
                 URL: https://issues.apache.org/jira/browse/HADOOP-6606
             Project: Hadoop Common
          Issue Type: Improvement
    Affects Versions: 0.20.2
            Reporter: Chad Metcalf
            Assignee: Chad Metcalf


/tmp should not be used as a pid directory. There is too high a likelihood that pid files could be altered or deleted. A more reasonable default is $HADOOP_HOME/pids. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6606) Change the default HADOOP_PID_DIR to $HADOOP_HOME/pids

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839981#action_12839981 ] 

Eli Collins commented on HADOOP-6606:
-------------------------------------

Having a configuration option for the pid dir is genuinely useful; as seen on the lists, the default frequently bites new users.
Other programs provide similar options, e.g. see http://httpd.apache.org/docs/1.3/mod/core.html#pidfile.
What's the default pid directory on Solaris? Googling doesn't seem to turn one up. If there's no standard across operating systems, then defaulting to a Hadoop base dir seems reasonable; the same rationale was used for the logs dir, no?

> Change the default HADOOP_PID_DIR to $HADOOP_HOME/pids
> ------------------------------------------------------
>
>                 Key: HADOOP-6606
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6606
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 0.20.2
>            Reporter: Chad Metcalf
>            Assignee: Chad Metcalf
>         Attachments: HADOOP-6606.patch
>
>
> /tmp should not be used as a pid directory. There is too high a likelihood that pid files could be altered or deleted. A more reasonable default is $HADOOP_HOME/pids. 



[jira] Commented: (HADOOP-6606) Change the default HADOOP_PID_DIR to $HADOOP_HOME/pids

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840245#action_12840245 ] 

Allen Wittenauer commented on HADOOP-6606:
------------------------------------------

I agree that having this as a config option is useful.  But the more I think about this and the other JIRAs recently filed that muck with hadoop-config.sh, the more convinced I am that these changes are more harmful than helpful.  Instead of a one-time config option, these changes basically do reconfig at *every* launch.  If we feel that users need a crutch to properly configure their systems, it would be MUCH better to have a shell script that could be run at install time that builds hadoop-env.sh for them.  Doing these config options at runtime is going to be a performance killer, especially for interactive commands that get used often, like hadoop dfs.
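
The install-time approach described above could look roughly like the following sketch. The function name `write_hadoop_env`, its arguments, and the example paths are assumptions for illustration only; no such script is attached to this issue.

```shell
#!/bin/sh
# Hypothetical install-time generator (not part of any patch on this issue):
# resolve the pid directory once and record it in hadoop-env.sh, so that
# nothing has to be recomputed at every launch.
write_hadoop_env() {
  hadoop_home=$1                   # install root, e.g. /opt/hadoop (assumed)
  out=$2                           # path of the hadoop-env.sh to write
  pid_dir=${3:-$hadoop_home/pids}  # optional override of the pid directory

  # Create the pid directory up front; fail early if that is not possible.
  mkdir -p "$pid_dir" || return 1

  cat > "$out" <<EOF
# Generated at install time; edit to override.
export HADOOP_PID_DIR="$pid_dir"
EOF
}

# Example (paths are placeholders):
#   write_hadoop_env /opt/hadoop /opt/hadoop/conf/hadoop-env.sh
```

The trade-off is the one raised in this comment: the value is fixed once at install time instead of being derived by hadoop-config.sh on each invocation.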



[jira] Commented: (HADOOP-6606) Change the default HADOOP_PID_DIR to $HADOOP_HOME/pids

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840246#action_12840246 ] 

Allen Wittenauer commented on HADOOP-6606:
------------------------------------------

BTW, it is worth pointing out that all of these options are *already* in hadoop-env.sh.  This change and the others in the set just basically override the defaults in case they are not set.



[jira] Commented: (HADOOP-6606) Change the default HADOOP_PID_DIR to $HADOOP_HOME/pids

Posted by "Chad Metcalf (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841454#action_12841454 ] 

Chad Metcalf commented on HADOOP-6606:
--------------------------------------

bq. Doing these config options at runtime is going to be a performance killer.

The cost of checking for the existence of an env variable is negligible compared to the overhead of firing up a VM. It's a pretty far stretch to say it's going to impact performance, let alone kill it.



[jira] Commented: (HADOOP-6606) Change the default HADOOP_PID_DIR to $HADOOP_HOME/pids

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841476#action_12841476 ] 

Allen Wittenauer commented on HADOOP-6606:
------------------------------------------

That for loop is a killer for java detection...



[jira] Commented: (HADOOP-6606) Change the default HADOOP_PID_DIR to $HADOOP_HOME/pids

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841461#action_12841461 ] 

Allen Wittenauer commented on HADOOP-6606:
------------------------------------------

I was speaking collectively of all of your proposed changes to hadoop-config.sh.



[jira] Updated: (HADOOP-6606) Change the default HADOOP_PID_DIR to $HADOOP_HOME/pids

Posted by "Chad Metcalf (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chad Metcalf updated HADOOP-6606:
---------------------------------

    Attachment: HADOOP-6606.patch

This patch applies to 0.20.2 only.



[jira] Commented: (HADOOP-6606) Change the default HADOOP_PID_DIR to $HADOOP_HOME/pids

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839976#action_12839976 ] 

Allen Wittenauer commented on HADOOP-6606:
------------------------------------------

Your Linux FHS docs bounce off of me, given I run Solaris. :)

This looks a lot like an OS-specific fix for something that already has a tunable in place that could be changed at install time.  In fact, for Cloudera, it would seem smarter to make that happen at RPM install time rather than burden the Hadoop base with a dir that will ultimately need special permissions.




[jira] Commented: (HADOOP-6606) Change the default HADOOP_PID_DIR to $HADOOP_HOME/pids

Posted by "Chad Metcalf (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841471#action_12841471 ] 

Chad Metcalf commented on HADOOP-6606:
--------------------------------------

So was I.





[jira] Commented: (HADOOP-6606) Change the default HADOOP_PID_DIR to $HADOOP_HOME/pids

Posted by "Chad Metcalf (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839972#action_12839972 ] 

Chad Metcalf commented on HADOOP-6606:
--------------------------------------

bq. If this was wikipedia, I'd add a "citation needed" flag.

There are a number of reasons why you don't put pids in /tmp. Example: tmpwatch reaping the pid files of long-running processes. RHEL's /etc/cron.daily/tmpwatch defaults to removing anything 10 days old.
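
To make the tmpwatch concern concrete, here is a rough sketch that lists pid files old enough for a 10-day cleaner to consider. It uses `find -mtime` as a simplified stand-in and is not tmpwatch itself, whose selection criteria may differ; the function name `stale_pid_files` and the `*.pid` naming pattern are assumptions.

```shell
#!/bin/sh
# List pid files under a directory whose modification time is more than
# N days in the past (N defaults to 10). This only approximates what a
# periodic /tmp cleaner might target.
stale_pid_files() {
  dir=$1
  days=${2:-10}
  find "$dir" -type f -name '*.pid' -mtime +"$days"
}

# Example: stale_pid_files /tmp 10
```

A daemon that has been running longer than the cutoff will usually not have touched its pid file since startup, which is why such files can look stale to a cleaner even though the process is still alive.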

bq. What happens if hadoop-env.sh sets the pid dir, do we honor that?

We do. The patch adds this line: HADOOP_PID_DIR="${HADOOP_PID_DIR:-$HADOOP_HOME/pids}"
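
The one-line default above relies on the shell's `${VAR:-default}` expansion, which keeps an existing non-empty value and only falls back otherwise. A minimal sketch of that behavior follows; the helper `resolve_pid_dir` and the example paths are assumptions for illustration and are not part of the patch.

```shell
#!/bin/sh
# Hypothetical helper for illustration; the patch sets the variable inline.
resolve_pid_dir() {
  # Use HADOOP_PID_DIR if it is set and non-empty,
  # otherwise fall back to $HADOOP_HOME/pids.
  printf '%s\n' "${HADOOP_PID_DIR:-$HADOOP_HOME/pids}"
}

HADOOP_HOME=/opt/hadoop          # example path, an assumption

unset HADOOP_PID_DIR
resolve_pid_dir                  # prints /opt/hadoop/pids

HADOOP_PID_DIR=/var/run/hadoop   # e.g. as set in hadoop-env.sh
resolve_pid_dir                  # prints /var/run/hadoop
```

Because `:-` also treats an empty value as unset, a blank `HADOOP_PID_DIR=` in hadoop-env.sh would still get the default.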

bq. Why not use the logs dir rather than introduce another dir?

You are free to do so by setting HADOOP_PID_DIR. Generally speaking, pids are not kept with logs. Example: most distros' filesystem policies want logs in /var/log and pids in /var/run. From http://tldp.org/LDP/Linux-Filesystem-Hierarchy

{noformat}
/var/run
Contains the process identification files (PIDs) of system services and other information about the system that is valid until the system is next booted. For example, /var/run/utmp contains information about users currently logged in.
{noformat}

{noformat}
/var/log
Log files from the system and various programs/services, especially login (/var/log/wtmp, which logs all logins and logouts into the system) and syslog (/var/log/messages, where all kernel and system program message are usually stored). Files in /var/log can often grow indefinitely, and may require cleaning at regular intervals. Something that is now normally managed via log rotation utilities such as 'logrotate'. This utility also allows for the automatic rotation compression, removal and mailing of log files. Logrotate can be set to handle a log file daily, weekly, monthly or when the log file gets to a certain size. Normally, logrotate runs as a daily cron job. This is a good place to start troubleshooting general technical problems.
{noformat}





[jira] Commented: (HADOOP-6606) Change the default HADOOP_PID_DIR to $HADOOP_HOME/pids

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839961#action_12839961 ] 

Allen Wittenauer commented on HADOOP-6606:
------------------------------------------

> There is too high a likelihood that pid files could be altered or deleted.

If this was wikipedia, I'd add a "citation needed" flag. 

> A more reasonable default is $HADOOP_HOME/pids.

Same thing.  

Why not use the logs dir rather than introduce another dir?  What happens when the mapred framework and the hdfs framework are run as different users?  Does LinuxTaskController being on have any impact?  What happens if hadoop-env.sh sets the pid dir, do we honor that?  Are we building a "software fix" to work around what is ultimately an installation issue?


