Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2014/11/19 23:57:40 UTC

[Hadoop Wiki] Update of "UnixShellScriptProgrammingGuide" by SomeOtherAccount

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "UnixShellScriptProgrammingGuide" page has been changed by SomeOtherAccount:
https://wiki.apache.org/hadoop/UnixShellScriptProgrammingGuide?action=diff&rev1=14&rev2=15

Comment:
Update docs post-HADOOP-11208 which renamed the daemon variable

  
    b. Add $HADOOP_CLIENT_OPTS to $HADOOP_OPTS (or, for YARN apps, $YARN_CLIENT_OPTS to $YARN_OPTS) if this is an interactive application or for some other reason should have the user client settings applied.
  
-   c. For methods that can also be daemons, set `daemon=true`.  This will allow for the `--daemon` option to work.
+   c. For methods that can also be daemons, set `supportdaemonization=true`.  This will allow for the `--daemon` option to work. See more below.
  
    d. If it supports security, set `secure_service=true` and `secure_user` equal to the user that should run the daemon.
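The steps above might be sketched as a sub-command case entry like the following. This is a hypothetical illustration, not the real hadoop script: the class names are real Hadoop classes, but the surrounding structure and the `fs` branch are assumptions for contrast.

```shell
#!/usr/bin/env sh
# Hypothetical sketch of a sub-command case entry following steps c-d.
subcmd="datanode"

case ${subcmd} in
  datanode)
    supportdaemonization="true"   # step c: makes --daemon meaningful
    secure_service="true"         # step d: may run as a secure daemon...
    secure_user="hdfs"            # ...as this user
    HADOOP_CLASSNAME="org.apache.hadoop.hdfs.server.datanode.DataNode"
  ;;
  fs)
    # plain client command: no daemon handling, so --daemon is ignored
    HADOOP_CLASSNAME="org.apache.hadoop.fs.FsShell"
  ;;
esac
```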
  
@@ -82, +82 @@

  
  * `HADOOP_SLAVE_NAMES` - This is the list of hosts the user passed via `--hostnames`.
  
+ 
+ = Daemonization =
+ 
+ In branch-2 and previous, "daemons" were handled by wrapping "standard" command lines. If we concentrate on the functionality (vs. the code rot...), this had some interesting (and inconsistent) results, especially around logging and pid files. Running the *-daemon.sh version produced a pid file and set hadoop.root.logger to INFO,(something). Running the same daemon in non-daemon mode (e.g., straight up: 'hdfs namenode') generated no pid file and kept hadoop.root.logger as INFO,console. With no pid file generated, it was possible to run, e.g., hdfs namenode both in *-daemon.sh mode and in straight-up mode at the same time. It also meant one had to pull apart the process list to safely determine a daemon's status, since pid files weren't always created. This made building custom init scripts fraught with danger, and the inconsistency has been a point of frustration for many operations teams.
+ 
+ Post-HADOOP-9902, there is a slight change in the above functionality, and it is one of the key reasons this is an incompatible change. Sub-commands that are intended to run as daemons (either fully, e.g., namenode, or partially, e.g., balancer) have all of this handling consolidated, which helps eliminate code rot and provides a consistent user experience across projects. `supportdaemonization=true`, a per-script local that is consistent across the Hadoop sub-projects, tells the latter parts of the shell code that this sub-command needs some extra handling enabled beyond the normal commands. In particular, sub-commands with `supportdaemonization=true` will always get pid and out files. Those files prevent two instances being run on the same machine by the same user simultaneously (see footnote 1, however). They also get some extra options on the java command line. Etc, etc.
+ 
+ So where does `--daemon` come in? Its value is stored in a global called `HADOOP_DAEMON_MODE`. If the user doesn't set it specifically, it defaults to 'default'. This was done to let the code mostly replicate the behavior of branch-2 and previous when the *-daemon.sh code was NOT used. In other words, `--daemon default` (or no value provided) lets commands like `hdfs namenode` still run in the foreground, just now with pid and out files. `--daemon start` does the disown (previously a nohup), changes the logging output from `HADOOP_ROOT_LOGGER` to `HADOOP_DAEMON_ROOT_LOGGER`, adds some extra command line options, etc., etc., similar to the old *-daemon.sh commands.
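The flow described above might look roughly like this. This is a simplified sketch, not the actual Hadoop shell code: the variable names match the text, but the branching logic and logger values are illustrative assumptions.

```shell
#!/usr/bin/env sh
# Simplified sketch of how HADOOP_DAEMON_MODE could drive daemon handling.
supportdaemonization="true"
HADOOP_DAEMON_ROOT_LOGGER="INFO,RFA"
HADOOP_ROOT_LOGGER="INFO,console"

# --daemon's value, defaulting to 'default' when the user gave none
HADOOP_DAEMON_MODE="${HADOOP_DAEMON_MODE:-default}"

if [ "${supportdaemonization}" = "true" ]; then
  case ${HADOOP_DAEMON_MODE} in
    start)
      # daemonize: switch loggers; disown and extra java options go here
      HADOOP_ROOT_LOGGER="${HADOOP_DAEMON_ROOT_LOGGER}"
    ;;
    default)
      # run in the foreground like branch-2, but pid/out files still get written
    ;;
  esac
fi
```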
+ 
+ What would happen if daemon mode were enabled for all commands? The big thing is the pid and out file creation and the checks around it. A user would only ever be able to execute one `hadoop fs` command at a time because of the pid file! Less than ideal.
+ 
+ To summarize, `supportdaemonization=true` tells the code that `--daemon` actually means something to the sub-command. Otherwise, `--daemon` is ignored.
+ 
+ 1-... unless `HADOOP_IDENT_STRING` is modified appropriately. This means that post-HADOOP-9902, it is now possible to run two secure datanodes on the same machine as the same user, since all of the logs, pids, and outs take that into consideration! QA folks should be very happy.
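The footnote can be illustrated with a small sketch. The `hadoop_pidfile` helper and the exact path layout here are assumptions for illustration (the real path construction lives in the Hadoop shell functions); the point is only that the ident string is embedded in the file name.

```shell
#!/usr/bin/env sh
# Illustrative only: the pid file name embeds HADOOP_IDENT_STRING, so
# two instances with different idents get their own pid/out/log files.
hadoop_pidfile() {
  # $1 = ident string, $2 = sub-command name
  echo "${HADOOP_PID_DIR:-/tmp}/hadoop-$1-$2.pid"
}

pid_a=$(hadoop_pidfile dn1 datanode)
pid_b=$(hadoop_pidfile dn2 datanode)
# distinct idents => distinct pid files => both datanodes can run
```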
+