Posted to common-user@hadoop.apache.org by Ossi <lo...@gmail.com> on 2011/09/26 13:30:10 UTC

About export HADOOP_NAMENODE_OPTS in hadoop-env.sh

hi,

on page http://hadoop.apache.org/common/docs/r0.20.0/cluster_setup.html

there are the following instructions:
"For example, To configure Namenode to use parallelGC, the following
statement should be added in hadoop-env.sh:
 export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}""

Basically that's fine. But since hadoop-env.sh is sourced several times (by
other scripts) while starting the cluster with "start-all.sh", the
process ends up looking like this even without any additional configs
(note the multiple -Dcom.sun.management.jmxremote options):

hdfs     27039     1  0 12:56 pts/0    00:00:03 /usr/local/java/bin/java
-Dproc_namenode -Xmx1000m -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote -Dhadoop.log.dir=/logs/hadoop/logs... etc

And if one adds a bunch of configs for each service, the process list
gets quite lengthy...
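The duplication can be reproduced outside Hadoop. A minimal sketch (my own
illustration, not from the Hadoop scripts) of what happens when a file with a
self-appending export is sourced three times:

```shell
#!/bin/sh
# Simulate hadoop-env.sh being sourced three times by the start-up scripts.
# Each pass prepends the option to the previous value, so it accumulates.
HADOOP_NAMENODE_OPTS=""
for pass in 1 2 3; do
  export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote ${HADOOP_NAMENODE_OPTS}"
done
# After three sourcings the flag appears three times:
echo "$HADOOP_NAMENODE_OPTS"
```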

So, is there any particular reason not to leave ${HADOOP_NAMENODE_OPTS} out
of the example above, and out of the default hadoop-env.sh that ships with
vanilla and Cloudera's Hadoop?
The line in the example above would then look like this:
export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC"

and in default hadoop-env.sh file:
export HADOOP_NAMENODE_OPTS=""

The same issue applies to these as well:
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
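An alternative to dropping the self-reference entirely would be to make the
append idempotent. A possible guard (my own sketch, with a hypothetical
append_once helper, not anything from the Hadoop scripts) that adds the flag
only if it is not already present:

```shell
#!/bin/sh
# append_once VAR OPTION: prepend OPTION to $VAR only if it is not already
# there, so sourcing this file repeatedly does not duplicate the option.
append_once() {
  eval "current=\$$1"
  case " $current " in
    *" $2 "*) ;;                        # option already present, do nothing
    *) eval "export $1=\"$2 \$$1\"" ;;  # prepend the option once
  esac
}

append_once HADOOP_DATANODE_OPTS -Dcom.sun.management.jmxremote
append_once HADOOP_DATANODE_OPTS -Dcom.sun.management.jmxremote  # no-op
echo "$HADOOP_DATANODE_OPTS"
```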


Or am I missing some point here? :)

br, Ossi