Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2006/08/24 04:24:06 UTC

[Lucene-hadoop Wiki] Update of "GettingStartedWithHadoop" by mahadevkonar

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by mahadevkonar:
http://wiki.apache.org/lucene-hadoop/GettingStartedWithHadoop

New page:
= Setting Up a Cluster using Hadoop scripts =
This section explains how to set up a Hadoop cluster running Hadoop DFS and Hadoop MapReduce. The startup scripts are in hadoop/bin. The slaves file in hadoop/conf lists all the slave nodes that will join the DFS and MapReduce clusters; edit it to add nodes to your cluster. You only need to edit the slaves file on the machines on which you plan to run the Jobtracker and the Namenode. Next, edit the file hadoop-env.sh in the hadoop/conf directory and make sure JAVA_HOME is set correctly. You can change the other environment variables to suit your requirements. HADOOP_HOME is determined automatically from the location you run the hadoop scripts from. A minimal sketch of both files follows.
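For illustration only, here is what the two files might look like; the hostnames and the Java path are placeholders, not values from this page:

{{{
# hadoop/conf/slaves -- one slave hostname per line (example hostnames)
slave1.example.com
slave2.example.com

# In hadoop/conf/hadoop-env.sh, point JAVA_HOME at your JDK (example path)
export JAVA_HOME=/usr/lib/jvm/java
}}}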

== Starting up DFS ==
=== Formatting the Namenode ===
 * You must format the Namenode the first time you install Hadoop, and only then. Do not format a Namenode that is already running Hadoop, as this will erase your DFS. Run bin/hadoop namenode -format on the node you plan to run as the Namenode, as shown below.
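For example, on the machine that will be the Namenode (assuming your current directory is the Hadoop installation directory):

{{{
bin/hadoop namenode -format
}}}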

=== Environment Variables ===
 * The only environment variable that you may need to specify is HADOOP_CONF_DIR. Set it to your configuration directory, which contains hadoop-site.xml and hadoop-env.sh.
 * Instead of setting this environment variable, you can pass the configuration directory to the scripts with the --config option (see the sketch after this list).
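A sketch of both forms; the directory path is a placeholder for your own configuration directory:

{{{
# Either export the variable once per shell session ...
export HADOOP_CONF_DIR=/path/to/hadoop/conf

# ... or pass the configuration directory to each script explicitly
bin/start-dfs.sh --config /path/to/hadoop/conf
}}}
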
=== Starting up the cluster ===
 * After formatting the Namenode, run bin/start-dfs.sh on the Namenode. This brings up the Namenode, and the Datanodes on the machines listed in the slaves file mentioned above.
 * Run bin/start-mapred.sh on the machine you plan to run the Jobtracker on. This brings up the MapReduce cluster, with the Jobtracker running on the machine you ran the command on and Tasktrackers running on the machines listed in the slaves file.
 * If you have not set the HADOOP_CONF_DIR variable, you can instead run bin/start-mapred.sh --config configure_directory.
 * Try executing bin/hadoop dfs -lsr / to see whether the DFS is working; the full startup sequence is sketched after this list.
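Putting the steps together, a typical startup looks like this (run each command on the machine indicated; add --config with your configuration directory if HADOOP_CONF_DIR is not set):

{{{
# On the Namenode
bin/start-dfs.sh

# On the Jobtracker machine
bin/start-mapred.sh

# From any node, check that the DFS responds
bin/hadoop dfs -lsr /
}}}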

=== Stopping the cluster ===
 * You can stop the cluster by running bin/stop-mapred.sh and then bin/stop-dfs.sh. As with the start scripts, you can specify the configuration directory with the --config option; see the sketch below.
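For example, assuming HADOOP_CONF_DIR is set (otherwise add --config /path/to/hadoop/conf, where the path is a placeholder):

{{{
# On the Jobtracker machine
bin/stop-mapred.sh

# On the Namenode
bin/stop-dfs.sh
}}}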