Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2006/09/18 02:31:41 UTC

[Lucene-hadoop Wiki] Update of "GettingStartedWithHadoop" by SameerParanjpye

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by SameerParanjpye:
http://wiki.apache.org/lucene-hadoop/GettingStartedWithHadoop

------------------------------------------------------------------------------
  = Downloading and installing Hadoop =
- Hadoop can be downloaded from [http://www.apache.org/dyn/closer.cgi/lucene/hadoop/ here]. You may also download a nightly build from [http://cvs.apache.org/dist/lucene/hadoop/nightly/ here] or check out the code from [http://lucene.apache.org/hadoop/version_control.html subversion] and build it with [http://ant.apache.org Ant]. Select a directory to install Hadoop under (let's call it <installdir>) and untar the tarball in that directory. This will create a directory called hadoop-<version> under <installdir>. All scripts and tools needed to run Hadoop are present in the directory hadoop-<version>/bin. This directory will subsequently be referred to as "hadoop/bin" in this document.
+ 
+ Hadoop can be downloaded from [http://www.apache.org/dyn/closer.cgi/lucene/hadoop/ here]. You may also 
+ download a nightly build from [http://cvs.apache.org/dist/lucene/hadoop/nightly/ here] or check out the 
+ code from [http://lucene.apache.org/hadoop/version_control.html subversion] and build it with 
+ [http://ant.apache.org Ant]. Select a directory to install Hadoop under (let's call it ''hadoop-install'')
+ and untar the tarball in that directory. If you downloaded version ''<ver>'' of Hadoop, untarring will
+ create a directory called ''hadoop-<ver>'' in the ''hadoop-install'' directory. All scripts and tools 
+ used to run Hadoop will be present in the directory ''hadoop-<ver>/bin''. All configuration files for
+ Hadoop will be present in the directory ''hadoop-<ver>/conf''. These directories will subsequently be
+ referred to as ''hadoop/bin'' and ''hadoop/conf'' respectively in this document.
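+ 
+ For example, the install steps might look like the sketch below (the paths are illustrative, and
+ ''<ver>'' stands for whichever release you actually downloaded):
+ 
+ {{{
+ # Sketch only: substitute your own install directory and version.
+ cd /path/to/hadoop-install
+ tar xzf hadoop-<ver>.tar.gz
+ ls hadoop-<ver>/bin hadoop-<ver>/conf    # scripts and configuration files, respectively
+ }}}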
+ 
+ == Startup scripts ==
+ 
+ The ''hadoop/bin'' directory contains some scripts used to launch Hadoop DFS and Hadoop Map/Reduce daemons. These
+ are:
+ 
+  * ''start-all.sh'' - Starts all Hadoop daemons, the namenode, datanodes, the jobtracker and tasktrackers.
+  * ''stop-all.sh'' - Stops all Hadoop daemons.
+  * ''start-mapred.sh'' - Starts the Hadoop Map/Reduce daemons, the jobtracker and tasktrackers.
+  * ''stop-mapred.sh'' - Stops the Hadoop Map/Reduce daemons.
+  * ''start-dfs.sh'' - Starts the Hadoop DFS daemons, the namenode and datanodes.
+  * ''stop-dfs.sh'' - Stops the Hadoop DFS daemons.
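+ 
+ For example, an entire cluster can be brought up and down from the master node like this (an
+ illustrative invocation, assuming you are in the ''hadoop-<ver>'' directory):
+ 
+ {{{
+ # Starts the namenode, datanodes, jobtracker and tasktrackers named in conf/slaves.
+ bin/start-all.sh
+ # ... later, stop all of the daemons again.
+ bin/stop-all.sh
+ }}}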
+ 
+ == Configuration files ==
+ 
+ The ''hadoop/conf'' directory contains some configuration files for Hadoop. These are:
+ 
+  * ''hadoop-env.sh'' - This file contains some environment variable settings used by Hadoop. You can use these to affect some aspects of Hadoop daemon behavior, such as where log files are stored, the maximum amount of heap used, etc. The only variable you should need to change in this file is JAVA_HOME, which specifies the path to the Java installation used by Hadoop.
+  * ''slaves'' - This file lists the hosts, one per line, where the Hadoop slave daemons (datanodes and tasktrackers) will run. By default this contains the single entry ''localhost''.
+  * ''hadoop-default.xml'' - This file contains generic default settings for Hadoop daemons and Map/Reduce jobs. '''Do not modify this file.'''
+  * ''mapred-default.xml'' - This file contains site-specific settings for the Hadoop Map/Reduce daemons and jobs. The file is empty by default. Putting configuration properties in this file will override Map/Reduce settings in the ''hadoop-default.xml'' file. Use this file to tailor the behavior of Map/Reduce on your site.
+  * ''hadoop-site.xml'' - This file contains site-specific settings for all Hadoop daemons and Map/Reduce jobs. This file is empty by default. Settings in this file override those in ''hadoop-default.xml'' and ''mapred-default.xml''. This file should contain settings that must be respected by all servers and clients in a Hadoop installation, for instance, the location of the namenode and the jobtracker.
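+ 
+ As an illustration, a minimal ''hadoop-site.xml'' might do nothing more than name the namenode and
+ the jobtracker, so that all servers and clients agree on where they run. The host names and ports
+ below are placeholders, not defaults; substitute your own machines:
+ 
+ {{{
+ # Sketch: write a minimal site configuration into hadoop/conf (hosts and ports are placeholders).
+ cat > conf/hadoop-site.xml <<'EOF'
+ <configuration>
+   <property>
+     <name>fs.default.name</name>
+     <value>namenodehost:9000</value>
+   </property>
+   <property>
+     <name>mapred.job.tracker</name>
+     <value>jobtrackerhost:9001</value>
+   </property>
+ </configuration>
+ EOF
+ }}}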
+ 
+ More details on configuration can be found on the HowToConfigure page.
  
  = Starting Hadoop using Hadoop scripts =
  This section explains how to set up a Hadoop cluster running Hadoop DFS and Hadoop Map/Reduce. The startup scripts are in hadoop/bin. The slaves file in hadoop/conf lists the slave nodes that will join the DFS and Map/Reduce cluster, one per line; edit it to add nodes to your cluster. You only need to edit the slaves file on the machines you plan to run the jobtracker and namenode on. If you want to run a single node cluster, you do not have to edit the slaves file at all. Next, edit the file hadoop-env.sh in the hadoop/conf directory and make sure JAVA_HOME is set correctly; you can change the other environment variables as your requirements dictate. HADOOP_HOME is determined automatically from the location of the hadoop scripts you run.
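  
  For example, on the machine that will run the namenode and jobtracker, the steps above might look
  like the following sketch (the slave host names and the Java path are illustrative):
  
  {{{
  cd hadoop-<ver>
  # List the slave hosts, one per line (unnecessary for a single node cluster).
  echo slavehost1 >> conf/slaves
  echo slavehost2 >> conf/slaves
  # In conf/hadoop-env.sh, point JAVA_HOME at your Java installation, for example:
  #   export JAVA_HOME=/usr/local/java
  # Then start the DFS and Map/Reduce daemons.
  bin/start-all.sh
  }}}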