Posted to commits@mahout.apache.org by bu...@apache.org on 2013/11/21 12:38:46 UTC

svn commit: r887513 - in /websites/staging/mahout/trunk/content: ./ users/emr/mahout-on-amazon-ec2.html

Author: buildbot
Date: Thu Nov 21 11:38:45 2013
New Revision: 887513

Log:
Staging update by buildbot for mahout

Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/emr/mahout-on-amazon-ec2.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Nov 21 11:38:45 2013
@@ -1 +1 @@
-1544132
+1544133

Modified: websites/staging/mahout/trunk/content/users/emr/mahout-on-amazon-ec2.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/emr/mahout-on-amazon-ec2.html (original)
+++ websites/staging/mahout/trunk/content/users/emr/mahout-on-amazon-ec2.html Thu Nov 21 11:38:45 2013
@@ -381,7 +381,8 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <p>Amazon EC2 is a compute-on-demand platform sold by Amazon.com that allows
+    <h1 id="mahout-on-ec2">Mahout on EC2</h1>
+<p>Amazon EC2 is a compute-on-demand platform sold by Amazon.com that allows
 users to purchase one or more host machines on an hourly basis and execute
 applications.  Since Hadoop can run on EC2, it is also possible to run
 Mahout on EC2.  The following sections will detail how to create a Hadoop
@@ -415,15 +416,14 @@ DNS name></p>
 </li>
 <li>
 <p>In the root home directory evaluate:</p>
-<h1 id="apt-get-update">apt-get update</h1>
-<h1 id="apt-get-upgrade-this-is-optional-but-probably-advisable-since-the">apt-get upgrade  // This is optional, but probably advisable since the</h1>
-<p>AMI is over a year old.</p>
-<h1 id="apt-get-install-python-setuptools">apt-get install python-setuptools</h1>
-<h1 id="easy_install-simplejson209">easy_install "simplejson==2.0.9"</h1>
-<h1 id="easy_install-boto18d">easy_install "boto==1.8d"</h1>
-<h1 id="apt-get-install-ant">apt-get install ant</h1>
-<h1 id="apt-get-install-subversion">apt-get install subversion</h1>
-<h1 id="apt-get-install-maven2">apt-get install maven2</h1>
+<p>apt-get update
+apt-get upgrade  // This is optional, but probably advisable since the AMI is over a year old.
+apt-get install python-setuptools
+easy_install "simplejson==2.0.9"
+easy_install "boto==1.8d"
+apt-get install ant
+apt-get install subversion
+apt-get install maven2</p>
 </li>
 <li>
 <p>Add the following to your .profile</p>
@@ -441,18 +441,18 @@ available on the Hadoop site. You can do
 <p>scp -i <gsg-keypair.pem>  <where>/hadoop-0.20.2.tar.gz root@<instance
 public DNS name>:.</p>
 </blockquote>
-<h1 id="tar-xzf-hadoop-0202targz">tar -xzf hadoop-0.20.2.tar.gz</h1>
-<h1 id="mv-hadoop-0202-usrlocal">mv hadoop-0.20.2 /usr/local/.</h1>
+<p>tar -xzf hadoop-0.20.2.tar.gz
+mv hadoop-0.20.2 /usr/local/.</p>
 </li>
 <li>
 <p>Configure Hadoop for temporary single node operation</p>
 </li>
 <li>
-<h1 id="add-the-following-to-hadoop_homeconfhadoop-envsh">add the following to $HADOOP_HOME/conf/hadoop-env.sh</h1>
-<h1 id="the-java-implementation-to-use-required">The java implementation to use.  Required.</h1>
-<p>export JAVA_HOME=/usr/lib/jvm/java-6-sun</p>
-<h1 id="the-maximum-amount-of-heap-to-use-in-mb-default-is-1000">The maximum amount of heap to use, in MB. Default is 1000.</h1>
-<p>export HADOOP_HEAPSIZE=2000</p>
+<p>add the following to $HADOOP_HOME/conf/hadoop-env.sh</p>
+<p>// The java implementation to use.  Required.
+export JAVA_HOME=/usr/lib/jvm/java-6-sun</p>
+<p>// The maximum amount of heap to use, in MB. Default is 1000.
+export HADOOP_HEAPSIZE=2000</p>
 </li>
 <li>
 <h1 id="add-the-following-to-hadoop_homeconfcore-sitexml-and-also">add the following to $HADOOP_HOME/conf/core-site.xml and also</h1>
@@ -475,33 +475,32 @@ public DNS name>:.</p>
 </configuration></p>
 </li>
 <li>
-<h1 id="set-up-authorized-keys-for-localhost-login-wo-passwords-and-format-your">set up authorized keys for localhost login w/o passwords and format your</h1>
-<p>name node</p>
-<h1 id="ssh-keygen-t-dsa-p-f-sshid_dsa">ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa</h1>
-<h1 id="cat-sshid_dsapub-sshauthorized_keys">cat ~/.ssh/id_dsa.pub &gt;&gt; ~/.ssh/authorized_keys</h1>
-<h1 id="hadoop_homebinhadoop-namenode-format">$HADOOP_HOME/bin/hadoop namenode -format</h1>
+<p>set up authorized keys for localhost login w/o passwords and format your
+name node</p>
+<p>ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
+cat ~/.ssh/id_dsa.pub &gt;&gt; ~/.ssh/authorized_keys
+$HADOOP_HOME/bin/hadoop namenode -format</p>
 </li>
 <li>
 <p>Checkout and build Mahout from trunk. Alternatively, you can upload a
 Mahout release tarball and install it as we did with the Hadoop tarball
 (Don't forget to update your .profile accordingly).</p>
-<h1 id="svn-co-httpsvnapacheorgreposasfmahouttrunk-mahout">svn co http://svn.apache.org/repos/asf/mahout/trunk mahout</h1>
-<h1 id="cd-mahout">cd mahout</h1>
-<h1 id="mvn-clean-install">mvn clean install</h1>
-<h1 id="cd">cd ..</h1>
-<h1 id="mv-mahout-usrlocalmahout-04">mv mahout /usr/local/mahout-0.4</h1>
+<p>svn co http://svn.apache.org/repos/asf/mahout/trunk mahout 
+cd mahout
+mvn clean install
+cd ..
+mv mahout /usr/local/mahout-0.4</p>
 </li>
 <li>
 <p>Run Hadoop, just to prove you can, and test Mahout by building the
 Reuters dataset on it. Finally, delete the files and shut it down.</p>
-<h1 id="hadoop_homebinhadoop-namenode-format_1">$HADOOP_HOME/bin/hadoop namenode -format</h1>
-<h1 id="hadoop_homebinstart-allsh">$HADOOP_HOME/bin/start-all.sh</h1>
-<h1 id="jps-you-should-see-all-5-hadoop-processes-namenode">jps     // you should see all 5 Hadoop processes (NameNode,</h1>
-<p>SecondaryNameNode, DataNode, JobTracker, TaskTracker)</p>
-<h1 id="cd-mahout_home">cd $MAHOUT_HOME</h1>
-<h1 id="examplesbinbuild-reuterssh">./examples/bin/build-reuters.sh</h1>
-<h1 id="hadoop_homebinstop-allsh">$HADOOP_HOME/bin/stop-all.sh</h1>
-<h1 id="rm-rf-tmp-delete-the-hadoop-files">rm -rf /tmp/*           // delete the Hadoop files</h1>
+<p>$HADOOP_HOME/bin/hadoop namenode -format
+$HADOOP_HOME/bin/start-all.sh
+jps   // you should see all 5 Hadoop processes (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker)
+cd $MAHOUT_HOME
+./examples/bin/build-reuters.sh</p>
+<p>$HADOOP_HOME/bin/stop-all.sh
+rm -rf /tmp/*         // delete the Hadoop files</p>
 </li>
 <li>
 <p>Remove the single-host stuff you added to $HADOOP_HOME/conf/core-site.xml
@@ -526,18 +525,18 @@ instance (you don't want to leave these 
 <p>scp -i <gsg-keypair.pem> <your AWS cert directory>/*.pem root@<instance
 public DNS name>:/mnt/.</p>
 </blockquote>
-<h1 id="note-that-ec2-bundle-vol-may-fail-if-ec2_home-is-set-so-you-may-want-to">Note that ec2-bundle-vol may fail if EC2_HOME is set.  So you may want to</h1>
-<p>temporarily unset EC2_HOME before running the bundle command.  However the
-shell will need to have the correct value of EC2_HOME set before running
-the ec2-register step.</p>
-<h1 id="ec2-bundle-vol-k-mntpkpem-c-mntcertpem-u-wzxhzdk10-d">ec2-bundle-vol -k /mnt/pk<em>.pem -c /mnt/cert</em>.pem -u <your-AWS-user_id> -d</h1>
-<p>/mnt -p mahout</p>
-<h1 id="ec2-upload-bundle-b-wzxhzdk11-m-mntmahoutmanifestxml-a">ec2-upload-bundle -b <your-s3-bucket> -m /mnt/mahout.manifest.xml -a</h1>
-<p><your-AWS-access_key> -s <your-AWS-secret_key> </p>
-<h1 id="ec2-register-k-mntpk-pem-c-mntcert-pem">ec2-register -K /mnt/pk-<em>.pem -C /mnt/cert-</em>.pem</h1>
-<p><your-s3-bucket>/mahout.manifest.xml</p>
 </li>
 </ol>
+<p>Note that ec2-bundle-vol may fail if EC2_HOME is set.  So you may want to
+temporarily unset EC2_HOME before running the bundle command.  However the
+shell will need to have the correct value of EC2_HOME set before running
+the ec2-register step.</p>
+<div class="codehilite"><pre><span class="n">ec2</span><span class="o">-</span><span class="n">bundle</span><span class="o">-</span><span class="n">vol</span> <span class="o">-</span><span class="n">k</span> <span class="o">/</span><span class="n">mnt</span><span class="o">/</span><span class="n">pk</span><span class="o">*</span><span class="p">.</span><span class="n">pem</span> <span class="o">-</span><span class="n">c</span> <span class="o">/</span><span class="n">mnt</span><span class="o">/</span><span class="n">cert</span><span class="o">*</span><span class="p">.</span><span class="n">pem</span> <span class="o">-</span><span class="n">u</span> <span class="o">&lt;</span><span class="n">your</span><span class="o">-</span><span class="n">AWS</span><span class="o">-</span><span class="n">user_id</span><span class="o">&gt;</span> <span class="o">-</span><span class="n">d</span> <span class="o">/</span><span class="n">mnt</span> <span class="o">-</span><span class="n">p</span> <span class="n">mahout</span>
+<span class="n">ec2</span><span class="o">-</span><span class="n">upload</span><span class="o">-</span><span class="n">bundle</span> <span class="o">-</span><span class="n">b</span> <span class="o">&lt;</span><span class="n">your</span><span class="o">-</span><span class="n">s3</span><span class="o">-</span><span class="n">bucket</span><span class="o">&gt;</span> <span class="o">-</span><span class="n">m</span> <span class="o">/</span><span class="n">mnt</span><span class="o">/</span><span class="n">mahout</span><span class="p">.</span><span class="n">manifest</span><span class="p">.</span><span class="n">xml</span> <span class="o">-</span><span class="n">a</span> <span class="o">&lt;</span><span class="n">your</span><span class="o">-</span><span class="n">AWS</span><span class="o">-</span><span class="n">access_key</span><span class="o">&gt;</span> <span class="o">-</span><span class="n">s</span> <span class="o">&lt;</span><span class="n">your</span><span class="o">-</span><span class="n">AWS</span><span class="o">-</span><span class="n">secret_key</span><span class="o">&gt;</span> 
+<span class="n">ec2</span><span class="o">-</span><span class="n">register</span> <span class="o">-</span><span class="n">K</span> <span class="o">/</span><span class="n">mnt</span><span class="o">/</span><span class="n">pk</span><span class="o">-*</span><span class="p">.</span><span class="n">pem</span> <span class="o">-</span><span class="n">C</span> <span class="o">/</span><span class="n">mnt</span><span class="o">/</span><span class="n">cert</span><span class="o">-*</span><span class="p">.</span><span class="n">pem</span> <span class="o">&lt;</span><span class="n">your</span><span class="o">-</span><span class="n">s3</span><span class="o">-</span><span class="n">bucket</span><span class="o">&gt;/</span><span class="n">mahout</span><span class="p">.</span><span class="n">manifest</span><span class="p">.</span><span class="n">xml</span>
+</pre></div>
+
+
 <p><a name="MahoutonAmazonEC2-GettingStarted"></a></p>
 <h1 id="getting-started">Getting Started</h1>
 <ol>
@@ -547,14 +546,13 @@ single instance of your image. Once this
 connect to it and test it by re-running the test code.  If you removed the
 single host configuration added in step 6(b) above, you will need to re-add
 it before you can run this test.  To test run (again):</p>
-<h1 id="hadoop_homebinhadoop-namenode-format_2">$HADOOP_HOME/bin/hadoop namenode -format</h1>
-<h1 id="hadoop_homebinstart-allsh_1">$HADOOP_HOME/bin/start-all.sh</h1>
-<h1 id="jps-you-should-see-all-5-hadoop-processes-namenode_1">jps     // you should see all 5 Hadoop processes (NameNode,</h1>
-<p>SecondaryNameNode, DataNode, JobTracker, TaskTracker)</p>
-<h1 id="cd-mahout_home_1">cd $MAHOUT_HOME</h1>
-<h1 id="examplesbinbuild-reuterssh_1">./examples/bin/build-reuters.sh</h1>
-<h1 id="hadoop_homebinstop-allsh_1">$HADOOP_HOME/bin/stop-all.sh</h1>
-<h1 id="rm-rf-tmp-delete-the-hadoop-files_1">rm -rf /tmp/*           // delete the Hadoop files</h1>
+<p>$HADOOP_HOME/bin/hadoop namenode -format
+$HADOOP_HOME/bin/start-all.sh
+jps   // you should see all 5 Hadoop processes (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker)
+cd $MAHOUT_HOME
+./examples/bin/build-reuters.sh</p>
+<p>$HADOOP_HOME/bin/stop-all.sh
+rm -rf /tmp/*         // delete the Hadoop files</p>
 </li>
 <li>
 <p>Now that you have a working Mahout-ready AMI, follow <a href="http://wiki.apache.org/hadoop/AmazonEC2">Hadoop's instructions</a>
@@ -569,31 +567,26 @@ S3_BUCKET
 (and perhaps others depending upon your environment)</p>
 </li>
 <li>
-<h1 id="edit-binlaunch-hadoop-master-and-binlaunch-hadoop-slaves-setting">edit bin/launch-hadoop-master and bin/launch-hadoop-slaves, setting:</h1>
+<p>edit bin/launch-hadoop-master and bin/launch-hadoop-slaves, setting:</p>
 <p>AMI_IMAGE</p>
 </li>
 <li>
-<h1 id="finally-launch-your-cluster-and-log-in">finally, launch your cluster and log in</h1>
-<blockquote>
+<p>finally, launch your cluster and log in</p>
 <p>bin/hadoop-ec2 launch-cluster test-cluster 2
-bin/hadoop-ec2 login test-cluster</p>
-</blockquote>
-<h1 id="_1">...</h1>
-<h1 id="exit">exit</h1>
-<blockquote>
-<p>bin/hadoop-ec2 terminate-cluster test-cluster     // when you are done
-with it</p>
-</blockquote>
+bin/hadoop-ec2 login test-cluster
+...<br />
+exit
+bin/hadoop-ec2 terminate-cluster test-cluster     // when you are done with it</p>
 </li>
 </ol>
 <p><a name="MahoutonAmazonEC2-RunningtheExamples"></a></p>
-<h1 id="running-the-examples">Running the Examples</h1>
+<h2 id="running-the-examples">Running the Examples</h2>
 <ol>
 <li>
 <p>Submit the Reuters test job</p>
-<h1 id="cd-mahout_home_2">cd $MAHOUT_HOME</h1>
-<h1 id="examplesbinbuild-reuterssh_2">./examples/bin/build-reuters.sh</h1>
-<p>// the warnings about configuration files do not seem to matter</p>
+<p>cd $MAHOUT_HOME
+./examples/bin/build-reuters.sh
+// the warnings about configuration files do not seem to matter</p>
 </li>
 <li>
 <p>See the Mahout <a href="quickstart.html">Quickstart</a>