Posted to commits@mahout.apache.org by bu...@apache.org on 2013/11/21 12:23:35 UTC

svn commit: r887503 - in /websites/staging/mahout/trunk/content: ./ users/clustering/clustering-of-synthetic-control-data.html

Author: buildbot
Date: Thu Nov 21 11:23:35 2013
New Revision: 887503

Log:
Staging update by buildbot for mahout

Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/clustering/clustering-of-synthetic-control-data.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Nov 21 11:23:35 2013
@@ -1 +1 @@
-1544122
+1544124

Modified: websites/staging/mahout/trunk/content/users/clustering/clustering-of-synthetic-control-data.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/clustering/clustering-of-synthetic-control-data.html (original)
+++ websites/staging/mahout/trunk/content/users/clustering/clustering-of-synthetic-control-data.html Thu Nov 21 11:23:35 2013
@@ -381,7 +381,8 @@
 
   <div id="content-wrap" class="clearfix">
    <div id="main">
-    <ul>
+    <h1 id="example-synthetic-control-data">Example: Synthetic control data</h1>
+<ul>
 <li><a href="#Clusteringofsyntheticcontroldata-Introduction">Introduction</a></li>
 <li><a href="#Clusteringofsyntheticcontroldata-Problemdescription">Problem description</a></li>
 <li><a href="#Clusteringofsyntheticcontroldata-Pre-Prep">Pre-Prep</a></li>
@@ -395,7 +396,7 @@ time series. <a href="http://en.wikipedi
  are tools used to determine whether or not a manufacturing or business
 process is in a state of statistical control. Such control charts are
 generated / simulated over equal time intervals and available for use in the UCI
-machine learning database. The data is described [here |http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data.html]
+machine learning database. The data is described <a href="http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data.html">here</a>
 .</p>
 <p><a name="Clusteringofsyntheticcontroldata-Problemdescription"></a></p>
 <h1 id="problem-description">Problem description</h1>
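[Editor's note on the hunk above] The page links the UCI description page for the synthetic control data. As a hedged convenience, the raw data file can be fetched directly; the exact data-file URL below is an assumption (the description-page URL from the diff with the `.html` suffix dropped), not something the commit itself states.

```shell
# Hypothetical fetch of the UCI synthetic control dataset.
# DATA_URL is an assumption derived from the description page linked above.
DATA_URL="http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data"

# Download only if curl is available; report failure instead of aborting.
if command -v curl >/dev/null 2>&1; then
  curl -sfO "$DATA_URL" || echo "download failed: $DATA_URL"
fi
echo "dataset file: $(basename "$DATA_URL")"
```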
@@ -425,28 +426,33 @@ Normal data. Rows from 101 - 200 contain
 <tr><td> 35.5351 </td><td> 41.7067 </td><td> 39.1705 </td><td> 48.3964 </td><td> .. </td><td> 38.6103 </td></tr>
 <tr><td> 24.2104 </td><td> 41.7679 </td><td> 45.2228 </td><td> 43.7762 </td><td> .. </td><td> 48.8175 </td></tr>
 ..
-..
-1. Setup Hadoop
-1. # Assuming that you have installed the latest compatible Hadooop, start
+..</p>
+<ol>
+<li>Setup Hadoop</li>
+<li>
+<h1 id="assuming-that-you-have-installed-the-latest-compatible-hadooop-start">Assuming that you have installed the latest compatible Hadoop, start</h1>
 the daemons using <code>$HADOOP_HOME/bin/start-all.sh</code>. If you have
-issues starting Hadoop, please reference the <a href="http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html">Hadoop quick start guide</a>
-1. # Copy the input to HDFS using </p>
-<div class="codehilite"><pre>$<span class="n">HADOOP_HOME</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">hadoop</span> <span class="n">fs</span> <span class="o">-</span><span class="n">mkdir</span> <span class="n">testdata</span>
-$<span class="n">HADOOP_HOME</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">hadoop</span> <span class="n">fs</span> <span class="o">-</span><span class="n">put</span> <span class="o">&lt;</span><span class="n">PATH</span> <span class="n">TO</span> <span class="n">synthetic_control</span><span class="p">.</span><span class="n">data</span><span class="o">&gt;</span> <span class="n">testdata</span>
-</pre></div>
-
-
-<p>(HDFS input directory name should be testdata)
-1. Mahout Example job
+issues starting Hadoop, please reference the <a href="http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html">Hadoop quick start guide</a></li>
+<li>
+<h1 id="copy-the-input-to-hdfs-using">Copy the input to HDFS using</h1>
+<p>$HADOOP_HOME/bin/hadoop fs -mkdir testdata
+$HADOOP_HOME/bin/hadoop fs -put <PATH TO synthetic_control.data> testdata</p>
+</li>
+</ol>
+<p>(HDFS input directory name should be testdata)</p>
+<ol>
+<li>Mahout Example job
 Mahout's mahout-examples-$MAHOUT_VERSION.job does the actual clustering
-task and so it needs to be created. This can be done as
-1. # cd $MAHOUT_HOME
-1. # </p>
-<div class="codehilite"><pre><span class="n">mvn</span> <span class="n">clean</span> <span class="n">install</span>          <span class="o">//</span> <span class="n">full</span> <span class="n">build</span> <span class="n">including</span> <span class="n">all</span> <span class="n">unit</span> <span class="n">tests</span>
-<span class="n">mvn</span> <span class="n">clean</span> <span class="n">install</span> <span class="o">-</span><span class="n">DskipTests</span><span class="p">=</span><span class="n">true</span> <span class="o">//</span> <span class="n">fast</span> <span class="n">build</span> <span class="n">without</span> <span class="n">running</span> <span class="n">unit</span> <span class="n">tests</span>
-</pre></div>
-
-
+task and so it needs to be created. This can be done as</li>
+<li>
+<h1 id="cd-mahout_home">cd $MAHOUT_HOME</h1>
+</li>
+<li>
+<h1></h1>
+<p>mvn clean install          // full build including all unit tests
+mvn clean install -DskipTests=true // fast build without running unit tests</p>
+</li>
+</ol>
 <p>You will see BUILD SUCCESSFUL once all the corresponding tasks are through.
 The job will be generated in $MAHOUT_HOME/examples/target/ and its name
 will contain the $MAHOUT_VERSION number. For example, when using Mahout 0.4
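[Editor's note on the hunk above] The setup steps spread through this hunk (upload the dataset to HDFS, then build the examples job) can be consolidated into one sketch. The locations of `$HADOOP_HOME`, `$MAHOUT_HOME`, and the dataset file are assumptions about the reader's environment, so each step is guarded.

```shell
# Sketch of the setup steps above; assumes HADOOP_HOME and MAHOUT_HOME point
# at working installations and DATA is the downloaded dataset (assumption).
DATA="synthetic_control.data"
INPUT_DIR="testdata"   # the example job expects exactly this HDFS directory name

# Upload the input to HDFS, if Hadoop is present.
if [ -x "${HADOOP_HOME:-}/bin/hadoop" ]; then
  "$HADOOP_HOME/bin/hadoop" fs -mkdir "$INPUT_DIR"
  "$HADOOP_HOME/bin/hadoop" fs -put "$DATA" "$INPUT_DIR"
fi

# Build the mahout-examples job (fast build, skipping unit tests).
if [ -d "${MAHOUT_HOME:-}" ]; then
  (cd "$MAHOUT_HOME" && mvn clean install -DskipTests=true)
fi
echo "input dir: $INPUT_DIR"
```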
@@ -455,49 +461,40 @@ This completes the pre-requisites to per
 Mahout.</p>
 <p><a name="Clusteringofsyntheticcontroldata-PerformClustering"></a></p>
 <h1 id="perform-clustering">Perform Clustering</h1>
-<p>With all the pre-work done, clustering the control data gets real simple.
-1. Depending on which clustering technique to use, you can invoke the
-corresponding job as below
-1. # For <a href="canopy-clustering.html">canopy </a>
-:</p>
-<div class="codehilite"><pre><span class="c">## For [kmeans |K-Means Clustering]</span>
-</pre></div>
-
-
-<p>:</p>
+<p>With all the pre-work done, clustering the control data becomes quite simple.</p>
 <ol>
+<li>Depending on which clustering technique you want to use, you can invoke the
+corresponding job as shown below</li>
+<li>For <a href="canopy-clustering.html">canopy </a></li>
+<li>For <a href="K-Means%20Clustering">kmeans</a></li>
+<li>For <a href="fuzzy-k-means.html">fuzzykmeans </a></li>
+<li>For <a href="Dirichlet%20Process%20Clustering">dirichlet</a></li>
 <li>
-<h1 id="for-fuzzykmeans">For <a href="fuzzy-k-means.html">fuzzykmeans </a></h1>
-:<h2 id="for-dirichlet-dirichlet-process-clustering">For [dirichlet |Dirichlet Process Clustering]</h2>
-<p>:</p>
+<p>For <a href="mean-shift-clustering.html">meanshift</a> respectively:</p>
+<p>$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.${clustering.type}.Job</p>
 </li>
 <li>
-<h1 id="for-meanshift">For <a href="mean-shift-clustering.html">meanshift </a></h1>
-<dl>
-<dd>{code}  $MAHOUT_HOME/bin/mahout
-org.apache.mahout.clustering.syntheticcontrol.meanshift.Job {code}</dd>
-</dl>
-</li>
-<li>Get the data out of HDFS{footnote}See <a href="-http://hadoop.apache.org/core/docs/current/hdfs_shell.html.html">HDFS Shell </a>
-{footnote}{footnote}The output directory is cleared when a new run starts
+<p>Get the data out of HDFS (see the <a href="http://hadoop.apache.org/core/docs/current/hdfs_shell.html">HDFS Shell</a> guide).
+The output directory is cleared when a new run starts,
 so the results must be retrieved before a new run, and have a
-look{footnote}All jobs run ClusterDump after clustering with output data
-sent to the console{footnote} by following the below steps:</li>
+look. All jobs run ClusterDump after clustering, with the output
+sent to the console. Follow the steps below.</p>
+</li>
 </ol>
 <p><a name="Clusteringofsyntheticcontroldata-Read/AnalyzeOutput"></a></p>
 <h1 id="read-analyze-output">Read / Analyze Output</h1>
 <p>In order to read/analyze the output, you can use <a href="cluster-dumper.html">clusterdump</a>
  utility provided by Mahout. If you want to just read the output, follow
-the below steps. 
-1. Use {code}$HADOOP_HOME/bin/hadoop fs -lsr output {code}to view all
-outputs.
-1. Use {code}$HADOOP_HOME/bin/hadoop fs -get output $MAHOUT_HOME/examples
-{code} to copy them all to your local machine and the output data points
+the steps below.</p>
+<ol>
+<li>Use <code>$HADOOP_HOME/bin/hadoop fs -lsr output</code> to view all
+outputs.</li>
+<li>Use <code>$HADOOP_HOME/bin/hadoop fs -get output $MAHOUT_HOME/examples</code> to copy them all to your local machine; the output data points
 are in vector format. This creates an output folder inside examples
-directory.
-1. Computed clusters are contained in <em>output/clusters-i</em>
-1. All result clustered points are placed into <em>output/clusteredPoints</em></p>
-<p>{display-footnotes}</p>
+directory.</li>
+<li>Computed clusters are contained in <em>output/clusters-i</em></li>
+<li>All result clustered points are placed into <em>output/clusteredPoints</em></li>
+</ol>
    </div>
   </div>     
 </div>
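[Editor's note on the final hunk] Putting the run-and-retrieve steps from the last hunk together: a hedged sketch in which CLUSTERING_TYPE selects one of the example jobs named on the page (canopy, kmeans, fuzzykmeans, dirichlet, or meanshift). The `$MAHOUT_HOME`/`$HADOOP_HOME` locations are environment assumptions, so the calls are guarded.

```shell
# Sketch: run one synthetic-control example job, then pull results out of HDFS.
CLUSTERING_TYPE="kmeans"   # or canopy, fuzzykmeans, dirichlet, meanshift
JOB_CLASS="org.apache.mahout.clustering.syntheticcontrol.${CLUSTERING_TYPE}.Job"

if [ -x "${MAHOUT_HOME:-}/bin/mahout" ]; then
  "$MAHOUT_HOME/bin/mahout" "$JOB_CLASS"
  # Retrieve results before the next run clears the output directory.
  "${HADOOP_HOME:-}/bin/hadoop" fs -lsr output
  "${HADOOP_HOME:-}/bin/hadoop" fs -get output "$MAHOUT_HOME/examples"
fi
echo "$JOB_CLASS"
```

Computed clusters then land in <em>output/clusters-i</em> and clustered points in <em>output/clusteredPoints</em>, as the page states.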