Posted to commits@mahout.apache.org by is...@apache.org on 2013/11/21 11:33:34 UTC

svn commit: r1544091 - /mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext

Author: isabel
Date: Thu Nov 21 10:33:33 2013
New Revision: 1544091

URL: http://svn.apache.org/r1544091
Log:
MAHOUT-1245 - reformat cluster dumper page

Modified:
    mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext?rev=1544091&r1=1544090&r2=1544091&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext Thu Nov 21 10:33:33 2013
@@ -1,6 +1,7 @@
 Title: Cluster Dumper
+
 <a name="ClusterDumper-Introduction"></a>
-# Introduction
+# Cluster Dumper - Introduction
 
 Clustering tasks in Mahout will output data in the format of a SequenceFile
 (Text, Cluster) and the Text is a cluster identifier string. To analyze
@@ -12,11 +13,12 @@ format and this is achieved using the cl
 
 After you've executed a clustering task (either examples or real-world),
 you can run clusterdumper in 2 modes.
-1. [Hadoop Environment](-#hadoopenvironment.html)
-1. [Standalone Java Program ](-#standalonejavaprogram.html)
+
+1. [Hadoop Environment](#hadoopenvironment.html)
+1. [Standalone Java Program ](#standalonejavaprogram.html)
 
 <a name="ClusterDumper-HadoopEnvironment{anchor:HadoopEnvironment}"></a>
-### Hadoop Environment {anchor:HadoopEnvironment}
+### Hadoop Environment
 
 If you have set up your HADOOP_HOME environment variable, you can use the
 command line utility "mahout" to execute the ClusterDumper on Hadoop. In
@@ -27,21 +29,19 @@ executed the [synthetic control example 
  and want to analyze the output, you can execute
 
     
-    h3. Standalone Java Program {anchor:StandaloneJavaProgram}
+### Standalone Java Program {anchor:StandaloneJavaProgram}
     
-    ClusterDumper can be run using CLI. If your HADOOP_HOME environment
+ClusterDumper can be run using CLI. If your HADOOP_HOME environment
 variable is not set, you can execute ClusterDumper using "mahout" command
 line utility.
-    # get the output data from hadoop into your local machine. For example, in
+
+Get the output data from hadoop into your local machine. For example, in
 the case where you've executed a clustering example use
 
 This will create a folder called output inside your $MAHOUT_HOME/examples
 and will have sub-folders for each cluster outputs and ClusteredPoints
-1. Run the clusterdump utility as follows
 
-    h5. Standalone Java Program through Eclipse
-    If you are using eclipse, setup mahout-utils as a project as specified in [Working with Maven in Eclipse|BuildingMahout#mahout_maven_eclipse]
-.
+Run the clusterdump utility as follows as a standalone Java Program through Eclipse - if you are using eclipse, setup mahout-utils as a project as specified in [Working with Maven in Eclipse](../developers/buildingmahout.html).
     To execute ClusterDumper.java,
     
     * Under mahout-utils, Right-Click on ClusterDumper.java
@@ -49,20 +49,22 @@ and will have sub-folders for each clust
     * On the left menu, click on Java Application
     * On the top-bar click on "New Launch Configuration"
     * A new launch should be automatically created with project as
-"mahout-utils" and Main Class as
-"org.apache.mahout.utils.clustering.ClusterDumper"
+
+    "mahout-utils" and Main Class as "org.apache.mahout.utils.clustering.ClusterDumper"
+
     * In the arguments tab, specify the below arguments
-    \--seqFileDir <MAHOUT_HOME>/examples/output/clusters-10 \--pointsDir
-<MAHOUT_HOME>/examples/output/clusteredPoints \--output
-<MAHOUT_HOME>/examples/output/clusteranalyze.txt
+
+    --seqFileDir <MAHOUT_HOME>/examples/output/clusters-10 
+    --pointsDir <MAHOUT_HOME>/examples/output/clusteredPoints 
+    --output <MAHOUT_HOME>/examples/output/clusteranalyze.txt
+
     replace <MAHOUT_HOME> with the actual path of your $MAHOUT_HOME
-    * Hit run to execute the ClusterDumper using Eclipse.
-    Setting breakpoints etc should just work fine.
+
+    * Hit run to execute the ClusterDumper using Eclipse. Setting breakpoints etc should just work fine.
     
-    h3. Reading the output file
+Reading the output file
     
-    This will output the clusters into a file called clusteranalyze.txt inside
-$MAHOUT_HOME/examples/output
+    This will output the clusters into a file called clusteranalyze.txt inside $MAHOUT_HOME/examples/output
     Sample data will look like
 
 CL-0 { n=116 c=[29.922, 30.407, 30.373, 30.094, 29.886, 29.937, 29.751, 30.054, 30.039, 30.126, 29.764, 29.835, 30.503, 29.876, 29.990, 29.605, 29.379, 30.120, 29.882, 30.161, 29.825, 30.074, 30.001, 30.421, 29.867, 29.736, 29.760, 30.192, 30.134, 30.082, 29.962, 29.512, 29.736, 29.594, 29.493, 29.761, 29.183, 29.517, 29.273, 29.161, 29.215, 29.731, 29.154, 29.113, 29.348, 28.981, 29.543, 29.192, 29.479, 29.406, 29.715, 29.344, 29.628, 29.074, 29.347, 29.812, 29.058, 29.177, 29.063, 29.607]
@@ -74,6 +76,7 @@ CL-0 { n=116 c=[29.922, 30.407, 30.373, 
 4.672, 4.577, 5.035, 5.241, 4.731, 4.688, 4.685, 4.657, 4.912, 4.300] }
 
     and on...
+
     where CL-0 is Cluster 0, n=116 is the number of points observed by this cluster, c = \[29.922 ...\]
  is the center of the cluster as a vector, and r = \[3.463 ..\] is
-the radius of the cluster as a vector.
+the radius of the cluster as a vector.
\ No newline at end of file
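
For readers who want to post-process clusteranalyze.txt, the record layout described in the page text above (cluster id, then n, c and r inside braces) can be pulled apart with a short script. This is only a minimal sketch and not part of Mahout: the names `ClusterSummary` and `parse_cluster_record` are hypothetical, and it assumes dense, comma-separated vectors in exactly the `ID { n=... c=[...] r=[...] }` shape shown in the sample. Other vector renderings are not handled.

```python
import re
from typing import List, NamedTuple, Optional


class ClusterSummary(NamedTuple):
    """One cluster record from the dump (hypothetical helper type)."""
    cluster_id: str
    n: int                 # number of points observed by the cluster
    center: List[float]    # c=[...] vector
    radius: List[float]    # r=[...] vector


# Assumed layout, taken from the sample output: ID { n=<int> c=[...] r=[...] }
_RECORD = re.compile(
    r"^\s*(?P<id>\S+)\s*\{\s*n=(?P<n>\d+)\s+"
    r"c=\[(?P<c>[^\]]*)\]\s+r=\[(?P<r>[^\]]*)\]\s*\}"
)


def _floats(text: str) -> List[float]:
    # Split on commas and skip empty tokens; float() tolerates
    # surrounding whitespace, including line breaks inside a vector.
    return [float(tok) for tok in text.split(",") if tok.strip()]


def parse_cluster_record(record: str) -> Optional[ClusterSummary]:
    """Return the parsed record, or None if the text does not match."""
    m = _RECORD.match(record)
    if m is None:
        return None
    return ClusterSummary(m["id"], int(m["n"]), _floats(m["c"]), _floats(m["r"]))


if __name__ == "__main__":
    # Hypothetical, heavily shortened record; real vectors are much longer.
    sample = "CL-0 { n=116 c=[29.922, 30.407] r=[3.463, 3.5] }"
    print(parse_cluster_record(sample))
```

Because the pattern matches whitespace generically, a record that wraps over several lines can be parsed as long as it is passed in as one string. Text that does not follow the assumed layout yields `None` rather than an exception.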