Posted to commits@mahout.apache.org by is...@apache.org on 2013/11/21 11:33:34 UTC
svn commit: r1544091 -
/mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext
Author: isabel
Date: Thu Nov 21 10:33:33 2013
New Revision: 1544091
URL: http://svn.apache.org/r1544091
Log:
MAHOUT-1245 - reformat cluster dumper page
Modified:
mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext
Modified: mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext?rev=1544091&r1=1544090&r2=1544091&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/clustering/cluster-dumper.mdtext Thu Nov 21 10:33:33 2013
@@ -1,6 +1,7 @@
Title: Cluster Dumper
+
<a name="ClusterDumper-Introduction"></a>
-# Introduction
+# Cluster Dumper - Introduction
Clustering tasks in Mahout will output data in the format of a SequenceFile
(Text, Cluster) and the Text is a cluster identifier string. To analyze
@@ -12,11 +13,12 @@ format and this is achieved using the cl
After you've executed a clustering tasks (either examples or real-world),
you can run clusterdumper in 2 modes.
-1. [Hadoop Environment](-#hadoopenvironment.html)
-1. [Standalone Java Program ](-#standalonejavaprogram.html)
+
+1. [Hadoop Environment](#hadoopenvironment.html)
+1. [Standalone Java Program ](#standalonejavaprogram.html)
<a name="ClusterDumper-HadoopEnvironment{anchor:HadoopEnvironment}"></a>
-### Hadoop Environment {anchor:HadoopEnvironment}
+### Hadoop Environment
If you have setup your HADOOP_HOME environment variable, you can use the
command line utility "mahout" to execute the ClusterDumper on Hadoop. In
@@ -27,21 +29,19 @@ executed the [synthetic control example
and want to analyze the output, you can execute
- h3. Standalone Java Program {anchor:StandaloneJavaProgram}
+### Standalone Java Program {anchor:StandaloneJavaProgram}
- ClusterDumper can be run using CLI. If your HADOOP_HOME environment
+ClusterDumper can be run using CLI. If your HADOOP_HOME environment
variable is not set, you can execute ClusterDumper using "mahout" command
line utility.
- # get the output data from hadoop into your local machine. For example, in
+
+Get the output data from hadoop into your local machine. For example, in
the case where you've executed a clustering example use
This will create a folder called output inside your $MAHOUT_HOME/examples
and will have sub-folders for each cluster outputs and ClusteredPoints
-1. Run the clusterdump utility as follows
- h5. Standalone Java Program through Eclipse
- If you are using eclipse, setup mahout-utils as a project as specified in [Working with Maven in Eclipse|BuildingMahout#mahout_maven_eclipse]
-.
+Run the clusterdump utility as follows as a standalone Java Program through Eclipse - if you are using eclipse, setup mahout-utils as a project as specified in [Working with Maven in Eclipse](../developers/buildingmahout.html).
To execute ClusterDumper.java,
* Under mahout-utils, Right-Click on ClusterDumper.java
@@ -49,20 +49,22 @@ and will have sub-folders for each clust
* On the left menu, click on Java Application
* On the top-bar click on "New Launch Configuration"
* A new launch should be automatically created with project as
-"mahout-utils" and Main Class as
-"org.apache.mahout.utils.clustering.ClusterDumper"
+
+ "mahout-utils" and Main Class as "org.apache.mahout.utils.clustering.ClusterDumper"
+
* In the arguments tab, specify the below arguments
- \--seqFileDir <MAHOUT_HOME>/examples/output/clusters-10 \--pointsDir
-<MAHOUT_HOME>/examples/output/clusteredPoints \--output
-<MAHOUT_HOME>/examples/output/clusteranalyze.txt
+
+ --seqFileDir <MAHOUT_HOME>/examples/output/clusters-10
+ --pointsDir <MAHOUT_HOME>/examples/output/clusteredPoints
+ --output <MAHOUT_HOME>/examples/output/clusteranalyze.txt
+
replace <MAHOUT_HOME> with the actual path of your $MAHOUT_HOME
- * Hit run to execute the ClusterDumper using Eclipse.
- Setting breakpoints etc should just work fine.
+
+ * Hit run to execute the ClusterDumper using Eclipse. Setting breakpoints etc should just work fine.
- h3. Reading the output file
+Reading the output file
- This will output the clusters into a file called clusteranalyze.txt inside
-$MAHOUT_HOME/examples/output
+ This will output the clusters into a file called clusteranalyze.txt inside $MAHOUT_HOME/examples/output
Sample data will look like
CL-0 { n=116 c=[29.922, 30.407, 30.373, 30.094, 29.886, 29.937, 29.751, 30.054, 30.039, 30.126, 29.764, 29.835, 30.503, 29.876, 29.990, 29.605, 29.379, 30.120, 29.882, 30.161, 29.825, 30.074, 30.001, 30.421, 29.867, 29.736, 29.760, 30.192, 30.134, 30.082, 29.962, 29.512, 29.736, 29.594, 29.493, 29.761, 29.183, 29.517, 29.273, 29.161, 29.215, 29.731, 29.154, 29.113, 29.348, 28.981, 29.543, 29.192, 29.479, 29.406, 29.715, 29.344, 29.628, 29.074, 29.347, 29.812, 29.058, 29.177, 29.063, 29.607]
@@ -74,6 +76,7 @@ CL-0 { n=116 c=[29.922, 30.407, 30.373,
4.672, 4.577, 5.035, 5.241, 4.731, 4.688, 4.685, 4.657, 4.912, 4.300] }
and on...
+
where CL-0 is the Cluster 0 and n=116 refers to the number of points observed by this cluster and c = \[29.922 ...\]
refers to the center of Cluster as a vector and r = \[3.463 ..\] refers to
-the radius of the cluster as a vector.
+the radius of the cluster as a vector.
\ No newline at end of file
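
The output format described at the end of the page (cluster id, n, c, r) can be post-processed with a few lines of code. The following is a minimal Python sketch, not part of Mahout. It assumes each cluster is written on a single line in the `CL-0 { n=... c=[...] r=[...] }` layout shown in the sample. The function name `parse_cluster_line` and the shortened sample line are hypothetical and exist only for illustration; real clusterdump output may differ between versions.

```python
import re

# Assumed single-line layout, taken from the sample on the page:
#   CL-0 { n=116 c=[...] r=[...] }
CLUSTER_RE = re.compile(
    r"^(?P<id>\S+)\s*\{\s*n=(?P<n>\d+)\s+"
    r"c=\[(?P<c>[^\]]*)\]\s+r=\[(?P<r>[^\]]*)\]\s*\}"
)


def _floats(text):
    """Turn a comma-separated string such as '29.922, 30.407' into floats."""
    return [float(tok) for tok in text.split(",") if tok.strip()]


def parse_cluster_line(line):
    """Parse one cluster line; return None if the line does not match."""
    match = CLUSTER_RE.match(line.strip())
    if match is None:
        return None
    return {
        "id": match.group("id"),               # cluster identifier, e.g. CL-0
        "n": int(match.group("n")),            # number of points in the cluster
        "center": _floats(match.group("c")),   # center of the cluster as a vector
        "radius": _floats(match.group("r")),   # radius of the cluster as a vector
    }


# Shortened, illustrative line (not verbatim clusterdump output).
sample = "CL-0 { n=116 c=[29.922, 30.407] r=[3.463, 4.300] }"
cluster = parse_cluster_line(sample)
print(cluster["id"], cluster["n"], len(cluster["center"]))
```

Lines that do not fit the assumed layout return `None`, so a caller can skip headers or wrapped lines instead of failing on them.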