Posted to user@mahout.apache.org by mwkhan <wa...@gmail.com> on 2011/03/01 22:03:12 UTC
Cluster Dumper no output not shown
Hi,
First I ran the k-means algorithm, following the article Introduction to Apache
Mahout, with the following arguments:
<java classname="org.apache.mahout.clustering.kmeans.KMeansDriver"
fork="true" maxmemory="738m">
<classpath refid="runtime.classpath"/>
<arg value="--input"/>
<arg value="${wiki.dir}/n2/part-full.txt"/>
<arg value="--clusters"/>
<arg value="${wiki.dir}/n2/k-output/clusters-in"/>
<arg value="--k"/>
<arg value="10"/>
<arg value="--output"/>
<arg value="${wiki.dir}/n2/k-output"/>
<arg value="--distance"/>
<arg value="org.apache.mahout.utils.CosineDistanceMeasure"/>
<arg value="--convergence"/>
<arg value="0.01"/>
<arg value="--overwrite"/>
</java>
Now I have the following directories in my "k-output" folder on my local
machine: clusters-0, clusters-1, clusters-2, clusters-3, clusters-4, clusters-in,
and points.
Then I tried to run the cluster-dumper utility as a standalone Java
program:
$ bin/mahout clusterdump \
    --seqFileDir /cygdrive/c/users/wasim/Downloads/apache-mahout-examples/wikipedia/n2/k-output/clusters-10/ \
    --pointsDir /cygdrive/c/users/wasim/Downloads/apache-mahout-examples/wikipedia/n2/k-output/points/
I got the following output:
no HADOOP_HOME set, running locally
Mar 1, 2011 8:57:49 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Command line arguments: {--dictionaryType=text, --endPhase=2147483647, --pointsDir=/cygdrive/c/users/wasim/Downloads/apache-mahout-examples/wikipedia/n2/k-output/points/, --seqFileDir=/cygdrive/c/users/wasim/Downloads/apache-mahout-examples/wikipedia/n2/k-output/clusters-10/, --startPhase=0, --tempDir=temp}
Mar 1, 2011 8:57:49 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 332 ms
Why am I not getting the clustering data as output?
I am running these commands through Cygwin installed on a Windows machine.
--
View this message in context: http://lucene.472066.n3.nabble.com/Cluster-Dumper-no-output-not-shown-tp2606470p2606470.html
Sent from the Mahout User List mailing list archive at Nabble.com.
RE: Cluster Dumper no output not shown
Posted by Jeff Eastman <je...@Narus.com>.
You need to add the -cl (--clustering) option to your k-means run to get your input points classified (clustered) by the clusters in your final clusters-n directory. This output will appear in a "clusteredPoints" directory. The classification step is optional because it is not always desired and can take a while. The clusterdumper should then give you the output you are seeking.
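As a rough sketch of that change, assuming the same Ant target shown in the original post, the option could be appended as one more argument to the KMeansDriver invocation. Everything except the final <arg> is copied from the question; where the new argument sits in the list is an assumption, and the flag should be checked against the driver's help output for your Mahout version.

```xml
<java classname="org.apache.mahout.clustering.kmeans.KMeansDriver"
      fork="true" maxmemory="738m">
  <classpath refid="runtime.classpath"/>
  <!-- Arguments below are unchanged from the original post -->
  <arg value="--input"/>
  <arg value="${wiki.dir}/n2/part-full.txt"/>
  <arg value="--clusters"/>
  <arg value="${wiki.dir}/n2/k-output/clusters-in"/>
  <arg value="--k"/>
  <arg value="10"/>
  <arg value="--output"/>
  <arg value="${wiki.dir}/n2/k-output"/>
  <arg value="--distance"/>
  <arg value="org.apache.mahout.utils.CosineDistanceMeasure"/>
  <arg value="--convergence"/>
  <arg value="0.01"/>
  <arg value="--overwrite"/>
  <!-- Added (assumed placement): the long form of the -cl option, which
       classifies the input points after the final iteration -->
  <arg value="--clustering"/>
</java>
```

After rerunning, clusterdump's --pointsDir would presumably point at the resulting "clusteredPoints" directory rather than "points". Note also that the directory listing in the question ends at clusters-4, so --seqFileDir likely needs to name that final directory instead of clusters-10.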