Posted to user@mahout.apache.org by mwkhan <wa...@gmail.com> on 2011/03/01 22:03:12 UTC

Cluster Dumper no output not shown

Hi,

First I ran the k-means algorithm, following the article "Introduction to
Apache Mahout", with the following arguments:

<java classname="org.apache.mahout.clustering.kmeans.KMeansDriver"
      fork="true" maxmemory="738m">
  <classpath refid="runtime.classpath"/>
  <arg value="--input"/>
  <arg value="${wiki.dir}/n2/part-full.txt"/>
  <arg value="--clusters"/>
  <arg value="${wiki.dir}/n2/k-output/clusters-in"/>
  <arg value="--k"/>
  <arg value="10"/>
  <arg value="--output"/>
  <arg value="${wiki.dir}/n2/k-output"/>
  <arg value="--distance"/>
  <arg value="org.apache.mahout.utils.CosineDistanceMeasure"/>
  <arg value="--convergence"/>
  <arg value="0.01"/>
  <arg value="--overwrite"/>
</java>

Now I have the following directories in my "k-output" folder on my local
machine: clusters-0, clusters-1, clusters-2, clusters-3, clusters-4,
clusters-in, and points.

Then I tried to run the cluster dumper utility as a standalone Java
program:

$ bin/mahout clusterdump --seqFileDir
/cygdrive/c/users/wasim/Downloads/apache-mahout-examples/wikipedia/n2/k-output/clusters-10/
--pointsDir
/cygdrive/c/users/wasim/Downloads/apache-mahout-examples/wikipedia/n2/k-output/points/

I got the following output:

no HADOOP_HOME set, running locally

Mar 1, 2011 8:57:49 PM org.slf4j.impl.JCLLoggerAdapter info

INFO: Command line arguments: {--dictionaryType=text, --endPhase=2147483647,
--pointsDir=/cygdrive/c/users/wasim/Downloads/apache-mahout-examples/wikipedia/n2/k-output/points/,
--seqFileDir=/cygdrive/c/users/wasim/Downloads/apache-mahout-examples/wikipedia/n2/k-output/clusters-10/,
--startPhase=0, --tempDir=temp}

Mar 1, 2011 8:57:49 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 332 ms

Why am I not getting the clustering data as output?

I am running these commands through Cygwin installed on a Windows machine.


RE: Cluster Dumper no output not shown

Posted by Jeff Eastman <je...@Narus.com>.
You need to add the -cl (--clustering) option to the k-means run to get your input points classified (clustered) by the clusters in your final clusters-n directory. This output will appear in a "clusteredPoints" directory. (This classification step is optional because it is not always desired and can take a while.) The clusterdumper should then give you the output you are seeking.
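
One more thing worth checking: the clusterdump command in the question points at clusters-10, while the directory listing only shows clusters-0 through clusters-4. A minimal sketch for picking the final clusters-n directory follows. The function name final_clusters_dir is hypothetical (it is not part of Mahout), and the sketch assumes the iteration directories are named clusters-<number> as in the listing above.

```shell
# Hypothetical helper (not part of Mahout): print the highest-numbered
# clusters-N directory found directly under the directory given as $1.
final_clusters_dir() {
  best=-1
  for d in "$1"/clusters-*; do
    n=${d##*/clusters-}            # keep only the suffix after "clusters-"
    case $n in
      ''|*[!0-9]*) continue ;;     # skip clusters-in and other non-numeric names
    esac
    if [ "$n" -gt "$best" ]; then
      best=$n
    fi
  done
  # Print nothing and return non-zero when no numbered directory exists.
  [ "$best" -ge 0 ] && printf '%s/clusters-%s\n' "$1" "$best"
}

# Possible usage, assuming k-means was rerun with --clustering so that a
# "clusteredPoints" directory exists under the same output directory:
#   OUT=/cygdrive/c/users/wasim/Downloads/apache-mahout-examples/wikipedia/n2/k-output
#   bin/mahout clusterdump --seqFileDir "$(final_clusters_dir "$OUT")" \
#     --pointsDir "$OUT/clusteredPoints"
```

The comparison is numeric rather than alphabetical, so clusters-10 would sort after clusters-9 if a run ever gets that far.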
