Posted to user@mahout.apache.org by 万代豊 <20...@gmail.com> on 2013/03/03 14:06:00 UTC

Use of ClusterLabel in Mahout-0.7

Hi
Is the ClusterLabels feature no longer supported in Mahout-0.7?
Some threads say that this has moved to WeightedVectorWritables, but I am
still not sure how I can pull labels out of the clusters, other than the
top terms from ClusterDumper.

K-Means clustering on vectors created from a Lucene index (via Mahout
Lucene.Vectors) itself went well.
I intentionally set minClusterSize to "1", since this exercise uses only
37 documents and 20 clusters were generated. Even so, ClusterLabels skips
every cluster as too small, as the output below shows.

[hadoop@localhost mahout-distribution-0.7]$ $MAHOUT_HOME/bin/mahout
org.apache.mahout.utils.vectors.lucene.ClusterLabels --dir
/home/hadoop/lia2e/indexes/MeetLucene/ --field contents --idField id
--seqFileDir JAText-kmeans-clusters08/clusters-2-final --pointsDir
JAText-kmeans-clusters08/clusteredPoints --minClusterSize 1 --maxLabels 5
Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /usr/local/mahout/mahout-examples-0.7-job.jar
13/03/03 21:54:59 WARN driver.MahoutDriver: No
org.apache.mahout.utils.vectors.lucene.ClusterLabels.props found on
classpath, will use command-line arguments only
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 0 with
size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 2 with
size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 5 with
size: 5
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 6 with
size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 9 with
size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 10 with
size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 13 with
size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 15 with
size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 18 with
size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 19 with
size: 3
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 20 with
size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 23 with
size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 24 with
size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 29 with
size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 30 with
size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 31 with
size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 32 with
size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 34 with
size: 4
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 35 with
size: 1
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 36 with
size: 2
13/03/03 21:55:00 INFO driver.MahoutDriver: Program took 911 ms (Minutes:
0.015183333333333333)
[hadoop@localhost mahout-distribution-0.7]$
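
To make the result of the run above easier to see, the "Skipping small cluster" messages can be tallied with a short script. This is only a minimal sketch in Python and is not part of Mahout; the function name `skipped_clusters` is hypothetical, and it assumes the log text has been saved or pasted exactly as shown above.

```python
import re

# Matches the ClusterLabels "Skipping small cluster" messages. \s+ accepts any
# whitespace, including a line break between "with" and "size:".
SKIP_RE = re.compile(r"Skipping small cluster (\d+)\s+with\s+size:\s+(\d+)")


def skipped_clusters(log_text):
    """Return {cluster_id: size} for every skipped cluster found in the log.

    Hypothetical helper for inspecting the output; not a Mahout API.
    """
    return {int(cid): int(size) for cid, size in SKIP_RE.findall(log_text)}


if __name__ == "__main__":
    # Two messages copied from the output above.
    sample = """\
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 0 with
size: 2
13/03/03 21:55:00 INFO lucene.ClusterLabels: Skipping small cluster 5 with
size: 5
"""
    skipped = skipped_clusters(sample)
    print(len(skipped), "clusters skipped,", sum(skipped.values()), "documents")
```

Applied to the full output above, the tally is 20 skipped clusters whose sizes add up to 37, so all 20 clusters and all 37 documents were skipped even though --minClusterSize was 1.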

This is an exercise of mine combining material from Taming Text and
Lucene in Action.
Regards,
Y.Mandai