Posted to user@mahout.apache.org by David Scarlatti <d_...@yahoo.es> on 2012/09/14 10:49:44 UTC

Help with sample Clustering of synthetic control data

Hi, I'm trying the Clustering of synthetic control data sample (
https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data
)

I've uploaded the synthetic_control.data file to Hadoop and launched the
canopy clustering with:

$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job

The MapReduce job runs and, according to the console output, it found 6
clusters as expected.

However, in /output/ it created these directories:

clusteredPoints
clusters-0-final
data

(not the clusters-10 directory expected according to the sample)

When I run clusterdump it completes (see the output below), but the result is
a zero-length clusteranalyze.txt file:

hduser@DELLT54007:~/output$ $MAHOUT_HOME/bin/mahout clusterdump --input
clusters-0-final --pointsDir clusteredPoints --output clusteranalyze.txt
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /usr/local/hadoop/bin/hadoop and
HADOOP_CONF_DIR=/usr/local/hadoop/conf
MAHOUT-JOB: /opt/mahout/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
Warning: $HADOOP_HOME is deprecated.

12/09/13 15:34:23 INFO common.AbstractJob: Command line arguments:
{--dictionaryType=[text],
--distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
--endPhase=[2147483647], --input=[clusters-0-final],
--output=[clusteranalyze.txt], --outputFormat=[TEXT],
--pointsDir=[clusteredPoints], --startPhase=[0], --tempDir=[temp]}
12/09/13 15:34:24 INFO clustering.ClusterDumper: Wrote 0 clusters
12/09/13 15:34:24 INFO driver.MahoutDriver: Program took 582 ms (Minutes:
0.0097)


Any idea what went wrong?

Thanks in advance.


-- 
-----
David.