Posted to user@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2010/08/09 16:05:36 UTC

Clustering on EMR

Has anyone run clustering (KMeans) on EMR lately, following the wiki page at https://cwiki.apache.org/confluence/display/MAHOUT/Mahout+on+Elastic+MapReduce?

Here's what I ran, using the CLI:
./elastic-mapreduce -j j-31BXNQA7ATCCV \
  --jar s3://news-vecs/mahout-core-0.4-SNAPSHOT.job \
  --main-class org.apache.mahout.clustering.kmeans.KMeansDriver \
  --arg "--input" --arg "s3://news-vecs/part-out.vec" \
  --arg "--clusters" --arg s3://news-vecs/kmeans/clusters/ \
  --arg "--k" --arg 10 \
  --arg "--output" --arg s3://news-vecs/out/ \
  --arg "--distanceMeasure" --arg "org.apache.mahout.common.distance.CosineDistanceMeasure" \
  --arg "--convergenceDelta" --arg 0.001 \
  --arg "--overwrite" \
  --arg "--maxIter" --arg 50 \
  --arg "--clustering"
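
One way to double-check that each driver option and value reaches KMeansDriver as its own --arg is to generate the wrapped list with a small shell helper and print the full command before submitting it. This is only a sketch: wrap_args is a hypothetical helper name, not part of the elastic-mapreduce CLI, and it assumes no option value contains whitespace. The options and paths are the ones from the command above; nothing is submitted to EMR.

```shell
# Hypothetical helper (not part of elastic-mapreduce): prefix every
# argument with --arg so it is passed through to the main class.
# Assumes no argument contains whitespace.
wrap_args() {
  out=""
  for a in "$@"; do
    out="$out --arg $a"
  done
  # Drop the single leading space added by the first iteration.
  printf '%s\n' "${out# }"
}

# Build the KMeansDriver options exactly as listed in the command above.
driver_args=$(wrap_args \
  --input s3://news-vecs/part-out.vec \
  --clusters s3://news-vecs/kmeans/clusters/ \
  --k 10 \
  --output s3://news-vecs/out/ \
  --distanceMeasure org.apache.mahout.common.distance.CosineDistanceMeasure \
  --convergenceDelta 0.001 \
  --overwrite \
  --maxIter 50 \
  --clustering)

# Dry run: only print the command so the option/value pairing can be inspected.
echo "./elastic-mapreduce -j j-31BXNQA7ATCCV" \
  "--jar s3://news-vecs/mahout-core-0.4-SNAPSHOT.job" \
  "--main-class org.apache.mahout.clustering.kmeans.KMeansDriver" \
  "$driver_args"
```

Printing rather than executing keeps the check free of side effects; the echoed line can be compared against what the job step actually received.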

The job seems to run, but I don't see any useful work being done, and the out directory (s3://news-vecs/out/) is definitely not created.

Anyone have insight?

Thanks,
Grant