Posted to user@mahout.apache.org by Keren Ouaknine <ke...@gmail.com> on 2011/12/27 23:26:20 UTC

KMeans - getting gibberish output and running options

Hello,

I am running the KMeans sample:
$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
following the instructions at:
https://cwiki.apache.org/MAHOUT/clustering-of-synthetic-control-data.html

I uploaded the synthetic data and the example runs fine; however, I get
gibberish output when I look at the output directory.
Also, how many centroids does this sample use, and what are their initial
locations?
I tried to dig into the code, but the source files don't seem to be included
in the distribution.

Thanks,
Keren


-- 
Keren Ouaknine
Web: www.kereno.com

Re: KMeans - getting gibberish output and running options

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Mahout generally uses sequence files for input and output. These are
binary-encoded files that can only be read by a compatible program. If
you are trying to view, e.g., .../part-xxx with less, you won't see much
that is human readable. You can run the ClusterDumper to get human-readable
output from any clustering job.
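To illustrate why a pager shows gibberish, here is a minimal sketch that reads only the header of such a file using the JDK alone. It is not part of Mahout or Hadoop; the class name SeqHeader is hypothetical. It assumes the Hadoop SequenceFile header layout for format version 4 and later: the magic bytes "SEQ", one version byte, then the key and value class names, each stored as a variable-length-int length followed by the name's bytes. Everything after the header is serialized Writable data, which is what less displays as noise. For the actual records, use ClusterDumper as described above, or Hadoop's own SequenceFile.Reader.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Mahout or Hadoop): prints the header of a
// Hadoop SequenceFile using only the JDK. Assumes the header layout of
// SequenceFile format version 4 and later.
public class SeqHeader {

    /** Format version plus the key/value class names stored in the header. */
    public static final class Header {
        public final int version;
        public final String keyClass;
        public final String valueClass;

        Header(int version, String keyClass, String valueClass) {
            this.version = version;
            this.keyClass = keyClass;
            this.valueClass = valueClass;
        }
    }

    public static Header readHeader(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        // The first three bytes are the magic "SEQ"; anything else is not a SequenceFile.
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile: missing SEQ magic");
        }
        int version = in.readUnsignedByte();
        if (version < 4) {
            // Older versions store the class names differently; not handled here.
            throw new IOException("unsupported SequenceFile version " + version);
        }
        String keyClass = readString(in);
        String valueClass = readString(in);
        return new Header(version, keyClass, valueClass);
    }

    // Decodes a Hadoop-style variable-length int (the encoding used by WritableUtils).
    static int readVInt(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first; // values in [-112, 127] fit in the first byte
        }
        boolean negative = first < -120;
        int extraBytes = negative ? -120 - first : -112 - first;
        long value = 0;
        for (int i = 0; i < extraBytes; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return (int) (negative ? ~value : value);
    }

    // Reads a vint length followed by that many UTF-8 bytes.
    static String readString(DataInputStream in) throws IOException {
        int length = readVInt(in);
        if (length < 0) {
            throw new IOException("negative string length " + length);
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SeqHeader <sequence-file>");
            System.exit(1);
        }
        try (InputStream in = new FileInputStream(args[0])) {
            Header h = readHeader(in);
            System.out.println("SequenceFile version: " + h.version);
            System.out.println("key class:   " + h.keyClass);
            System.out.println("value class: " + h.valueClass);
        }
    }
}
```

Run it as `java SeqHeader .../part-xxx`. It prints the format version and the Writable class names recorded in the header, which tells you which classes a compatible reader needs; it deliberately does not attempt to decode the records themselves.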
