Posted to user@mahout.apache.org by Liang Chenmin <li...@gmail.com> on 2009/11/25 01:17:16 UTC

A question about the naming of the cluster and points in synthetic data cluster

Hi all,
    I am new to Mahout. I have a question about how to attach names to the
clusters and points in the synthetic data clustering example.

    After running the synthetic data clustering example, the output contains 6
clusters, and each one looks like this:

###First comes the cluster information:
0:name::{"class":"org.apache.mahout.matrix.SparseVector","vector":"{\"values\":{\"indices\":[0,1,2...59],\"values\":[29.58838112577385,...],\"numMappings\":60},\"cardinality\":60,\"lengthSquared\":-1.0,\"name\":\"\"}"}

###Then come the points that belong to this cluster:
Points:
{"class":"org.apache.mahout.matrix.SparseVector","vector":"{\"values\":{\"indices\":[0,1,2,...,59],\"values\":[28.7812,34.4632,......
],],\"numMappings\":60},\"cardinality\":60,\"lengthSquared\":-1.0,\"name\":\"\"}"},

{"class":"org.apache.mahout.matrix.SparseVector","vector":"{\"values\":{\"indices\"
....


Is there a way to specify a name for each cluster? More importantly, if each
point has an ID, how can I show that ID for each point in the final result?
I want to see clearly which IDs belong to each cluster. I have also run the
example on my own data, and the output is similar to the above, although the
indices differ because my matrix is sparse. Since my data set is large,
getting the IDs is quite important for me.

Thanks,
Mandy
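The clustering itself only needs each point's numeric values, but nothing prevents a point from carrying an ID alongside them. The serialized vectors above already include a "name" field (currently empty), which looks like the natural place for such an ID. The Java sketch below is hypothetical and does not use Mahout's API: `IdentifiedPoint` and `NearestCenterAssigner` are illustrative names. It assumes a nearest-center (k-means-style) assignment using Euclidean distance, with every sparse index smaller than the center's length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Mahout API: a point that carries its own ID
// next to its sparse values (index -> value), mirroring the "name" field
// seen in the serialized SparseVector output.
final class IdentifiedPoint {
    final String id;
    final Map<Integer, Double> values;

    IdentifiedPoint(String id, Map<Integer, Double> values) {
        this.id = id;
        this.values = values;
    }
}

final class NearestCenterAssigner {
    // Squared Euclidean distance between a sparse point and a dense center;
    // indices absent from the sparse map are treated as 0.0.
    static double squaredDistance(Map<Integer, Double> point, double[] center) {
        double sum = 0.0;
        for (int i = 0; i < center.length; i++) {
            double diff = point.getOrDefault(i, 0.0) - center[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns, for each center index, the IDs of the points closest to it,
    // in the order the points were supplied.
    static List<List<String>> assign(List<IdentifiedPoint> points, double[][] centers) {
        List<List<String>> members = new ArrayList<>();
        for (int c = 0; c < centers.length; c++) {
            members.add(new ArrayList<>());
        }
        for (IdentifiedPoint p : points) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centers.length; c++) {
                double d = squaredDistance(p.values, centers[c]);
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            members.get(best).add(p.id); // the ID travels with the point
        }
        return members;
    }
}
```

Looping over the dense center rather than the sparse map keeps the distance correct for sparse input: every index missing from the map contributes its full center value.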

Re: A question about the naming of the cluster and points in synthetic data cluster

Posted by Shashikant Kore <sh...@gmail.com>.
Check out ClusterDumper in utils
(utils/src/main/java/org/apache/mahout/utils/clustering/ClusterDumper.java).
This utility prints each cluster ID along with the IDs of the vectors
assigned to it.

--shashi
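ClusterDumper's exact output format is not shown in this thread, so the sketch below only illustrates the idea in plain Java: given each point's ID and the ID of the cluster it was assigned to, list the member IDs under each cluster ID. `ClusterReport` and its methods are hypothetical names, not part of Mahout. The `pointId -> clusterId` map is assumed to come from whatever clustering run produced the output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of dumper-style output, not ClusterDumper's code:
// group point IDs by their assigned cluster ID, one line per cluster.
final class ClusterReport {
    // assignments maps pointId -> clusterId; result is keyed by clusterId.
    static Map<Integer, List<String>> groupByCluster(Map<String, Integer> assignments) {
        Map<Integer, List<String>> byCluster = new TreeMap<>(); // cluster IDs kept sorted
        for (Map.Entry<String, Integer> e : assignments.entrySet()) {
            byCluster.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        for (List<String> ids : byCluster.values()) {
            Collections.sort(ids); // deterministic order regardless of input map type
        }
        return byCluster;
    }

    // Formats each cluster as "clusterId: id1, id2, ..." followed by a newline.
    static String format(Map<Integer, List<String>> byCluster) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, List<String>> e : byCluster.entrySet()) {
            sb.append(e.getKey()).append(": ")
              .append(String.join(", ", e.getValue())).append('\n');
        }
        return sb.toString();
    }
}
```

For example, the assignments doc-7 to 1, doc-2 to 0 and doc-5 to 0 format as `0: doc-2, doc-5` and `1: doc-7` on separate lines. Sorting both the cluster IDs and the member IDs makes the report stable across runs, which helps when comparing results on a large data set.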
