Posted to user@mahout.apache.org by Wasim <wa...@gmail.com> on 2011/03/16 14:05:16 UTC

Interpreting output from Cluster Dumper

Hi,

I ran two clustering examples from the Mahout wiki pages: the synthetic
control data example and the Reuters data set example. Sample output from
running the cluster dumper on the Reuters example is shown below:

CL-15495{n=117 c=[0:0.069, 79:0.080, 110:0.079, 122:0.080 ...] r=[0:0.739,
79:0.606, 110:0.855, 122:0.605, ...]}

Weight:  Point:
    1.0: [1142:2.946, 1388:9.285, 1544:2.876, 1983:4.021, ...]

How do I interpret this output? In short, I am looking for the document IDs
that belong to a particular cluster.

Does 79:0.080 mean that "79" is the ID of a document that belongs to this
cluster? I have already read on the Mahout wiki pages what CL, n, c and r
mean, but can someone please explain them to me better or point me to a
resource where they are explained in a bit more detail? Also, what does
"Point" mean in the above output? How can I use the center and radius to find
out which documents belong to one particular cluster?

Sorry if I am asking stupid questions, but I am a newbie with Apache Mahout
and am using it for clustering as part of my course assignment.
-- 
Thank you & Regards
Muhammad Wasimullah Khan
Mobile:+46 72 03 29 205
Alt.Telephone: +92 345 21 98 451
Email: mwkhan@kth.se
Skype: muhammad.wasim.khan

Re: Interpreting output from Cluster Dumper

Posted by Geek Gamer <ge...@gmail.com>.
On Wed, Mar 16, 2011 at 6:35 PM, Wasim <wa...@gmail.com> wrote:

> Hi,
>
> I ran two clustering examples from the Mahout wiki pages: the synthetic
> control data example and the Reuters data set example. Sample output from
> running the cluster dumper on the Reuters example is shown below:
>
> CL-15495{n=117 c=[0:0.069, 79:0.080, 110:0.079, 122:0.080 ...] r=[0:0.739,
> 79:0.606, 110:0.855, 122:0.605, ...]}
>
> Weight:  Point:
>    1.0: [1142:2.946, 1388:9.285, 1544:2.876, 1983:4.021, ...]
>
>
The line following "Weight:  Point:" above is the point vector. You need to
use NamedVectors to be able to get the IDs of the points in the cluster dump.
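
To make the layout of the dump lines quoted above concrete, here is a minimal
Python sketch that splits them into their parts. It assumes the usual reading
of the format: n is the number of points in the cluster, c is the center, r is
the radius, and each "index:value" pair is a vector dimension and its weight
(for text vectors, a term index rather than a document ID). The helper names
are hypothetical and are not part of Mahout; the sample strings are copied
from the output above with the line wrap removed.

```python
import re

# One "index:value" entry; elisions such as "..." are simply skipped.
ENTRY = re.compile(r"(\d+):(\d+(?:\.\d+)?)")


def parse_sparse_vector(text):
    """Turn '0:0.069, 79:0.080, ...' into a dict {dimension index: value}."""
    return {int(i): float(v) for i, v in ENTRY.findall(text)}


def parse_cluster(line):
    """Split a 'CL-...{n=... c=[...] r=[...]}' header into its parts."""
    m = re.match(
        r"\s*([\w-]+)\{n=(\d+)\s+c=\[(.*?)\]\s+r=\[(.*?)\]\}", line, re.S
    )
    if m is None:
        raise ValueError("not a cluster header: %r" % line)
    return {
        "id": m.group(1),                             # cluster identifier
        "n": int(m.group(2)),                         # points in the cluster
        "center": parse_sparse_vector(m.group(3)),    # c=[...]
        "radius": parse_sparse_vector(m.group(4)),    # r=[...]
    }


def parse_point(line):
    """Split a '<weight>: [index:value, ...]' line into (weight, vector)."""
    m = re.match(r"\s*(\d+(?:\.\d+)?):\s*\[(.*)\]", line)
    if m is None:
        raise ValueError("not a point line: %r" % line)
    return float(m.group(1)), parse_sparse_vector(m.group(2))


HEADER = ("CL-15495{n=117 c=[0:0.069, 79:0.080, 110:0.079, 122:0.080 ...] "
          "r=[0:0.739, 79:0.606, 110:0.855, 122:0.605, ...]}")
POINT = "    1.0: [1142:2.946, 1388:9.285, 1544:2.876, 1983:4.021, ...]"

cluster = parse_cluster(HEADER)
weight, point = parse_point(POINT)
print(cluster["id"], cluster["n"], sorted(cluster["center"]))
print(weight, sorted(point))
```

Under that reading, 79 in "79:0.080" is a dimension of the center vector, and
the point line carries no document identifier unless the input vectors were
written with names.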


> How do I interpret this output? In short, I am looking for the document IDs
> that belong to a particular cluster.
>
> Does 79:0.080 mean that "79" is the ID of a document that belongs to this
> cluster? I have already read on the Mahout wiki pages what CL, n, c and r
> mean, but can someone please explain them to me better or point me to a
> resource where they are explained in a bit more detail? Also, what does
> "Point" mean in the above output? How can I use the center and radius to
> find out which documents belong to one particular cluster?
>
> Sorry if I am asking stupid questions, but I am a newbie with Apache Mahout
> and am using it for clustering as part of my course assignment.