Posted to user@mahout.apache.org by William Moran <ec...@gmail.com> on 2013/08/12 23:12:12 UTC
Question about clusterdump
Hi,
What exactly are the numbers next to these terms? (this is an example
clusterdump from the Mahout in Action book, but my clusterdumps look
similar).
Top Terms:
Shania Twain => 1.126984126984127
Garth Brooks => 0.746031746031746
Sara Evans => 0.6031746031746031
Lonestar => 0.5238095238095238
Sorry if this is an obvious question but I find it hard to find details on
these specifics.
Many thanks,
Will
Re: Question about clusterdump
Posted by Ritwik Kumar <li...@gmail.com>.
I am not 100% sure how Mahout's implementation of the KMeans algorithm does
this, but in general, the cluster center is the centroid of all the points
that belong to that cluster. In the simplest case, it is just the average
of all the points that belong to that cluster. Alternatively, it could be the
actual point that is closest to the centroid.
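To make the two options concrete, here is a minimal sketch in plain Python. It is not Mahout code; the toy points and the helper names (centroid, closest_point) are made up for illustration only.

```python
# Hypothetical toy data: each row is one point in the cluster.
points = [
    [2.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 2.0, 2.0],
]


def centroid(rows):
    """Component-wise average of all points (the simplest cluster center)."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_point(rows, center):
    """The actual data point that lies nearest to the given center."""
    return min(rows, key=lambda row: squared_distance(row, center))


center = centroid(points)
nearest = closest_point(points, center)
print("centroid:", center)
print("closest actual point:", nearest)
```

The first option gives a center that need not coincide with any real point; the second always returns one of the input points.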
On Thu, Aug 22, 2013 at 6:58 AM, Grant Ingersoll <gs...@apache.org> wrote:
>
> On Aug 12, 2013, at 5:12 PM, William Moran <ec...@gmail.com> wrote:
>
> > Hi,
> >
> > What exactly are the numbers next to these terms? (this is an example
> > clusterdump from the Mahout in Action book, but my clusterdumps look
> > similar).
>
> They are the weights assigned to each of the terms. They are likely the
> TF/IDF values, but I believe they may be other things depending on how your
> dictionary/vectors were created.
>
> >
> > Top Terms:
> >
> > Shania Twain => 1.126984126984127
> > Garth Brooks => 0.746031746031746
> > Sara Evans => 0.6031746031746031
> > Lonestar => 0.5238095238095238
> >
> > Sorry if this is an obvious question but I find it hard to find details on
> > these specifics.
> >
> > Many thanks,
> >
> > Will
>
> --------------------------------------------
> Grant Ingersoll | @gsingers
> http://www.lucidworks.com
Re: Question about clusterdump
Posted by Grant Ingersoll <gs...@apache.org>.
On Aug 12, 2013, at 5:12 PM, William Moran <ec...@gmail.com> wrote:
> Hi,
>
> What exactly are the numbers next to these terms? (this is an example
> clusterdump from the Mahout in Action book, but my clusterdumps look
> similar).
They are the weights assigned to each of the terms. They are likely the TF/IDF values, but I believe they may be other things depending on how your dictionary/vectors were created.
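As a rough illustration of how such a "Top Terms" list could be produced, here is a small Python sketch. It assumes the listed terms are simply the highest-weighted entries of a cluster's vector, looked up in a dictionary that maps term indices to terms. This is not Mahout's code: the function name top_terms and the index assignments are hypothetical, and the weights are the ones from the example above.

```python
# Hypothetical dictionary mapping term index -> term (index order is made up).
dictionary = {
    0: "Lonestar",
    1: "Shania Twain",
    2: "Sara Evans",
    3: "Garth Brooks",
}

# Weights per term index, copied from the example clusterdump output.
cluster_weights = {
    0: 0.5238095238095238,
    1: 1.126984126984127,
    2: 0.6031746031746031,
    3: 0.746031746031746,
}


def top_terms(weights, terms, n):
    """Return the n (term, weight) pairs with the largest weights."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [(terms[index], weight) for index, weight in ranked[:n]]


for term, weight in top_terms(cluster_weights, dictionary, 4):
    print(f"{term} => {weight}")
```

Whether each weight is a TF/IDF value, a raw count, or something else depends on how the input vectors were built; the ranking step is the same either way.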
>
> Top Terms:
>
> Shania Twain => 1.126984126984127
> Garth Brooks => 0.746031746031746
> Sara Evans => 0.6031746031746031
> Lonestar => 0.5238095238095238
>
> Sorry if this is an obvious question but I find it hard to find details on
> these specifics.
>
> Many thanks,
>
> Will
--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com