Posted to user@mahout.apache.org by Donni Khan <pr...@googlemail.com> on 2014/11/17 14:01:30 UTC

How to choose the initial clusters for k-means from TF-IDF vectors

Hi All,

I'm working on text clustering. I want to select specific documents (as
vectors) to be the centroids for k-means.
I have created the TF-IDF vectors for my dataset using Mahout, and I would
like to choose the initial clusters from those TF-IDF vectors.

Does anyone have an idea how I can do this with Mahout?

Many thanks in advance.
Donni

Re: How to choose the initial clusters for k-means from TF-IDF vectors

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Canopy is deprecated.

It depends on what you want to do. Are you trying to find the closest documents to some set that you have hand-classified?

All Canopy does is seed centroids for k-means to start with. K-means then iterates towards "better" ones. You can do the same with your docs, but they won’t remain the centroids after iteration.
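
A minimal sketch of that last point, in plain Python rather than Mahout: it seeds k-means with two hand-picked document vectors and shows that the centroids move away from those documents once the algorithm iterates. The document IDs, the toy vectors, and the function names (`cosine_distance`, `kmeans`) are made up for illustration and are not Mahout APIs; cosine distance is only an assumed choice of measure.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def kmeans(vectors, seeds, iterations=10):
    """Run k-means starting from the given seed vectors."""
    centroids = [list(s) for s in seeds]  # copy so the seeds stay untouched
    assignment = []
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignment = [
            min(range(len(centroids)),
                key=lambda c: cosine_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if the cluster is empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, assignment


# Toy stand-ins for TF-IDF vectors, keyed by a hypothetical document ID.
docs = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.8, 0.2, 0.0],
    "doc3": [0.0, 1.0, 0.0],
    "doc4": [0.0, 0.9, 0.3],
}

# The hand-picked documents that should act as the initial centroids.
seed_ids = ["doc1", "doc3"]

vectors = list(docs.values())
seeds = [docs[doc_id] for doc_id in seed_ids]
centroids, assignment = kmeans(vectors, seeds)

for doc_id, cluster in zip(docs, assignment):
    print(doc_id, "-> cluster", cluster)
for doc_id, centroid in zip(seed_ids, centroids):
    print("seeded from", doc_id, "now at", centroid)
```

In this toy run, the first centroid ends up at the mean of doc1 and doc2 instead of staying on doc1. If the chosen documents must stay fixed as reference points, then assigning each document to its nearest chosen document in a single pass, with no centroid update, may be closer to what is wanted than k-means.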



Re: How to choose the initial clusters for k-means from TF-IDF vectors

Posted by Sean Farrell <dr...@gmail.com>.
Hi Donni,

I believe that the canopy clustering algorithm will do what you want,
though I haven't played around with it myself yet. The clustering chapter
in the 'Mahout in Action' book covers this fairly well.

Cheers,

Sean




Dr Sean Farrell
Data Scientist
