Posted to user@mahout.apache.org by Mark <st...@gmail.com> on 2011/06/26 21:29:05 UTC

KMeans and Canopies

Should canopy generation and KMeans clustering typically use the same 
distance measure, or is it possible to mix and match? Is there any reason 
why one would mix them?

Thanks

Re: KMeans and Canopies

Posted by Mark <st...@gmail.com>.
What if one were to use the Tanimoto distance measure with KMeans? Would 
the same reasoning apply?

On 6/26/11 1:54 PM, Christoph Brücke wrote:
> Hi Mark,
>
> You typically choose a somewhat cheaper distance measure for canopy clustering when it is used as a preprocessing step for KMeans. A simple example would be Manhattan distance (d = |x1 - x2| + |y1 - y2|) for canopy clustering and squared Euclidean distance (d = (x1 - x2)^2 + (y1 - y2)^2) for KMeans. This way you get a cheap approximation of your initial cluster centers.
> I hope this was helpful.
>
> Regards,
> Christoph
>
>
> On 26.06.2011 at 21:29, Mark wrote:
>
>> Should canopy generation and KMeans clustering typically use the same distance measure, or is it possible to mix and match? Is there any reason why one would mix them?
>>
>> Thanks
>>
> Christoph Brücke
> christoph.bruecke@campus.tu-berlin.de
>
>
>

Re: KMeans and Canopies

Posted by Christoph Brücke <ch...@campus.tu-berlin.de>.
Hi Mark,

You typically choose a somewhat cheaper distance measure for canopy clustering when it is used as a preprocessing step for KMeans. A simple example would be Manhattan distance (d = |x1 - x2| + |y1 - y2|) for canopy clustering and squared Euclidean distance (d = (x1 - x2)^2 + (y1 - y2)^2) for KMeans. This way you get a cheap approximation of your initial cluster centers.
I hope this was helpful.
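
To make the two measures concrete, here is a minimal Python sketch. The function names manhattan and squared_euclidean are hypothetical helpers written for illustration only; they are not Mahout's API, and the sample points are made up.

```python
def manhattan(p, q):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def squared_euclidean(p, q):
    """Squared Euclidean distance: sum of squared differences, no square root."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


# Illustrative points only (assumed, not taken from any real data set).
p, q = (1.0, 2.0), (4.0, 6.0)
print(manhattan(p, q))          # |1 - 4| + |2 - 6| = 7.0
print(squared_euclidean(p, q))  # (1 - 4)^2 + (2 - 6)^2 = 25.0
```

The idea would be to use the first function when building canopies and the second inside the KMeans iterations, so the two steps do not have to share one measure.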

Regards,
Christoph


On 26.06.2011 at 21:29, Mark wrote:

> Should canopy generation and KMeans clustering typically use the same distance measure, or is it possible to mix and match? Is there any reason why one would mix them?
> 
> Thanks
> 

Christoph Brücke
christoph.bruecke@campus.tu-berlin.de