
Posted to user@mahout.apache.org by Donni Khan <pr...@googlemail.com> on 2014/09/10 10:35:22 UTC

Understanding Canopy with cosine similarity

Hi all,

I'm using Canopy clustering with the cosine similarity measure to produce the
input for k-means clustering. I'm wondering how the similarity between
documents is evaluated with respect to the t1 and t2 parameters.
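A minimal sketch of the two quantities involved, assuming the measure handed
to Canopy is a distance defined as 1 - cosine similarity (my understanding of
Mahout's CosineDistanceMeasure; worth checking against the source of your
version). The vectors are hypothetical term vectors. Under this assumption a
distance of 0 means the documents point in the same direction, so a smaller
value means more similar.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors: 1.0 = same direction.

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Assumed distance form: 1 - similarity, so 0.0 = same direction."""
    return 1.0 - cosine_similarity(a, b)

d1 = [1.0, 1.0, 0.0]  # hypothetical document vectors
d2 = [1.0, 0.0, 0.0]
sim = cosine_similarity(d1, d2)   # 1/sqrt(2), about 0.7071
dist = cosine_distance(d1, d2)    # about 0.2929
print(f"similarity={sim:.4f} distance={dist:.4f}")
```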
Let's say t1=0.8 and t2=0.5. For cosine similarity, s(d1,d2)>0.8 means the
documents are very similar, and s(d1,d2)<0.5 means they are less similar (or
not similar at all).
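Reading the example thresholds under the assumption that Canopy compares them
with a distance 1 - s(d1,d2) rather than with s itself, a hypothetical helper
(threshold_band is an illustrative name, not a Mahout API) shows which band a
pair of documents falls into:

```python
def threshold_band(similarity, t1=0.8, t2=0.5):
    """Classify a pair by cosine distance, assuming distance = 1 - similarity.

    The thresholds are treated as distances, with t1 (loose) > t2 (tight).
    """
    distance = 1.0 - similarity
    if distance < t2:
        return "within t2"        # with the defaults: similarity > 0.5
    if distance < t1:
        return "within t1 only"   # with the defaults: 0.2 < similarity <= 0.5
    return "outside t1"           # with the defaults: similarity <= 0.2

for s in (0.9, 0.3, 0.1):
    print(s, threshold_band(s))
```

Read this way, t1=0.8 and t2=0.5 correspond to similarity above 0.2 and above
0.5 respectively.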

In the Canopy algorithm, if s(d1,d2)<t2 then d1 and d2 are assigned to the
same canopy. But for cosine similarity, a value of s(d1,d2) below t2 (here
0.5) means the documents are not similar.
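For reference, a minimal sketch of the textbook canopy procedure, again
assuming the thresholds apply to cosine distance (1 - similarity) with t1 > t2.
It is not Mahout's implementation, whose details differ; for determinism it
takes the first unused point as the next center instead of a random one, and
the function name canopy is illustrative. A point within t1 of a center joins
that canopy; a point within t2 is also removed from the pool of future centers,
so points between t2 and t1 can belong to several canopies.

```python
import math

def cosine_distance(a, b):
    """Assumed distance: 1 - cosine similarity. Assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def canopy(points, t1, t2, distance=cosine_distance):
    """Textbook canopy clustering on distances; t1 (loose) must exceed t2 (tight).

    Returns a list of (center_index, member_indices) pairs.
    """
    if t1 <= t2:
        raise ValueError("canopy clustering expects t1 > t2")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining.pop(0)          # next unused point becomes a center
        members = [center]
        still_remaining = []
        for i in remaining:
            d = distance(points[center], points[i])
            if d < t1:                     # loosely close: joins this canopy
                members.append(i)
            if d >= t2:                    # not tightly close: stays available
                still_remaining.append(i)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies

docs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [0.0, 1.0]]  # hypothetical vectors
print(canopy(docs, t1=0.8, t2=0.5))  # [(0, [0, 1, 2]), (2, [2, 3])]
```

Document 2 ends up in both canopies because its distance to document 0 (about
0.55) lies between t2 and t1.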

I'd like some clarification on this point. I may be wrong, but I would like
to understand it.

Could anyone please tell me how the cosine similarity is computed with
respect to the t1 and t2 parameters?

Thanks in advance
Doni