Posted to user@mahout.apache.org by Benson Margulies <bi...@gmail.com> on 2011/04/08 17:52:50 UTC

Clustering when you can't tune parameters

Canopy+kmeans requires some tweaking and tuning to get useful values
of T1 and T2, if nothing else.

What if the goal were to get some useful clustering of a doc set in a
single attempt with no tuning (or only automatable tuning)?

Any ideas?

Re: Clustering when you can't tune parameters

Posted by Lance Norskog <go...@gmail.com>.

I have a JIRA that is the code scaffolding for that, but the algorithm
is not good. I did not understand T1 and T2.

https://issues.apache.org/jira/browse/MAHOUT-563

As I now understand it, a canopy cluster has three distances:
a) the nearest is the "gravity well"
b) the farthest is "some other cluster", and
c) the in-between distance is "throw it away"

T1 is a) and T2 is b). T2-T1 is c)
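
For comparison, canopy clustering as usually described (McCallum, Nigam
and Ungar, and the Mahout documentation) uses T1 as the loose threshold
and T2 as the tight one, with T1 > T2. A point within T1 of a canopy
center joins that canopy. A point within T2 is also removed from the
candidate list, so it cannot seed a new canopy. Points in the band
between T2 and T1 are not discarded: they join the canopy and stay
candidates, so they may end up in several canopies. Below is a minimal
sketch of that procedure. It is not Mahout's code: the function names,
the Euclidean distance, and the choice to scan only the points still on
the candidate list are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance; an assumed stand-in for any cheap metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points, t1, t2, distance=euclidean):
    """Return a list of (center, members) canopies.

    t1 is the loose threshold, t2 the tight one (t1 > t2).
    """
    if t1 <= t2:
        raise ValueError("expected t1 > t2")
    candidates = list(points)
    canopies = []
    while candidates:
        # Take the next remaining point as a new canopy center.
        center = candidates.pop(0)
        members = [center]
        remaining = []
        for p in candidates:
            d = distance(center, p)
            if d < t1:
                # Loosely close: belongs to this canopy.
                members.append(p)
            if d >= t2:
                # Not tightly close: may still seed or join other canopies.
                remaining.append(p)
        candidates = remaining
        canopies.append((center, members))
    return canopies


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]
    for center, members in canopy(pts, t1=3.0, t2=1.5):
        print(center, members)
```

In the small example, (2.5, 0.0) lies between T2 and T1 of the first
center, so it is a member of the first canopy and also becomes the
center of the second one. The canopy centers are what would then be
handed to k-means as initial centroids.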

-- 
Lance Norskog
goksron@gmail.com