Posted to user@mahout.apache.org by Stefan Kreuzer <st...@aol.de> on 2013/01/31 23:37:42 UTC

Re: Figuring out good values for t1 and t2 for canopy

Hi Chris,

I am also experimenting with canopy clustering (CC). For me, choosing
CosineDistanceMeasure and values very close to 1 (>0.96), with T2 only a
little smaller than T1, led to reasonable values for k. This puzzles me
too, though; I just asked another question because of it.


-----Original Message-----
From: Chris Harrington <ch...@heystaks.com>
To: user <us...@mahout.apache.org>
Sent: Thu, 31 Jan 2013 7:22 pm
Subject: Figuring out good values for t1 and t2 for canopy


Hi all,

I'm trying to run canopy clustering before k-means and I can't seem to
find values for t1 and t2 that give me any results.
No matter what values I use, it results in no clusters.

This is probably due to a severe lack of knowledge of the subject on my
part, so can anyone point me toward some good resources to read up on
choosing a distance measure and a t1 and t2 for that measure?



  

Re: Figuring out good values for t1 and t2 for canopy

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
I know of no reliable way to avoid some iteration when setting the T
values for Canopy, but T1 really has no impact on the number of clusters,
so setting T1==T2 and experimenting with T2 will reduce your search space.
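
As a rough illustration of the point above, here is a minimal Python sketch
of a single-pass canopy procedure. It is not Mahout's implementation: the
function names, the Euclidean distance, and the random 2-D test points are
all assumptions made for the example. In this sketch a point starts a new
canopy only if it is not within T2 of any existing center, so T1 changes
canopy membership but not the canopy count.

```python
import math
import random


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2, distance=euclidean):
    """Single-pass canopy sketch (hypothetical helper, not the Mahout API).

    Returns (centers, members), where members[i] lists the points that
    fall within t1 of centers[i].
    """
    if t1 < t2:
        raise ValueError("t1 must be >= t2")
    centers = []
    members = []
    for p in points:
        strongly_bound = False
        for i, c in enumerate(centers):
            d = distance(p, c)
            if d < t1:
                # Loosely bound: T1 only decides canopy membership.
                members[i].append(p)
            if d < t2:
                # Tightly bound: p may not start a canopy of its own.
                strongly_bound = True
        if not strongly_bound:
            centers.append(p)
            members.append([p])
    return centers, members


# Made-up data: 200 uniform random points in the unit square.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]

for t2 in (0.1, 0.2, 0.4):
    n_equal = len(canopy_centers(points, t1=t2, t2=t2)[0])
    n_wide = len(canopy_centers(points, t1=3 * t2, t2=t2)[0])
    print(f"T2={t2}: {n_equal} canopies with T1==T2, {n_wide} with T1==3*T2")
```

For each T2 the two counts printed on a line should match, which is why a
sweep over T2 alone (with T1 pinned to T2) is enough when the only goal is
to find a reasonable number of clusters.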


On 2/1/13 6:29 AM, Chris Harrington wrote:
> It seems my lack of any clusters whatsoever was my own fault; I wasn't pointing at the correct directory.
>
> I would still like to find some good material on figuring out t1 and t2, though. Is it just trial and error, or are there specific features of my data set that I can look at to infer at least marginally good values as a starting point?


Re: Figuring out good values for t1 and t2 for canopy

Posted by Rajesh Nikam <ra...@gmail.com>.
I feel that t1/t2 values closer to 0 mean very similar instances and
values closer to 1 mean no similarity.
You could start with 0.1 and move in either direction until you get
satisfactory clusters.

Thanks
Rajesh
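
To make the 0-to-1 scale concrete, here is a small Python sketch of cosine
distance using the standard definition, 1 minus the cosine of the angle
between two vectors. It is an illustration only and not the code of
Mahout's CosineDistanceMeasure; the function name, the zero-vector
handling, and the toy vectors are assumptions. For vectors with no negative
components (term counts, for example) the value stays between 0 (same
direction) and 1 (nothing in common).

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors (illustrative)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption for this sketch: refuse zero vectors instead of
        # picking an arbitrary distance for them.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Toy term-count vectors (made up for the example).
same_direction = cosine_distance((1, 2, 0), (2, 4, 0))  # identical direction
partly_shared = cosine_distance((1, 1, 0), (1, 0, 0))   # one shared term
nothing_shared = cosine_distance((1, 0, 0), (0, 1, 0))  # no shared terms

print(same_direction, partly_shared, nothing_shared)
```

With this measure, a T2 of 0.1 only binds points that are nearly parallel
to a canopy center, while a T2 close to 1 binds almost anything that shares
any component with it.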



Re: Figuring out good values for t1 and t2 for canopy

Posted by Chris Harrington <ch...@heystaks.com>.
It seems my lack of any clusters whatsoever was my own fault; I wasn't pointing at the correct directory.

I would still like to find some good material on figuring out t1 and t2, though. Is it just trial and error, or are there specific features of my data set that I can look at to infer at least marginally good values as a starting point?

