Posted to user@mahout.apache.org by jung hoon sohn <js...@gmail.com> on 2012/10/07 14:08:40 UTC

Clusterdump Output Question

Hello,
I used the k-means algorithm to cluster the text terms in the documents
using the cosine distance measure.
It ran successfully, and when we ran the clusterdump utility to see the top
terms for each cluster,
I got output such as:

      Top Terms:

            hello    =>     21.8977799999
            you     =>     11.9284304939
            ....

I am guessing the values next to the terms are cosine distance values,
but I am not very sure about it.
Does anyone know specifically what the value represents?

Thanks.

Jung Hoon Sohn

Re: Clusterdump Output Question

Posted by paritosh ranjan <pa...@gmail.com>.
I don't see any issue with the top terms having similar frequencies. The cosine
distance measure is generally considered a good distance measure for text data.
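For reference, here is a minimal Python sketch of the standard cosine distance, 1 - cos(theta), between two term-weight vectors stored as dicts. The function name and sample vectors are hypothetical and this is not Mahout's own code; it only illustrates the definition. It shows that the measure ignores vector length and, for non-negative weights, always lies between 0 and 1, so a value like 21.89 in the clusterdump output cannot be a cosine distance.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cos(theta) for two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical documents as term-weight vectors.
doc1 = {"hello": 3.0, "you": 1.0}
doc2 = {"hello": 30.0, "you": 10.0}  # same direction, ten times the weights
doc3 = {"world": 2.0}                # shares no terms with doc1

print(cosine_distance(doc1, doc2))  # close to 0.0 (up to rounding): length is ignored
print(cosine_distance(doc1, doc3))  # 1.0: no shared terms
```

Because only the direction of the vectors matters, two documents with very different raw term counts can still be at distance close to 0.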

On Mon, Oct 8, 2012 at 10:35 AM, jung hoon sohn <js...@gmail.com> wrote:

> Thank you for the information.
> Based on your answer, the top terms from the clusters have similar
> frequencies.
> Since I used cosine distance as the measure, is this a correct result?
>
> Thank You.
>
> Jung Hoon Sohn

Re: Clusterdump Output Question

Posted by jung hoon sohn <js...@gmail.com>.
Thank you for the information.
Based on your answer, the top terms from the clusters have similar
frequencies.
Since I used cosine distance as the measure, is this a correct result?

Thank You.

Jung Hoon Sohn

On Sun, Oct 7, 2012 at 9:35 PM, paritosh ranjan <pa...@gmail.com> wrote:

> The top terms come from the centroid of the cluster. These values are the
> term frequencies.

Re: Clusterdump Output Question

Posted by paritosh ranjan <pa...@gmail.com>.
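The top terms come from the centroid of the cluster. These values are the
term frequencies, i.e. the centroid's weight for each term, which is that
term's weight (TF or TF-IDF, depending on how the vectors were created)
averaged over the documents in the cluster.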
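A minimal Python sketch of where such numbers can come from, assuming, as in standard k-means, that a cluster's centroid is the plain average of its member documents' term-weight vectors, and that the "top terms" are simply the centroid entries with the largest weights. The helper names `centroid` and `top_terms` and the document vectors are hypothetical; this is not Mahout's code.

```python
from collections import Counter

def centroid(doc_vectors):
    """Average a list of sparse {term: weight} dicts into one centroid dict."""
    totals = Counter()
    for vec in doc_vectors:
        totals.update(vec)  # adds each term's weight to the running total
    n = len(doc_vectors)
    # Terms missing from a document implicitly contribute 0 to the average.
    return {term: weight / n for term, weight in totals.items()}

def top_terms(center, k):
    """Return the k (term, weight) pairs with the largest centroid weights."""
    return sorted(center.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Hypothetical term-weight vectors for three documents in one cluster.
docs = [
    {"hello": 30.0, "you": 12.0, "world": 3.0},
    {"hello": 20.0, "you": 9.0},
    {"hello": 16.0, "you": 15.0, "there": 6.0},
]

for term, weight in top_terms(centroid(docs), 2):
    print(f"{term} => {weight}")
# prints:
# hello => 22.0
# you => 12.0
```

Since absent terms count as 0, a term that is heavy in only one document (here "world") ends up with a small centroid weight, while terms that are heavy in every document rise to the top.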

On Sun, Oct 7, 2012 at 5:38 PM, jung hoon sohn <js...@gmail.com> wrote:

> Hello,
> I used the k-means algorithm to cluster the text terms in the documents
> using the cosine distance measure.
> It ran successfully, and when we ran the clusterdump utility to see the top
> terms for each cluster,
> I got output such as:
>
>       Top Terms:
>
>             hello    =>     21.8977799999
>             you     =>     11.9284304939
>             ....
>
> I am guessing the values next to the terms are cosine distance values,
> but I am not very sure about it.
> Does anyone know specifically what the value represents?
>
> Thanks.
>
> Jung Hoon Sohn
>