Posted to user@mahout.apache.org by Miguel Angel Martin junquera <mi...@gmail.com> on 2015/01/14 11:41:12 UTC

boost selected dimensions in kmeans clustering

Hi all,


I am clustering several text documents from distinct sources using k-means,
and I have already generated the sparse vectors for each document.
I want to boost some dimensions in the sparse vectors.

What is the best way to do this?

Is it a good idea to load the vectors, find the tf or tf-idf values of
those dimensions, and boost those values?


Thanks in advance and regards

Re: boost selected dimensions in kmeans clustering

Posted by Ted Dunning <te...@gmail.com>.
On Thu, Jan 15, 2015 at 5:23 AM, Miguel Angel Martin junquera <
mianmarjun.mailinglist@gmail.com> wrote:

> My question is:
> Is it better to scale up these dimensions directly in the final mixed
> tf-idf sequence file using these correction factors, OR to first scale
> up each tf vector, then mix the vectors and recalculate the final tf-idf,
> in order to minimize errors or deviations in a subsequent clustering
> of these final mixed tf-idf vectors?
>

Mathematically, it doesn't matter whether you scale the vectors at
generation time, just before computing the distance, or during the
distance computation itself.

Different places for the change may be more or less easy in terms of
programming.  The two easiest places tend to be at the beginning (if you
know the weights) since you have to write that code anyway, or at the end
since there are provisions for changing the metric in some programs.
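
To illustrate the equivalence for Euclidean distance, here is a minimal
sketch in plain Python. It is not Mahout code: sparse vectors are modelled
as dicts from dimension index to value, and the names (scale, euclidean,
weighted_euclidean, boost) and sample numbers are made up for the example.
It compares scaling the vectors up front with applying the same weights
inside the distance computation.

```python
import math

def scale(vec, weights):
    """Return a copy of a sparse vector (dict: index -> value) with each
    dimension multiplied by its weight; unlisted dimensions keep weight 1.0."""
    return {i: v * weights.get(i, 1.0) for i, v in vec.items()}

def euclidean(a, b):
    """Plain Euclidean distance between two sparse vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def weighted_euclidean(a, b, weights):
    """Euclidean distance with the per-dimension weight applied to each
    coordinate difference inside the distance computation."""
    keys = set(a) | set(b)
    return math.sqrt(sum(
        (weights.get(k, 1.0) * (a.get(k, 0.0) - b.get(k, 0.0))) ** 2
        for k in keys
    ))

# Hypothetical documents and boost factors, for illustration only.
doc_a = {0: 1.0, 3: 2.0, 7: 0.5}
doc_b = {0: 0.5, 3: 1.0, 9: 1.5}
boost = {3: 4.0, 9: 2.0}

d_scaled = euclidean(scale(doc_a, boost), scale(doc_b, boost))
d_weighted = weighted_euclidean(doc_a, doc_b, boost)
print(math.isclose(d_scaled, d_weighted))
```

Both routes give the same distance, so the choice comes down to which
place is more convenient to change in your pipeline.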

Re: boost selected dimensions in kmeans clustering

Posted by Miguel Angel Martin junquera <mi...@gmail.com>.
Hi Ted,

Yes, I was considering various possibilities, and this was one of them
(scaling up these dimensions, for example by multiplying them by a
configurable correction factor).

What I really want is to mix two different vectors from the same documents,
with different lengths and dictionaries (some dictionary terms may be
the same). Then I will multiply the dimensions of each vector by a
configurable correction factor.

My question is:
Is it better to scale up these dimensions directly in the final mixed
tf-idf sequence file using these correction factors, OR to first scale
up each tf vector, then mix the vectors and recalculate the final tf-idf,
in order to minimize errors or deviations in a subsequent clustering
of these final mixed tf-idf vectors?

Thanks in advance for your help.

One last note:

I am a bass player, and the AKG 701q with a FiiO E12+E09K is a perfect
combination!!


;-)






2015-01-14 20:12 GMT+01:00 Ted Dunning <te...@gmail.com>:

> The easiest way is to scale those dimensions up.
>
>
>
> On Wed, Jan 14, 2015 at 2:41 AM, Miguel Angel Martin junquera <
> mianmarjun.mailinglist@gmail.com> wrote:
>
> > Hi all,
> >
> >
> > I am clustering several text documents from distinct sources using
> > k-means, and I have already generated the sparse vectors for each document.
> > I want to boost some dimensions in the sparse vectors.
> >
> > What is the best way to do this?
> >
> > Is it a good idea to load the vectors, find the tf or tf-idf values of
> > those dimensions, and boost those values?
> >
> >
> > Thanks in advance and regards
> >
>

Re: boost selected dimensions in kmeans clustering

Posted by Ted Dunning <te...@gmail.com>.
The easiest way is to scale those dimensions up.
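
As a rough sketch of what scaling those dimensions up could look like, the
plain Python below boosts selected terms in one sparse vector. It is not
Mahout code, and it assumes you have a dictionary mapping each term to its
dimension index. The function boost_terms, the dictionary contents and the
boost factors are all hypothetical.

```python
def boost_terms(vec, dictionary, term_boosts):
    """Multiply selected dimensions of a sparse vector by a boost factor.

    vec:         dict mapping dimension index -> tf or tf-idf value
    dictionary:  dict mapping term -> dimension index (assumed available)
    term_boosts: dict mapping term -> multiplicative factor
    Terms missing from the dictionary are ignored.
    """
    index_boosts = {
        dictionary[term]: factor
        for term, factor in term_boosts.items()
        if term in dictionary
    }
    return {i: v * index_boosts.get(i, 1.0) for i, v in vec.items()}

# Hypothetical dictionary and document vector, for illustration only.
dictionary = {"bass": 0, "guitar": 1, "amplifier": 2}
doc = {0: 0.8, 2: 0.3}

boosted = boost_terms(doc, dictionary, {"bass": 3.0, "drums": 2.0})
print(boosted)
```

The same factors have to be applied to every document vector (and to any
initial centroids) so that all points live in the same rescaled space.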



On Wed, Jan 14, 2015 at 2:41 AM, Miguel Angel Martin junquera <
mianmarjun.mailinglist@gmail.com> wrote:

> Hi all,
>
>
> I am clustering several text documents from distinct sources using k-means,
> and I have already generated the sparse vectors for each document.
> I want to boost some dimensions in the sparse vectors.
>
> What is the best way to do this?
>
> Is it a good idea to load the vectors, find the tf or tf-idf values of
> those dimensions, and boost those values?
>
>
> Thanks in advance and regards
>