Posted to user@mahout.apache.org by Bogdan Vatkov <bo...@gmail.com> on 2010/01/14 01:22:15 UTC

Norm in text vectors?

What is the practical meaning of the --norm parameter in the text-to-vector (
http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html) process?

Best regards,
Bogdan

Re: Norm in text vectors?

Posted by Jake Mannix <ja...@gmail.com>.
norm 2 and CosineDistanceMeasure are a good, fairly standard choice.  The L1
norm is useful for some things too, but you can use any positive integer, or
"INF" for L_infinity normalization.
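
To make the different --norm values concrete, here is a minimal Python sketch of
normalizing a vector by an arbitrary L_p norm, including the "INF" case. It only
illustrates the math and is not Mahout's code; the helper names (parse_norm,
p_norm, normalize) are made up for this example.

```python
import math

def parse_norm(value):
    """Turn a --norm style string into a number; "INF" means L_infinity.
    (Hypothetical helper, not Mahout's argument parsing.)"""
    return math.inf if value.strip().upper() == "INF" else float(value)

def p_norm(v, p):
    """L_p norm of v: (sum |x|^p)^(1/p), or max |x| when p is infinity."""
    if p == math.inf:
        return max((abs(x) for x in v), default=0.0)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def normalize(v, p):
    """Scale v so that its L_p norm is 1; a zero vector is returned unchanged."""
    n = p_norm(v, p)
    return [x / n for x in v] if n else list(v)

vec = [3.0, -4.0]
by_l1 = normalize(vec, parse_norm("1"))     # components sum to 1 in absolute value
by_l2 = normalize(vec, parse_norm("2"))     # Euclidean length 1
by_inf = normalize(vec, parse_norm("INF"))  # largest absolute component is 1
```

Only the L2 case makes v.dot(v) equal to 1.0; the other norms rescale the same
direction to a different length.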

  -jake

On Wed, Jan 13, 2010 at 4:32 PM, Bogdan Vatkov <bo...@gmail.com>wrote:

> Is it related to the distance calculation done
> by org.apache.mahout.common.distance.CosineDistanceMeasure for example?
> I am currently using --norm 2 in combination
> with org.apache.mahout.common.distance.CosineDistanceMeasure, is it ok, what
> other options I have for the --norm value?

Re: Norm in text vectors?

Posted by Bogdan Vatkov <bo...@gmail.com>.
Is it related to the distance calculation done
by org.apache.mahout.common.distance.CosineDistanceMeasure, for example?
I am currently using --norm 2 in combination
with org.apache.mahout.common.distance.CosineDistanceMeasure. Is that OK, and
what other options do I have for the --norm value?

On Thu, Jan 14, 2010 at 2:28 AM, Jake Mannix <ja...@gmail.com> wrote:

> It makes sure your vectors are all unit length (according to the norm you
> choose - L2 norm means: make sure each vector satisfies v.dot(v) == 1.0,
> for example)
>
> This makes sure that when you want to compare vectors to each other, a nice
> "distance" function is just distance(u, v) = 1 - u.dot(v)
>
>  -jake



-- 
Best regards,
Bogdan

Re: Norm in text vectors?

Posted by Jake Mannix <ja...@gmail.com>.
It makes sure your vectors are all unit length according to the norm you
choose. For example, with the L2 norm each vector satisfies v.dot(v) == 1.0.

This means that when you want to compare vectors to each other, a nice
"distance" function is just distance(u, v) = 1 - u.dot(v)
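
As a rough illustration of the point above, the following Python sketch
L2-normalizes two vectors and checks that 1 - u.dot(v) then matches the usual
cosine distance computed on the raw vectors. It is plain Python rather than
Mahout code, and the function names are chosen only for this example.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    """Scale v to unit Euclidean length; a zero vector is returned unchanged."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v] if length else list(v)

def cosine_distance(u, v):
    """1 - cos(angle between u and v), computed without pre-normalizing."""
    return 1.0 - dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

raw_u = [3.0, 4.0, 0.0]
raw_v = [1.0, 2.0, 2.0]
u = l2_normalize(raw_u)
v = l2_normalize(raw_v)

unit_check = dot(u, u)         # 1.0 up to floating-point rounding
simple_distance = 1.0 - dot(u, v)
reference_distance = cosine_distance(raw_u, raw_v)
```

Because the lengths are already divided out, the comparison reduces to a
single dot product per pair of vectors.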

  -jake

On Wed, Jan 13, 2010 at 4:22 PM, Bogdan Vatkov <bo...@gmail.com>wrote:

> What is the practical meaning of --norm parameter in the text-to-vector (
> http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html) process?
>
> Best regards,
> Bogdan
>