Posted to user@mahout.apache.org by Ted Dunning <te...@gmail.com> on 2013/03/01 17:24:36 UTC

Re: Vector distance within a cluster

What Sean says is just right, except that I was (telegraphically) getting
at a slightly different point with L_1:

On Wed, Feb 27, 2013 at 7:23 AM, Chris Harrington <ch...@heystaks.com> wrote:

> Is L_1 regularization the same as manhattan distance?
>

The L_1 metric is Manhattan distance, yes.
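
To make the terminology concrete, here is a toy sketch in plain Python
(nothing to do with Mahout's API; the function names are just illustrative)
of the L_1 metric next to the usual L_2 metric:

def l1_distance(a, b):
    # Manhattan distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    # Euclidean distance: square root of the summed squared differences.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

print(l1_distance([1.0, 2.0, 3.0], [4.0, 0.0, 3.0]))  # 5.0
print(l2_distance([1.0, 2.0, 3.0], [4.0, 0.0, 3.0]))  # about 3.61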

L_1 regularization of k-means refers to something a little bit different.

The idea with regularization is that you add some sort of penalty to the
function you are optimizing.  This penalty pushes the optimization toward a
solution that you would prefer on some other grounds than just the
optimization alone.  Regularization often helps in solving underdetermined
systems where there are an infinite number of solutions and we have to pick
a preferred solution.

There isn't anything that says that you have to be optimizing the same kind
of function as the regularization.  Thus k-means, which is inherently
optimizing squared error, can quite reasonably be regularized with L_1 (the
sum of the absolute values of the centroids' coefficients).
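
As a rough sketch of what such an objective could look like (plain Python,
not how Mahout implements anything, and the penalty weight `lam` is a
made-up tuning knob you would have to choose yourself): the per-cluster
cost is the usual squared error plus lam times the L_1 norm of the
centroid, and for a fixed cluster the minimizing centroid works out to the
coordinate-wise mean, soft-thresholded.

def regularized_cost(points, centroid, lam):
    # Squared error of the cluster plus lam times the L_1 norm of the centroid.
    squared_error = sum(
        sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in points
    )
    l1_penalty = lam * sum(abs(c) for c in centroid)
    return squared_error + l1_penalty

def soft_threshold(value, threshold):
    # Shrink toward zero; anything inside [-threshold, threshold] becomes 0.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def l1_regularized_centroid(points, lam):
    # For a fixed cluster, the centroid minimizing regularized_cost is the
    # coordinate-wise mean, soft-thresholded by lam / (2 * n).
    n = len(points)
    dims = len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(dims)]
    return [soft_threshold(m, lam / (2 * n)) for m in means]

points = [[1.0, 0.0, 0.04], [0.8, 0.0, 0.02]]
print(l1_regularized_centroid(points, lam=0.2))  # roughly [0.85, 0.0, 0.0]

The soft threshold is what sets small centroid coordinates exactly to zero,
which is the sparsity effect mentioned below.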

I haven't tried this at all seriously yet.  L_1 regularization tends to
help drive toward sparsity, but it is normally used in convex problems
where we can guarantee a findable global optimum.  The k-means problem,
however, is not convex so adding the regularization may screw things up in
practice.  For text-like data, I have a strong intuition that the idealized
effect of L_1 should be very good, but the pragmatic effect may not be so
useful.

Re: Vector distance within a cluster

Posted by Ted Dunning <te...@gmail.com>.
If you get near a good solution (which is what the streaming k-means tries
to make very likely), then the local convexity should not be spoiled by the
L_1 regularization.  This is behind the guaranteed convergence of SGD, and
it is the reason that step-wise regression so often fails ... step-wise
selection is equivalent to L_0 regularization, which breaks convexity.  L_1
does nearly the same amount of good while leaving convexity intact.
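
For what it's worth, the L_0 versus L_1 contrast can be written down in a
couple of lines (toy Python again, not anything in Mahout): the L_0 penalty
just counts nonzero coefficients, which is what makes it non-convex, while
the L_1 penalty sums their magnitudes and stays convex.

def l0_penalty(v):
    # Count of nonzero coefficients.  Adding this to an objective breaks
    # convexity, which is the step-wise failure mode described above.
    return sum(1 for x in v if x != 0.0)

def l1_penalty(v):
    # Sum of absolute values.  Convex, yet it still pushes small
    # coefficients toward exactly zero, so it buys most of the sparsity
    # that L_0 would.
    return sum(abs(x) for x in v)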

Re: Vector distance within a cluster

Posted by Sean Owen <sr...@gmail.com>.
So this is constraining the centroids themselves to be 0 in many
dimensions... interesting. I assume this comes into play when you move
centroids -- you're not minimizing squared distance anymore but that plus
an additional penalty. I'd be interested to hear your results, Chris, if
you try it -- versus simply using the L1 distance alone. I agree this does
seem to make it all the more important to run it many times and pick the
best clustering due to local minima.
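
If anyone does try it, one way to act on the "run it many times" point is a
plain restart loop: cluster from several seeds and keep the run with the
lowest penalized cost. A rough Python sketch, where cluster_once and cost
are hypothetical stand-ins for whatever clustering routine and objective
you actually use -- they are not Mahout API calls:

import random

def best_of_restarts(points, k, lam, cluster_once, cost, restarts=10):
    # Run the clustering several times and keep the run with the lowest
    # penalized cost.  cluster_once and cost are placeholders supplied by
    # the caller, not Mahout functions.
    best_result, best_cost = None, float("inf")
    for seed in range(restarts):
        random.seed(seed)  # so each run can draw a different random start
        result = cluster_once(points, k, lam)
        run_cost = cost(points, result, lam)
        if run_cost < best_cost:
            best_result, best_cost = result, run_cost
    return best_result, best_cost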


Re: Vector distance within a cluster

Posted by Ted Dunning <te...@gmail.com>.
Always good to test, but I think that Euclidean distance with L_1
regularization is probably more interesting.

Re: Vector distance within a cluster

Posted by Chris Harrington <ch...@heystaks.com>.
So if I'm understanding you correctly, you're saying, simply put, that I should investigate using L_1 as my distance measure when measuring vector distance within a cluster?

