You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@mahout.apache.org by Aniruddha Basak <t-...@expedia.com> on 2012/08/01 03:00:15 UTC

Mahout LanczosSolver explanation

Hi,
I am working on Spectral Kmeans which involves an eigen-decomposition step
using Lanczos. As I did not get exact similar results as expected, I tried to understand the
implementation.
I have one question about "LanczosSolver .java" :
In the "solve" method, while building the "tridiagonal matrix" there is a step just
after the multiplication job (performed on Hadoop as TimesSquaredJob) as
------------
nextVector.assign(new Scale(1.0 / state.getScaleFactor()));
-----------
I could not understand why this scaling is performed?

( When I compared the results on a small matrix to an equivalent Matlab script,
I found the results are exactly similar WITHOUT this scaling. Including this scaling
makes the results different from the Matlab results.  )


Thanks,
Aniruddha

Re: Mahout LanczosSolver explanation

Posted by Ted Dunning <te...@gmail.com>.

Stochastic SVD and alternating least squares would both be better, I think.

On Tue, Jul 31, 2012 at 7:28 PM, Aniruddha Basak <t-...@expedia.com>wrote:

> I did not choose Lanczos. It found it being used as a part of
> Spectral-Kmeans
> implemented in Mahout.
> Can u please recommend some better alternative of Lanczos.
>
> Thanks,
> Aniruddha
>
>
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Tuesday, July 31, 2012 6:24 PM
> To: user@mahout.apache.org
> Cc: Jake Mannix
> Subject: Re: Mahout LanczosSolver explanation
>
> WHy are you using Lanczos?  Why not use something more recent?
>
> On Tue, Jul 31, 2012 at 7:00 PM, Aniruddha Basak <t-abasak@expedia.com
> >wrote:
>
> > Hi,
> > I am working on Spectral Kmeans which involves an eigen-decomposition
> > step using Lanczos. As I did not get exact similar results as
> > expected, I tried to understand the implementation.
> > I have one question about "LanczosSolver .java" :
> > In the "solve" method, while building the "tridiagonal matrix" there
> > is a step just after the multiplication job (performed on Hadoop as
> > TimesSquaredJob) as
> > ------------
> > nextVector.assign(new Scale(1.0 / state.getScaleFactor()));
> > -----------
> > I could not understand why this scaling is performed?
> >
> > ( When I compared the results on a small matrix to an equivalent
> > Matlab script, I found the results are exactly similar WITHOUT this
> > scaling. Including this scaling makes the results different from the
> > Matlab results.  )
> >
> >
> > Thanks,
> > Aniruddha
> >
> >
>

RE: Mahout LanczosSolver explanation

Posted by Aniruddha Basak <t-...@expedia.com>.

I did not choose Lanczos. It found it being used as a part of Spectral-Kmeans
implemented in Mahout.
Can u please recommend some better alternative of Lanczos.

Thanks,
Aniruddha


-----Original Message-----
From: Ted Dunning [mailto:ted.dunning@gmail.com] 
Sent: Tuesday, July 31, 2012 6:24 PM
To: user@mahout.apache.org
Cc: Jake Mannix
Subject: Re: Mahout LanczosSolver explanation

WHy are you using Lanczos?  Why not use something more recent?

On Tue, Jul 31, 2012 at 7:00 PM, Aniruddha Basak <t-...@expedia.com>wrote:

> Hi,
> I am working on Spectral Kmeans which involves an eigen-decomposition 
> step using Lanczos. As I did not get exact similar results as 
> expected, I tried to understand the implementation.
> I have one question about "LanczosSolver .java" :
> In the "solve" method, while building the "tridiagonal matrix" there 
> is a step just after the multiplication job (performed on Hadoop as 
> TimesSquaredJob) as
> ------------
> nextVector.assign(new Scale(1.0 / state.getScaleFactor()));
> -----------
> I could not understand why this scaling is performed?
>
> ( When I compared the results on a small matrix to an equivalent 
> Matlab script, I found the results are exactly similar WITHOUT this 
> scaling. Including this scaling makes the results different from the 
> Matlab results.  )
>
>
> Thanks,
> Aniruddha
>
>

Re: Mahout LanczosSolver explanation

Posted by Ted Dunning <te...@gmail.com>.

WHy are you using Lanczos?  Why not use something more recent?

On Tue, Jul 31, 2012 at 7:00 PM, Aniruddha Basak <t-...@expedia.com>wrote:

> Hi,
> I am working on Spectral Kmeans which involves an eigen-decomposition step
> using Lanczos. As I did not get exact similar results as expected, I tried
> to understand the
> implementation.
> I have one question about "LanczosSolver .java" :
> In the "solve" method, while building the "tridiagonal matrix" there is a
> step just
> after the multiplication job (performed on Hadoop as TimesSquaredJob) as
> ------------
> nextVector.assign(new Scale(1.0 / state.getScaleFactor()));
> -----------
> I could not understand why this scaling is performed?
>
> ( When I compared the results on a small matrix to an equivalent Matlab
> script,
> I found the results are exactly similar WITHOUT this scaling. Including
> this scaling
> makes the results different from the Matlab results.  )
>
>
> Thanks,
> Aniruddha
>
>

Re: Mahout LanczosSolver explanation

Posted by Jake Mannix <ja...@gmail.com>.

On Tue, Jul 31, 2012 at 6:00 PM, Aniruddha Basak <t-...@expedia.com>wrote:

> Hi,****
>
> I am working on Spectral Kmeans which involves an eigen-decomposition step
> ****
>
> using Lanczos. As I did not get exact similar results as expected, I tried
> to understand the****
>
> implementation.****
>
> I have one question about “LanczosSolver .java” :****
>
> In the “solve” method, while building the “tridiagonal matrix” there is a
> step just****
>
> after the multiplication job (performed on Hadoop as TimesSquaredJob) as**
> **
>
> ------------****
>
> nextVector.assign(*new* Scale(1.0 / state.getScaleFactor()));****
>
> -----------****
>
> I could not understand why this scaling is performed?
>

It's to help with floating point overflow issues in naive Lanczos
implementations:
each time you multiply the basis vectors by the matrix^2, it grows roughly
like the
largest singular of the matrix (squared).  Do this a fifty or sixty times
and you
blow over 10^308 and you're screwed.  Scaling the basis vectors down after
each iteration is equivalent to normalizing the input matrix to have
smaller singular
values (uniformly), and turns floating point overflow problems into
underflow
problems.  Thankfully, when you have entries in your basis vectors which are
close to zero, it's ok to have them be *actually* zero.  Unless you do it
to all of
them, naturally.

Either way, uniform scaling doesn't change the output singular vectors, so
the
result is unchanged if you don't end up running into floating point
precision
issues.

> ****
>
> **(When I compared the results on a small matrix to an equivalent Matlab
> script,
>
> **
>
> I found the results are exactly similar WITHOUT this scaling. Including
> this scaling****
>
> makes the results different from the Matlab results.  )****
>
> ** **
>
> ** **
>
> Thanks,****
>
> Aniruddha****
>
> ** **
>

-- 

  -jake