Posted to dev@mahout.apache.org by Dmitriy Lyubimov <dl...@gmail.com> on 2014/10/10 23:10:06 UTC

QR variations and a better QR for o.a.m.math.decompositions

Please look at this survey [1]. I am currently reading thru it. Quite
comprehensive.

Where it fits us: ssvd, dssvd depend on QR accuracy quite a bit.

What we have right now in the "new" algebra DSL is shown there as algorithm
9, "Cholesky QR".

The benefit is that it is extremely simple and parallelizable. The problem
with that guy is that it is quite a bit less stable than Householder or
Givens QR (you can test this even with non-distributed matrices; the
difference is quite apparent, especially in cases approaching singularity).
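
To make it concrete why this bites, here is a minimal non-distributed sketch
of what algorithm 9 boils down to (plain Scala arrays, nothing Mahout-specific
assumed): form the Gram matrix G = A'A, Cholesky-factor it as G = R'R, then
back out Q = A * inv(R). Forming A'A squares the condition number, which is
exactly what hurts near singularity.

// Minimal sketch of "Cholesky QR" on a tall-skinny m x n block A
// (plain Scala; n assumed small enough for in-core work).
object CholeskyQRSketch {

  // Returns (Q, R) with Q: m x n, R: n x n upper-triangular, A = Q * R.
  def choleskyQR(a: Array[Array[Double]]): (Array[Array[Double]], Array[Array[Double]]) = {
    val m = a.length
    val n = a(0).length

    // Gram matrix G = A'A -- this step squares the condition number of A.
    val g = Array.tabulate(n, n)((i, j) => (0 until m).map(k => a(k)(i) * a(k)(j)).sum)

    // Cholesky factorization G = R'R, with R upper-triangular.
    val r = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- i until n) {
      val s = g(i)(j) - (0 until i).map(k => r(k)(i) * r(k)(j)).sum
      r(i)(j) = if (i == j) math.sqrt(s) else s / r(i)(i)
    }

    // Q = A * inv(R), computed row by row via forward substitution.
    val q = Array.ofDim[Double](m, n)
    for (row <- 0 until m; j <- 0 until n) {
      val s = a(row)(j) - (0 until j).map(k => q(row)(k) * r(k)(j)).sum
      q(row)(j) = s / r(j)(j)
    }
    (q, r)
  }
}

When A is nearly rank-deficient, the Cholesky step either loses most of the
accuracy or fails outright (square root of a tiny or negative pivot), while
Householder/Givens QR degrade much more gracefully.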

What we had in the MR version was something close to what they describe as
TSQR. Our approach is detailed in Nathan Halko's dissertation (the reference
is given on the SSVD page). Our "older" MR version is much more accurate.

It differs from TSQR in quite a few ways.

First, it uses reordered Givens QR for both the reductions and the blocked
QR. Using Givens QR for the blocks allows sequential (streaming) processing
of the input; Householder pretty much requires the entire matrix to be
loaded first.
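
To illustrate the streaming property (this is just a toy accumulator, not the
actual reordered/blocked code from the dissertation): a running n x n
upper-triangular R can absorb input rows one at a time via plane rotations, so
a mapper never needs the whole block in memory, and Q can be accumulated from
the same rotations if it is needed.

// Toy streaming Givens accumulator: fold incoming rows into a running
// upper-triangular R, one row at a time. Same plain-array conventions as
// the Cholesky sketch above; row signs of R are not fixed up.
object GivensStreamSketch {

  // Fold one incoming n-vector into the n x n triangle r, in place.
  def foldRow(r: Array[Array[Double]], row: Array[Double]): Unit = {
    val n = row.length
    val v = row.clone()
    for (j <- 0 until n if v(j) != 0.0) {
      // Rotation in the (r(j, *), v) plane chosen to annihilate v(j).
      val rho = math.hypot(r(j)(j), v(j))
      val c = r(j)(j) / rho
      val s = v(j) / rho
      for (k <- j until n) {
        val t = c * r(j)(k) + s * v(k)
        v(k) = -s * r(j)(k) + c * v(k)
        r(j)(k) = t
      }
    }
  }
}

Folding all rows of a block through foldRow yields the same R (up to row
signs) as a full QR of that block.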

Second, it develops an algorithm for N-way merges instead of the binary
merges defined by TSQR. I haven't finished reading the implementation details
of TSQR, but it seems that TSQR requires dynamic scheduling of the merge
tasks to stay at maximum parallel efficiency. The N-way formulation allows
more than 2 tasks to run before each reduction step, so the reduction tree
can be much shallower: depth log base N of the number of blocks instead of
log base 2.
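
The merge step itself is easy to state (again just a sketch under the same
toy assumptions as above; the real MR code also carries the Q-hat
accumulators and orders the work carefully): stacking the n x n R factors of
N child blocks and QR-ing the stack gives the R of the concatenated blocks.
An N-way reduction tree over b blocks therefore has depth ceil(log_N b)
instead of ceil(log_2 b); e.g. 1024 blocks need 2 levels of 32-way merges
versus 10 levels of binary ones.

// N-way R-merge sketch: the R of N stacked child R factors equals the R of
// the concatenation of the child blocks, so child Rs can simply be fed
// through the same streaming Givens fold as above.
object NWayMergeSketch {

  def mergeRs(rs: Seq[Array[Array[Double]]]): Array[Array[Double]] = {
    val n = rs.head.length
    val merged = Array.ofDim[Double](n, n)
    for (childR <- rs; row <- childR) GivensStreamSketch.foldRow(merged, row)
    merged
  }
}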

There's also a CAQR algorithm there which I haven't even touched. I think it
is worth reading through as well.

Bottom line, what I think is that

(1) we need to implement a well-parallelized variation of one of TSQR, CAQR
or Mahout's MR N-way Givens QR as a more numerically stable alternative to
CholeskyQR (aka "thinQR" in the sparkbindings algebra);

(2) I think some of the approaches in CAQR should lead to parallelization of
non-thin decompositions;

(3) they also consider approaches to distributed LU, which could open the
door to solving "non-thin" ridge regressions and the like.

Looking for takers and opinions on CAQR and Mahout's N-way Givens for the
sparkbindings.

[1] http://arxiv.org/pdf/0806.2159.pdf

Re: QR variations and a better QR for o.a.m.math.decompositions

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
For reference on Mahout's N-way Givens QR, see [2], sections 4.6-4.7.

[2]
http://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf


Re: QR variations and a better QR for o.a.m.math.decompositions

Posted by Ted Dunning <te...@gmail.com>.
Interesting analysis.

I haven't worked through the paper yet, but it is well known that the
Cholesky approach can be problematic near singularity, so this all makes a
lot of sense.


