You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Sergey Chickin (JIRA)" <ji...@apache.org> on 2008/04/18 19:18:21 UTC

[jira] Created: (MAHOUT-45) Matrix QR decomposition

Matrix QR decomposition
-----------------------

                 Key: MAHOUT-45
                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
             Project: Mahout
          Issue Type: Improvement
          Components: Matrix
            Reporter: Sergey Chickin
            Priority: Minor


Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-45) Matrix QR decomposition

Posted by "Jake Mannix (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780216#action_12780216 ] 

Jake Mannix commented on MAHOUT-45:
-----------------------------------

Jake's soon-to-be-recent work.  It's not in Mahout yet! :) 

But on this very topic - decomposer doesn't specifically have GS, Householder, or Givens-based traditional QR decompositions (parallellized on Hadoop or otherwise), so if those are still wanted, they won't come with the decomposer contribution (which is primarily SVD and eigen-decomposition (and approximations thereof) via Lanczos, GHA, and probabilistic methods).

> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-45) Matrix QR decomposition

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590608#action_12590608 ] 

Ted Dunning commented on MAHOUT-45:
-----------------------------------


The numerical instability of GS is due to loss of orthogonality, not so much errors in normalization.

Householder and Givens approaches are about equal AFAIK, but the wikipedia article claims Givens is easier to parallelize.  I am not sure that applies for Map-Reduce.  Givens is definitely easier to apply in a sparse case since you don't have to deal with an entire (mostly zero) row or column at a time.

Golub and van Loan, p 219 (2nd edition) has a discussion that shows that the error for *modified* Gram-Schmidt is about k u where k is the condition number of the matrix and u is a single ULP while Householder and Givens approaches have error about the same as u.  This implies that hilbert matrices might be good test cases.



> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-45) Matrix QR decomposition

Posted by "Sergey Chickin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590581#action_12590581 ] 

Sergey Chickin commented on MAHOUT-45:
--------------------------------------

So should I try implementing Housholder method? You have to normalize vectors in it also

> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-45) Matrix QR decomposition

Posted by "Sergey Chickin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590686#action_12590686 ] 

Sergey Chickin commented on MAHOUT-45:
--------------------------------------

For Givens method if we have n-dimension matrix we have to calculate n(n-1)/2 rotation matrices each needs 5 trigonometric functions. And then multiply all those matrices by given to get Q matrix. This could be quite time complex comparing to other methods...

> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (MAHOUT-45) Matrix QR decomposition

Posted by "Sergey Chickin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591245#action_12591245 ] 

corr edited comment on MAHOUT-45 at 4/22/08 3:50 AM:
---------------------------------------------------------------

Seems like Givens rotations can be easily parallelized cause each rotation affects only 2 rows. Should I try implementing it for Hadoop? It could be useful for huge matrices

Givens rotations are not more difficult to implement than Gram-Schmidt. I've just implemented it and it seems to get more accurate values than Gram-Schmidt. Still, what is the best practice to calculate Q matrix? From A=QR?

      was (Author: corr):
    Seems like Givens rotations can be easily parallelized cause each rotation affects only 2 rows. Should I try implementing it for Hadoop? It could be useful for huge matrices
  
> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-45) Matrix QR decomposition

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779873#action_12779873 ] 

Sean Owen commented on MAHOUT-45:
---------------------------------

Same question, any momentum for committing / integrating / maintaining this?

> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-45) Matrix QR decomposition

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590549#action_12590549 ] 

Ted Dunning commented on MAHOUT-45:
-----------------------------------


Sergey,

Did you use the Gram-Schmidt, Givens or Householder method?



> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-45) Matrix QR decomposition

Posted by "Sergey Chickin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591245#action_12591245 ] 

Sergey Chickin commented on MAHOUT-45:
--------------------------------------

Seems like Givens rotations can be easily parallelized cause each rotation affects only 2 rows. Should I try implementing it for Hadoop? It could be useful for huge matrices

> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-45) Matrix QR decomposition

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590563#action_12590563 ] 

Ted Dunning commented on MAHOUT-45:
-----------------------------------


Gram-schmidt is definitely the easiest to implement, but it has some real problems with round-off.



> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-45) Matrix QR decomposition

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590719#action_12590719 ] 

Ted Dunning commented on MAHOUT-45:
-----------------------------------


Not trig functions are need for Given's method.  The coefficients are indeed sines and cosines of a theoretical rotation angle, but you needn't ever compute the angle since you can compute the coefficients directly.

See http://www.cse.uiuc.edu/courses/cs554/notes/ for a good exposition, especially section 11.  Golub and van Loan's Matrix computation is also an excellent description.

Also, it is common to not actually construct the Q matrix explicitly.


> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (MAHOUT-45) Matrix QR decomposition

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-45.
-----------------------------

    Resolution: Won't Fix

Sounds like the patch at hand is obsoleted by the recent matrix changes, and, largely superseded by it.

> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-45) Matrix QR decomposition

Posted by "Sergey Chickin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590553#action_12590553 ] 

Sergey Chickin commented on MAHOUT-45:
--------------------------------------

Gram-Schmidt one

> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-45) Matrix QR decomposition

Posted by "Sergey Chickin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590540#action_12590540 ] 

Sergey Chickin commented on MAHOUT-45:
--------------------------------------

That is the error. I just thought if there's some way to avoid normalizing vectors, etc. Still, I just wanted to warn everyone about possible mysterious results

> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-45) Matrix QR decomposition

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591347#action_12591347 ] 

Ted Dunning commented on MAHOUT-45:
-----------------------------------


Sounds right to me.  Note that there may be some ordering constraints ... but I think you are right that Givens is considered easier to parallelize.

Block householder with k x k blocks may be easy to parallelize across k processes.  The idea would be that instead of distributing the different rotations to different processors, you would do the single block reflection in parallel.  The decomposition of each block could then be done completely in parallel as well.

Probably it would be good to outline the approach for both algorithms and see how it looks.



> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-45) Matrix QR decomposition

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780209#action_12780209 ] 

Ted Dunning commented on MAHOUT-45:
-----------------------------------


This is largely superseded by Jake's recent work.

> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (MAHOUT-45) Matrix QR decomposition

Posted by Ted Dunning <td...@veoh.com>.

Taking the vector norm using the naïve method of adding up the squares and
then taking the square root is known to be problematic in terms of overflow
for very large arguments.  There are always some small problems when using
square-roots since it can be hard to get accuracy to 1ULPS without standing
on your head.

Are you saying that this is the problem?

Can you say more about the error you see?


On 4/18/08 10:26 AM, "Sergey Chickin (JIRA)" <ji...@apache.org> wrote:

> I get an error caused by storing doubles in memory and resulting in
> calculations error(actually taking square root when calculating vector norm)


[jira] Updated: (MAHOUT-45) Matrix QR decomposition

Posted by "Sergey Chickin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Chickin updated MAHOUT-45:
---------------------------------

    Attachment: MAHOUT-45.patch

I've made an implementation according to http://en.wikipedia.org/wiki/QR_decomposition but I still have a problem: I get an error caused by storing doubles in memory and resulting in calculations error(actually taking square root when calculating vector norm). So we normally get non-integer determinant for matrix of integers(still it's close to desired value)... This can actually result in significant error for big matrix

> Matrix QR decomposition
> -----------------------
>
>                 Key: MAHOUT-45
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-45
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Sergey Chickin
>            Priority: Minor
>         Attachments: MAHOUT-45.patch
>
>
> Matrix QR decomposition and appropriate determinant calculator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.