You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org> on 2011/08/29 17:32:42 UTC

[jira] [Created] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
-----------------------------------------------------------------

                 Key: MAHOUT-797
                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
             Project: Mahout
          Issue Type: Improvement
            Reporter: Dmitriy Lyubimov


Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 

I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Dmitriy Lyubimov (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167396#comment-13167396 ] 

Dmitriy Lyubimov commented on MAHOUT-797:
-----------------------------------------

Thank you. 

rank deficiency of the input, yes, that's what i thought. but i can't figure how it may end up being singular, given that i generated it randomly. random input is virtually guaranteed to be non-singular. Ok must be something with my test program.

I guess i am also a little bit confused about this whole positive definite thing. Y'Y is guaranteed to be Hermitian but is it guaranteed to be positive definite? At least in my experiments Y'Y doesn't seem to ever end up positive definite.

                
> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.5, 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: Backlog
>
>         Attachments: MAHOUT-797.pdf
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Dmitriy Lyubimov (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-797:
------------------------------------

    Fix Version/s: Backlog
    
> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.5, 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: Backlog
>
>         Attachments: MAHOUT-797.pdf
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Dmitriy Lyubimov (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167396#comment-13167396 ] 

Dmitriy Lyubimov edited comment on MAHOUT-797 at 12/12/11 8:54 AM:
-------------------------------------------------------------------

Thank you. 

rank deficiency of the input, yes, that's what i thought. but i can't figure how it may end up being singular, given that i generated it randomly. random input is virtually guaranteed to be non-singular. Ok must be something with my test program.

I guess i am also a little bit confused about this whole positive definite thing. Y'Y is guaranteed to be Hermitian but is it guaranteed to be positive definite? At least in my experiments Y'Y doesn't seem to ever end up positive definite.

You are right, some of singular values were zero. So now it runs ok if i step thru all steps it but still gives me degenerate singular system error if i call it as a routine. I am still missing something.

                
      was (Author: dlyubimov):
    Thank you. 

rank deficiency of the input, yes, that's what i thought. but i can't figure how it may end up being singular, given that i generated it randomly. random input is virtually guaranteed to be non-singular. Ok must be something with my test program.

I guess i am also a little bit confused about this whole positive definite thing. Y'Y is guaranteed to be Hermitian but is it guaranteed to be positive definite? At least in my experiments Y'Y doesn't seem to ever end up positive definite.

                  
> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.5, 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: Backlog
>
>         Attachments: MAHOUT-797.pdf
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-797:
------------------------------------

    Description: 
Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 

Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.

I also want to fix what's left unfixed in MAHOUT-638.

  was:
Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 

I also want to fix what's left unfixed in MAHOUT-638.


> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Dmitriy Lyubimov
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Work started] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on MAHOUT-797 started by Dmitriy Lyubimov.

> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.5
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov reassigned MAHOUT-797:
---------------------------------------

    Assignee: Dmitriy Lyubimov

> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-797:
------------------------------------

        Fix Version/s: 0.6
    Affects Version/s: 0.5

> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.5
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Dmitriy Lyubimov (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159469#comment-13159469 ] 

Dmitriy Lyubimov edited comment on MAHOUT-797 at 11/29/11 7:38 PM:
-------------------------------------------------------------------

so what i think is that using R'^-1YA route is more promising overall for B_0 pipeline. 

The reason being that the version that uses explicit Q must pull in both Q and A while the version that would use R^-1 would only go over A and Omega, but Omega is imaginary. So we trade slightly more flops for less network I/O. This route may turn out significantly more beneficial if we believe that size of Q >> size of A. I imagine this is not too common though, but i have high hopes. 

we may eventually even deploy some automatic default based on subsampling A for sparsity. 

For power iterations pipeline perhaps looses some of it since we don't have random matrix in the works anymore. It seems possible that i might be able to do both Y_i = AB'_i-1 and Y'_i Y_i   in one MR huge step and then pull in Y_i as a side info, but it would require producing Y output the same way as A is split which is technically possible (similarly how it is done with map-side first qr step). 

But all of that changes the flow significantly enough to practically build a completely new set of jobs (which is ok). 
 
                
      was (Author: dlyubimov):
    so what i think is that using R'^-1YA route is more promising overall for B_0 pipeline. 

The reason being that the version that uses explicit Q must pull in both Q and A while the version that would use R^-1 would only go over A and Omega, but Omega is imaginary. So we trade slightly more flops for less network I/O. This route may turn out significantly more beneficial if we believe that size of Q >> size of A. I imagine this is not too common though, but i have high hopes. 

we may eventually even deploy some automatic default based on subsampling A for sparsity. 

For power iterations pipeline perhaps looses some of it since we don't have random matrix in the works anymore. It seems possible that i might be able to do both Y_i = AB'_i-1 and Y'_i Y_i   in one MR step there without ever forming AB' and that would be a huge plus instead. 

But all of that changes the flow significantly enough to practically build a completely new set of jobs (which is ok). 
 
                  
> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.5, 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: Backlog
>
>         Attachments: MAHOUT-797.pdf
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092917#comment-13092917 ] 

Ted Dunning commented on MAHOUT-797:
------------------------------------

OK.  So I need to push out MAHOUT-790 and MAHOUT-792

> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Dmitriy Lyubimov
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-797:
------------------------------------

    Attachment: MAHOUT-797.pdf

WIP on execution plan. 

Ted, would be nice if you committed CholeskyDecomposition class with its dependencies form MAHOUT-792, I can't seem to find it in the trunk.

> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.5
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-797.pdf
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092922#comment-13092922 ] 

Dmitriy Lyubimov commented on MAHOUT-797:
-----------------------------------------

er... i keep rebasing this on your new-stochastic branch. so no hurry. If you have critical fixes, just push it to github/tdunning/new-stochastic, i will pick them up. This is forked off your new-stochastic branch (to include both790 and 792 already). 

It would be good if you could push it to github, but no need to push to svn yet.

> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Dmitriy Lyubimov
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Dmitriy Lyubimov (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-797:
------------------------------------

    Affects Version/s: 0.6
        Fix Version/s:     (was: 0.6)

removed 0.6 as a target. Per conversation on jira on the list.
                
> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.5, 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>         Attachments: MAHOUT-797.pdf
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Dmitriy Lyubimov (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159469#comment-13159469 ] 

Dmitriy Lyubimov commented on MAHOUT-797:
-----------------------------------------

so what i think is that using R'^-1YA route is more promising overall for B_0 pipeline. 

The reason being that the version that uses explicit Q must pull in both Q and A while the version that would use R^-1 would only go over A and Omega, but Omega is imaginary. So we trade slightly more flops for less network I/O. This route may turn out significantly more beneficial if we believe that size of Q >> size of A. I imagine this is not too common though, but i have high hopes. 

we may eventually even deploy some automatic default based on subsampling A for sparsity. 

For power iterations pipeline perhaps looses some of it since we don't have random matrix in the works anymore. It seems possible that i might be able to do both Y_i = AB'_i-1 and Y'_i Y_i   in one MR step there without ever forming AB' and that would be a huge plus instead. 

But all of that changes the flow significantly enough to practically build a completely new set of jobs (which is ok). 
 
                
> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.5, 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: Backlog
>
>         Attachments: MAHOUT-797.pdf
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Ted Dunning (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167379#comment-13167379 ] 

Ted Dunning commented on MAHOUT-797:
------------------------------------

The problem you are having is that a rank deficient matrix doesn't have an inverse because some of the singular values are zero.

In the Cholesky decomposition of a rank deficient matrix, each step proceeds by dividing by the next remaining diagonal element.  If that element is zero, then things cannot proceed.  If you are lucky, you already have the basis formed and could just quit.  Otherwise, there is more work to do, but that work depends on doing the step with the zero diagonal.  With pivoting, you work on the columns/rows of the remaining matrix out of order so that you guarantee to complete the basis.

In general, you should never compute the inverse of any matrix.  Instead, you should use some sort of solver.  In the case of a triangular matrix, the solver is a back-substitution.
                
> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.5, 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: Backlog
>
>         Attachments: MAHOUT-797.pdf
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

Posted by "Dmitriy Lyubimov (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167327#comment-13167327 ] 

Dmitriy Lyubimov commented on MAHOUT-797:
-----------------------------------------

Hm... can't seem to get R to compute either Cholesky or inverse. Here is the function . 

When i run it without something called pivot, i get "matrix not positive definite". with pivot the computation seems to proceed with warning but the system is reported as singular at attempt to obtain the inverse: 

{quote}
Error in solve.default(R) : 
  system is computationally singular: reciprocal condition number = 1.91281e-25
In addition: Warning messages:
1: In chol.default(yty, pivot = T) : matrix not positive definite
2: In chol.default(yty, pivot = T) : matrix not positive definite
{quote}

{code:title=svd.R}

#SSVD with Q=YR^-1 substitute.
# this is just a simulation, because it is suboptimal to verify the actual result
ssvd.svd1 <- function(x, k, p=25, qiter=0 ) { 

a <- as.matrix(x)
m <- nrow(a)
n <- ncol(a)
p <- min( min(m,n)-k,p)
r <- k+p

omega <- matrix ( rnorm(r*n), nrow=n, ncol=r)

# in reality we of course don't need to form and persist y
# but this is just verification
y <- a %*% omega

yty <- t(y) %*% y
R <- chol(yty, pivot = T)
q <- y %*% solve(R)

b<- t( q ) %*% a   

#power iterations
for ( i in 1:qiter ) { 
  y <- a %*% t(b)

  yty <- t(y) %*% y
  R <- chol(yty, pivot = T)
  q <- y %*% solve(R)
  b <- t(q) %*% a
}

bbt <- b %*% t(b)

e <- eigen(bbt, symmetric=T)

res <- list()

res$svalues <- sqrt(e$values)[1:k]
uhat=e$vectors[1:k,1:k]

res$u <- (q %*% e$vectors)[,1:k]
res$v <- (t(b) %*% e$vectors %*% diag(1/e$values))[,1:k]

return(res)
}
{code}
                
> MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-797
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-797
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.5, 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: Backlog
>
>         Attachments: MAHOUT-797.pdf
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Since alternative flow using Cholesky decomposition is extremely easy to add to existing computations of BB', I am thinking of just adding an option that chooses between B-pipeline with QR step and B-pipeline with Y'Y-Cholesky step. 
> Ongoing work and some initial code (Y'Y step) is here https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-797.
> I also want to fix what's left unfixed in MAHOUT-638.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira