Posted to dev@mahout.apache.org by "Derek O'Callaghan (JIRA)" <ji...@apache.org> on 2011/03/01 17:04:36 UTC

[jira] Commented: (MAHOUT-369) Issues with DistributedLanczosSolver output

    [ https://issues.apache.org/jira/browse/MAHOUT-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000929#comment-13000929 ] 

Derek O'Callaghan commented on MAHOUT-369:
------------------------------------------

Hi Danny,

I've tried out your testLanczosSolver2() test, but my output differs from yours: the eigenvalues come out in the reverse order to what you got. Here is what I see (I've added a line to LanczosSolver to also print the realEigen eigenvector):

INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
01-Mar-2011 15:56:56 org.slf4j.impl.JCLLoggerAdapter info
INFO: Eigenvector 0 found with eigenvalue 0.0
01-Mar-2011 15:56:56 org.slf4j.impl.JCLLoggerAdapter info
INFO: Eigenvector 0 - {0:0.5593439330819562,1:0.7330112630790516,2:0.3870773213760546}
01-Mar-2011 15:56:56 org.slf4j.impl.JCLLoggerAdapter info
INFO: Eigenvector 1 found with eigenvalue 0.03137295830774178
01-Mar-2011 15:56:56 org.slf4j.impl.JCLLoggerAdapter info
INFO: Eigenvector 1 - {0:-0.8010370751145115,1:0.35784374789842055,2:0.47988275274487646}
01-Mar-2011 15:56:56 org.slf4j.impl.JCLLoggerAdapter info
INFO: Eigenvector 2 found with eigenvalue 42.617610634772475
01-Mar-2011 15:56:56 org.slf4j.impl.JCLLoggerAdapter info
INFO: Eigenvector 2 - {0:-0.21324626331168514,1:0.5784827916967494,2:-0.7873269275811279}
01-Mar-2011 15:56:56 org.slf4j.impl.JCLLoggerAdapter info
INFO: LanczosSolver finished.

When I debug, I see that eigenVals contains [0.0, 0.03137295830774178, 42.617610634772475, 131.25526355941963]. Did you change LanczosSolver to reverse the order of the eigenvalues before generating the test output in your last comment? I don't see any changes to that file in the patch attached here.
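
To make the question concrete, here is a minimal sketch of the kind of reversal I mean, using a plain double[] rather than any Mahout type. The class and method names (EigenOrderSketch, reversed) are made up for illustration and are not part of LanczosSolver; the only real data is the eigenVals array I see in the debugger.

```java
import java.util.Arrays;

// Hypothetical illustration only; not code from LanczosSolver or the patch.
public class EigenOrderSketch {

    // Returns a copy of the input with its elements in reverse order.
    // If the input is ascending (as eigenVals is for me), the result is
    // descending, so index 0 holds the largest eigenvalue.
    static double[] reversed(double[] values) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[values.length - 1 - i];
        }
        return out;
    }

    public static void main(String[] args) {
        // The values I see in eigenVals when debugging.
        double[] eigenVals = {
            0.0, 0.03137295830774178, 42.617610634772475, 131.25526355941963
        };
        System.out.println(Arrays.toString(reversed(eigenVals)));
    }
}
```

If something like this had been applied before the output was generated, it would explain why your eigenvalues appear largest-first while mine appear smallest-first.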

Thanks,

Derek

> Issues with DistributedLanczosSolver output
> -------------------------------------------
>
>                 Key: MAHOUT-369
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-369
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3, 0.4
>            Reporter: Danny Leshem
>            Assignee: Jake Mannix
>              Labels: DistributedLanczosSolver, decomposer
>             Fix For: 0.5
>
>         Attachments: MAHOUT-369.patch
>
>
> DistributedLanczosSolver (line 99) claims to persist eigenVectors.numRows() vectors.
> {code}
>     log.info("Persisting " + eigenVectors.numRows() + " eigenVectors and eigenValues to: " + outputPath);
> {code}
> However, a few lines later (line 106) we have
> {code}
>     for(int i=0; i<eigenVectors.numRows() - 1; i++) {
>         ...
>     }
> {code}
> which only persists eigenVectors.numRows()-1 vectors.
> It seems that the most significant eigenvector (i.e. the one with the largest eigenvalue) is omitted... an off-by-one bug?
> Also, I think it would be better if the eigenvectors were persisted in *reverse* order, meaning the most significant vector is marked "0", the 2nd most significant is marked "1", etc.
> This, for two reasons:
> 1) When performing another PCA on the same corpus (say, with more principal components), corresponding eigenvalues can be easily matched and compared.
> 2) Makes it easier to discard the least significant principal components, which for Lanczos decomposition are usually garbage.
