You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org> on 2011/09/01 09:00:11 UTC

[jira] [Commented] (MAHOUT-796) Modified power iterations in existing SSVD code

    [ https://issues.apache.org/jira/browse/MAHOUT-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095146#comment-13095146 ] 

Dmitriy Lyubimov commented on MAHOUT-796:
-----------------------------------------

Ok first implementation with QR solvers is ready, added -q parameter. (all in git remote git@github.com:dlyubimov/mahout-commits branch MAHOUT-796)

Did not have time to test distributed version and larger inputs. on a toy input 1000x2000, k=3, p=10 (optimal 10,4,1,(0.1) ): 

q=0: 
--SSVD solver singular values:
svs: 9.998472  3.993542  0.990456  0.100000  0.100000  0.100000  0.100000  0.100000  0.100000  0.100000  0.100000  0.100000  0.100000  

q=1: (+2 more sequential steps):
--SSVD solver singular values:
svs: 10.000000  4.000000  0.999999  0.100000  0.100000  0.100000  0.100000  0.100000  0.100000  0.100000  0.100000  0.100000  0.100000  

So, much better (although much slower as well). 
I of course understand that each run exhibit noise, so to prove it works better consistently i need to run more than just 2 attempts. But that's encouraging. it worked (and actually at first attempt)!

I tried some optimization to handle sparse cases a little better as well, i guess it taxes densier cases a little bit.

So this will be put on hold until i add Cholesky option and then i will have to return to this issue to enable the same schema but Y'Y+ Cholesky path.

I refactored QR steps into standalone OutputCollector implementations so that they can now be more easily be pipelined inside mappers and reducers so code is much more readable now. 

So after a few tests and final fixes i think it is a commit worthy but it has dependency on Ted's refactoring MAHOUT-790 pushed to trunk.



> Modified power iterations in existing SSVD code
> -----------------------------------------------
>
>                 Key: MAHOUT-796
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-796
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.5
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>              Labels: SSVD
>             Fix For: 0.6
>
>
> Nathan Halko contacted me and pointed out importance of availability of power iterations and their significant effect on accuracy of smaller eigenvalues and noise attenuation. 
> Essentially, we would like to introduce yet another job parameter, q, that governs amount of optional power iterations. The suggestion how to modify the algorithm is outlined here : https://github.com/dlyubimov/ssvd-lsi/wiki/Power-iterations-scratchpad .
> Note that it is different from original power iterations formula in the paper in the sense that additional orthogonalization performed after each iteration. Nathan points out that that improves errors in smaller eigenvalues a lot (If i interpret it right). 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira