
Posted to user@mahout.apache.org by "ahmed.nagy" <ah...@hotmail.com> on 2011/01/27 15:30:46 UTC

Preprocess Matrix in SSVD with random projections

I am working on matrix factorization and I read the paper
http://arxiv.org/abs/0909.4061 about random projections. I am implementing
distributed NMF. I tried to understand the work of Dmitriy Lyubimov
https://issues.apache.org/jira/browse/MAHOUT-376 on SSVD. I think it is
very relevant, particularly the part where he preprocesses the matrix before
starting the decomposition. Could anybody help me preprocess the input matrix
with random projections to produce a smaller one? Also, how can I map the
results from the decomposed matrices back to the original data, since they
will be in another space? Is there a class in Mahout that I can use to do
that?
Regards
Ahmed Nagy


Re: Preprocess Matrix in SSVD with random projections

Posted by Dmitriy Lyubimov <dl...@gmail.com>.

Ahmed,

I can certainly try to answer your questions regarding the preprocessing steps.
There are basically two mods/additions there compared to the original method:

The first is to use an eigendecomposition instead of an SVD on a small
Hermitian matrix of dimensions (k+p) x (k+p) (e.g. just 500x500, symmetric).
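
To illustrate the first point, here is a minimal sketch in Python/NumPy (not
the Mahout code). It assumes the small matrix in question is B*B^T, where B is
a hypothetical (k+p) x n projected matrix; the names B, kp and n are made up
for the example. It shows that the eigenvalues of the small symmetric matrix,
after a square root, match the singular values of B.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical projected matrix B with k+p rows (assumed notation, not Mahout's).
kp, n = 5, 40
B = rng.standard_normal((kp, n))

# Small symmetric (k+p) x (k+p) matrix.
BBt = B @ B.T

# eigh is meant for symmetric/Hermitian input; eigenvalues come back ascending.
eigvals, eigvecs = np.linalg.eigh(BBt)

# Reorder to descending so the result lines up with the usual SVD convention.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Singular values of B are the square roots of the eigenvalues of B B^T.
# clip guards against tiny negative values caused by rounding.
sigma_from_eig = np.sqrt(np.clip(eigvals, 0.0, None))

# Reference: singular values computed directly from B.
sigma_direct = np.linalg.svd(B, compute_uv=False)
```

The eigenproblem only ever sees the (k+p) x (k+p) matrix, so its cost does not
depend on the number of columns of B.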

The second trick, which is what enables the scale, is to produce the Y=QR
decomposition using blocking and map-reduce for matrices exceeding what one
could put in RAM. Essentially it is a streaming process with some hierarchical
merges. The algorithm is built up by induction.
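
The blocking idea can be sketched roughly as follows in Python/NumPy. This is
only an illustration of a blocked QR with a single merge step, under the
assumption that Y is split into row blocks; the function name blocked_qr and
the block size are hypothetical, and this is not how the Mahout job is
actually written.

```python
import numpy as np

def blocked_qr(Y, block_rows):
    """Blocked QR of a tall matrix Y; returns (Q, R) with Y = Q @ R."""
    blocks = [Y[i:i + block_rows] for i in range(0, Y.shape[0], block_rows)]

    # Step 1: independent QR of each row block (could run one block per mapper).
    local = [np.linalg.qr(b) for b in blocks]

    # Step 2: stack the small R factors and factor the stack (the merge step).
    R_stack = np.vstack([r for _, r in local])
    Q2, R = np.linalg.qr(R_stack)

    # Step 3: combine each local Q with its slice of Q2 to get the final Q rows.
    Q_parts = []
    offset = 0
    for q_local, r_local in local:
        rows = r_local.shape[0]
        Q_parts.append(q_local @ Q2[offset:offset + rows])
        offset += rows
    return np.vstack(Q_parts), R

rng = np.random.default_rng(0)
Y = rng.standard_normal((1000, 6))
Q, R = blocked_qr(Y, block_rows=128)
```

Only the small R factors need to be brought together in the merge, so no
single step has to hold all of Y. With more blocks than fit in one merge, the
same merge could be applied again to groups of R factors, which is where the
hierarchy comes from.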

Regarding the results, the singular values are the actual singular values of
the original decomposition, so they don't require any postprocessing (aside
from taking a sqrt of the eigenvalues).

The U and V matrices require a simple matrix multiplication. There's not much
new here; Mahout's DistributedRowMatrix does it too (except that in the case of
SSVD this can be done as a map-only process, due to the extremely small size of
the singular vector matrix of the reduced matrix). The formulas used are all
there. I can certainly try to interpret them for you, I just need to
understand where you got stuck.
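
As a rough end-to-end illustration of mapping the results back to the original
space, here is a small in-memory sketch in Python/NumPy. It follows the
standard random projection recipe from the paper (Y = A*Omega, Y = QR,
B = Q^T*A) and assumes the back-mapping U = Q*U_hat and
V = B^T*U_hat*Sigma^-1; all variable names are chosen for the example and
none of this is Mahout API.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical low-rank input A (m x n) of rank r, for demonstration only.
m, n, r = 200, 60, 5
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

k, p = 5, 5                                  # target rank and oversampling
Omega = rng.standard_normal((n, k + p))      # random projection matrix
Y = A @ Omega                                # m x (k+p), the reduced-width matrix
Q, _ = np.linalg.qr(Y)                       # orthonormal basis, m x (k+p)
B = Q.T @ A                                  # small matrix, (k+p) x n

# Eigendecomposition of the small symmetric matrix B B^T.
eigvals, U_hat = np.linalg.eigh(B @ B.T)
order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
sigma = np.sqrt(np.clip(eigvals[order], 0.0, None))
U_hat = U_hat[:, order]

# Map back to the original space with plain matrix multiplications.
U = Q @ U_hat                                # m x k left singular vectors
V = (B.T @ U_hat) / sigma                    # n x k right singular vectors

# Rank-k reconstruction of A from the mapped-back factors.
A_approx = (U * sigma) @ V.T
```

Since U_hat is only (k+p) x k, each row of Q (or each column of B) can be
multiplied by it independently, which fits the map-only remark above.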

-d