Posted to commits@mahout.apache.org by ak...@apache.org on 2017/02/02 23:36:41 UTC
svn commit: r1781487 - in
/mahout/site/mahout_cms/trunk/content/users/algorithms: d-als.mdtext
d-qr.mdtext d-spca.mdtext
Author: akm
Date: Thu Feb 2 23:36:41 2017
New Revision: 1781487
URL: http://svn.apache.org/viewvc?rev=1781487&view=rev
Log:
MAHOUT-1682 and 1686: SPCA and ALS pages.
Added:
mahout/site/mahout_cms/trunk/content/users/algorithms/d-als.mdtext
- copied unchanged from r1781457, mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext
mahout/site/mahout_cms/trunk/content/users/algorithms/d-spca.mdtext
- copied, changed from r1781457, mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext
Modified:
mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext
Modified: mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext?rev=1781487&r1=1781486&r2=1781487&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext Thu Feb 2 23:36:41 2017
@@ -3,11 +3,11 @@
## Intro
-Mahout has a distributed implementation of QR decomposition for tall thin matricies[1].
+Mahout has a distributed implementation of QR decomposition for tall thin matrices[1].
## Algorithm
-For the classic QR decomposition of the form `\(\mathbf{A}=\mathbf{QR},\mathbf{A}\in\mathbb{R}^{m\times n}\)` a distributed version is fairly easily achieved if `\(\mathbf{A}\)` is tall and thin such that `\(\mathbf{A}^{\top}\mathbf{A}\)` fits in memory, i.e. *m* is large but *n* < ~5000 Under such circumstances, only `\(\mathbf{A}\)` and `\(\mathbf{Q}\)` are distributed matricies and `\(\mathbf{A^{\top}A}\)` and `\(\mathbf{R}\)` are in-core products. We just compute the in-core version of the Cholesky decomposition in the form of `\(\mathbf{LL}^{\top}= \mathbf{A}^{\top}\mathbf{A}\)`. After that we take `\(\mathbf{R}= \mathbf{L}^{\top}\)` and `\(\mathbf{Q}=\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1}\)`. The latter is easily achieved by multiplying each verticle block of `\(\mathbf{A}\)` by `\(\left(\mathbf{L}^{\top}\right)^{-1}\)`. (There is no actual matrix inversion happening).
+For the classic QR decomposition of the form `\(\mathbf{A}=\mathbf{QR},\mathbf{A}\in\mathbb{R}^{m\times n}\)` a distributed version is fairly easily achieved if `\(\mathbf{A}\)` is tall and thin such that `\(\mathbf{A}^{\top}\mathbf{A}\)` fits in memory, i.e. *m* is large but *n* < ~5000. Under such circumstances, only `\(\mathbf{A}\)` and `\(\mathbf{Q}\)` are distributed matrices and `\(\mathbf{A^{\top}A}\)` and `\(\mathbf{R}\)` are in-core products. We just compute the in-core version of the Cholesky decomposition in the form of `\(\mathbf{LL}^{\top}= \mathbf{A}^{\top}\mathbf{A}\)`. After that we take `\(\mathbf{R}= \mathbf{L}^{\top}\)` and `\(\mathbf{Q}=\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1}\)`. The latter is easily achieved by multiplying each vertical block of `\(\mathbf{A}\)` by `\(\left(\mathbf{L}^{\top}\right)^{-1}\)`. (There is no actual matrix inversion happening).
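The Cholesky QR steps above can be sketched in a few lines of in-core NumPy. This is only an illustration of the math, not Mahout's distributed implementation; in the distributed case the Gram matrix and the triangular solve over row blocks are the in-core parts.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 5                      # tall and thin: m >> n
A = rng.standard_normal((m, n))

# In-core Gram matrix: A^T A is only n x n, so it fits in memory.
G = A.T @ A

# Cholesky decomposition G = L L^T, then take R = L^T.
L = np.linalg.cholesky(G)
R = L.T

# Q = A (L^T)^{-1}, realized as a triangular solve (L X = A^T, Q = X^T)
# rather than an explicit matrix inversion.
Q = np.linalg.solve(L, A.T).T

assert np.allclose(Q @ R, A)            # A = QR holds
assert np.allclose(Q.T @ Q, np.eye(n))  # Q has orthonormal columns
```

In the distributed version, each vertical block of `A` is multiplied by the same small in-core factor, which is exactly what the row-wise solve mimics here.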
Copied: mahout/site/mahout_cms/trunk/content/users/algorithms/d-spca.mdtext (from r1781457, mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext)
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/algorithms/d-spca.mdtext?p2=mahout/site/mahout_cms/trunk/content/users/algorithms/d-spca.mdtext&p1=mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext&r1=1781457&r2=1781487&rev=1781487&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/algorithms/d-qr.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/algorithms/d-spca.mdtext Thu Feb 2 23:36:41 2017
@@ -1,14 +1,13 @@
-# Distributed Cholesky QR
+# Distributed Stochastic PCA
## Intro
-Mahout has a distributed implementation of QR decomposition for tall thin matricies[1].
+Mahout has a distributed implementation of Stochastic PCA.
-## Algorithm
-
-For the classic QR decomposition of the form `\(\mathbf{A}=\mathbf{QR},\mathbf{A}\in\mathbb{R}^{m\times n}\)` a distributed version is fairly easily achieved if `\(\mathbf{A}\)` is tall and thin such that `\(\mathbf{A}^{\top}\mathbf{A}\)` fits in memory, i.e. *m* is large but *n* < ~5000 Under such circumstances, only `\(\mathbf{A}\)` and `\(\mathbf{Q}\)` are distributed matricies and `\(\mathbf{A^{\top}A}\)` and `\(\mathbf{R}\)` are in-core products. We just compute the in-core version of the Cholesky decomposition in the form of `\(\mathbf{LL}^{\top}= \mathbf{A}^{\top}\mathbf{A}\)`. After that we take `\(\mathbf{R}= \mathbf{L}^{\top}\)` and `\(\mathbf{Q}=\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1}\)`. The latter is easily achieved by multiplying each verticle block of `\(\mathbf{A}\)` by `\(\left(\mathbf{L}^{\top}\right)^{-1}\)`. (There is no actual matrix inversion happening).
+## Motivation
+The Stochastic SVD method in Mahout produces reduced-rank Singular Value Decomposition output in its strict mathematical definition: `\(\mathbf{A}\approx\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\top}\)`, i.e. it creates outputs for the matrices `\(\mathbf{U}\)`, `\(\mathbf{V}\)`, and `\(\boldsymbol{\Sigma}\)`, each of which may be requested individually. The desired rank of the decomposition, henceforth denoted as `\(k\in\mathbb{N}_1\)`, is a parameter of the algorithm. The singular values on the diagonal of `\(\boldsymbol{\Sigma}\)` satisfy `\(\sigma_{i+1}\leq\sigma_{i}\,\forall i\in[1,k-1]\)`, i.e. they are sorted from biggest to smallest. Cases of rank deficiency `\(\mathrm{rank}(\mathbf{A})<k\)` are handled by producing 0s in the singular value positions once the deficiency takes place.
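The two output guarantees described above, descending singular values and zero-padding under rank deficiency, can be checked with a plain in-core SVD. The NumPy sketch below illustrates the contract only, not the stochastic algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(1)
# Build a 100 x 10 matrix of rank 3, so rank(A) = 3 < k below.
A = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 10))

k = 5
s = np.linalg.svd(A, compute_uv=False)[:k]   # top-k singular values

# Sorted from biggest to smallest: sigma_{i+1} <= sigma_i.
assert all(s[i + 1] <= s[i] for i in range(k - 1))

# Rank deficiency: positions beyond rank(A) are (numerically) zero.
assert np.allclose(s[3:], 0.0)
```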
## Implementation