You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mahout.apache.org by ap...@apache.org on 2015/03/28 00:01:11 UTC
svn commit: r1669702 - in /mahout/site/mahout_cms/trunk:
content/users/algorithms/d-ssvd.mdtext templates/standard.html
Author: apalumbo
Date: Fri Mar 27 23:01:11 2015
New Revision: 1669702
URL: http://svn.apache.org/r1669702
Log:
add new d-ssvd page
Added:
mahout/site/mahout_cms/trunk/content/users/algorithms/d-ssvd.mdtext
Modified:
mahout/site/mahout_cms/trunk/templates/standard.html
Added: mahout/site/mahout_cms/trunk/content/users/algorithms/d-ssvd.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/algorithms/d-ssvd.mdtext?rev=1669702&view=auto
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/algorithms/d-ssvd.mdtext (added)
+++ mahout/site/mahout_cms/trunk/content/users/algorithms/d-ssvd.mdtext Fri Mar 27 23:01:11 2015
@@ -0,0 +1,134 @@
+# Distributed Stochastic Singular Value Decomposition
+
+
+## Intro
+
+Mahout has a distributed implementation of Stochastic Singular Value Decomposition [1].
+
+## Modified SSVD Algorithm
+
+Given an `\(m\times n\)`
+matrix `\(\mathbf{A}\)`, a target rank `\(k\in\mathbb{N}_{1}\)`
+, an oversampling parameter `\(p\in\mathbb{N}_{1}\)`,
+and the number of additional power iterations `\(q\in\mathbb{N}_{0}\)`,
+this procedure computes an `\(m\times\left(k+p\right)\)`
+SVD `\(\mathbf{A\approx U}\boldsymbol{\Sigma}\mathbf{V}^{\top}\)`:
+
+ 1. Create seed for random `\(n\times\left(k+p\right)\)`
+ matrix `\(\boldsymbol{\Omega}\)`. The seed defines matrix `\(\mathbf{\Omega}\)`
+ using Gaussian unit vectors per one of suggestions in [Halko, Martinsson, Tropp].
+
+ 2. `\(\mathbf{Y=A\boldsymbol{\Omega}},\,\mathbf{Y}\in\mathbb{R}^{m\times\left(k+p\right)}\)`
+
+ 3. Column-orthonormalize `\(\mathbf{Y}\rightarrow\mathbf{Q}\)`
+ by computing thin decomposition `\(\mathbf{Y}=\mathbf{Q}\mathbf{R}\)`.
+ Also, `\(\mathbf{Q}\in\mathbb{R}^{m\times\left(k+p\right)},\,\mathbf{R}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)`; denoted as `\(\mathbf{Q}=\mbox{qr}\left(\mathbf{Y}\right).\mathbf{Q}\)`
+
+ 4. `\(\mathbf{B}_{0}=\mathbf{Q}^{\top}\mathbf{A}:\,\,\mathbf{B}\in\mathbb{R}^{\left(k+p\right)\times n}\)`.
+
+ 5. If `\(q>0\)`
+ repeat: for `\(i=1..q\)`:
+ `\(\mathbf{B}_{i}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\mathbf{B}_{i-1}^{\top}\right).\mathbf{Q}\)`
+ (power iterations step).
+
+ 6. Compute Eigensolution of a small Hermitian `\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}=\mathbf{\hat{U}}\boldsymbol{\Lambda}\mathbf{\hat{U}}^{\top}\)`,
+ `\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)`.
+
+ 7. Singular values `\(\mathbf{\boldsymbol{\Sigma}}=\boldsymbol{\Lambda}^{0.5}\)`,
+ or, in other words, `\(s_{i}=\sqrt{\sigma_{i}}\)`.
+
+ 8. If needed, compute `\(\mathbf{U}=\mathbf{Q}\hat{\mathbf{U}}\)`.
+
+ 9. If needed, compute `\(\mathbf{V}=\mathbf{B}_{q}^{\top}\hat{\mathbf{U}}\boldsymbol{\Sigma}^{-1}\)`.
+Another way is `\(\mathbf{V}=\mathbf{A}^{\top}\mathbf{U}\boldsymbol{\Sigma}^{-1}\)`.
+
+
+
+
+## Implementation
+
+Mahout `dssvd(...)` is implemented in the mahout `math-scala` algebraic optimizer which translates Mahout's R-like linear algebra operators into a physical plan for both Spark and H2O distributed engines.
+
+ def dssvd[K: ClassTag](drmA: DrmLike[K], k: Int, p: Int = 15, q: Int = 0):
+ (DrmLike[K], DrmLike[Int], Vector) = {
+
+ val drmAcp = drmA.checkpoint()
+
+ val m = drmAcp.nrow
+ val n = drmAcp.ncol
+ assert(k <= (m min n), "k cannot be greater than smaller of m, n.")
+ val pfxed = safeToNonNegInt((m min n) - k min p)
+
+ // Actual decomposition rank
+ val r = k + pfxed
+
+ // We represent Omega by its seed.
+ val omegaSeed = RandomUtils.getRandom().nextInt()
+
+ // Compute Y = A*Omega.
+ var drmY = drmAcp.mapBlock(ncol = r) {
+ case (keys, blockA) =>
+ val blockY = blockA %*% Matrices.symmetricUniformView(n, r, omegaSeed)
+ keys -> blockY
+ }
+
+ var drmQ = dqrThin(drmY.checkpoint())._1
+
+ // Checkpoint Q if last iteration
+ if (q == 0) drmQ = drmQ.checkpoint()
+
+ var drmBt = drmAcp.t %*% drmQ
+
+ // Checkpoint B' if last iteration
+ if (q == 0) drmBt = drmBt.checkpoint()
+
+ for (i <- 0 until q) {
+ drmY = drmAcp %*% drmBt
+ drmQ = dqrThin(drmY.checkpoint())._1
+
+ // Checkpoint Q if last iteration
+ if (i == q - 1) drmQ = drmQ.checkpoint()
+
+ drmBt = drmAcp.t %*% drmQ
+
+ // Checkpoint B' if last iteration
+ if (i == q - 1) drmBt = drmBt.checkpoint()
+ }
+
+ val (inCoreUHat, d) = eigen(drmBt.t %*% drmBt)
+ val s = d.sqrt
+
+ // Since neither drmU nor drmV are actually computed until actually used
+ // we don't need the flags instructing compute (or not compute) either of the U,V outputs
+ val drmU = drmQ %*% inCoreUHat
+ val drmV = drmBt %*% (inCoreUHat %*%: diagv(1 /: s))
+
+ (drmU(::, 0 until k), drmV(::, 0 until k), s(0 until k))
+ }
+
+Note: As a side effect of checkpointing, U and V values are returned as logical operators (i.e. they are neither checkpointed nor computed). Therefore there is no physical work actually done to compute `\(\mathbf{U}\)` or `\(\mathbf{V}\)` until they are used in a subsequent expression.
+
+
+## Usage
+
+The scala `dssvd(...)` method can easily be called in any Spark or H2O application built with the `math-scala` library and the corresponding `Spark` or `H2O` engine module as follows:
+
+ import org.apache.mahout.math._
+ import org.decompsitions._
+
+
+ val(drmU, drmV, s) = dssvd(drma, k = 40, q = 1)
+
+
+## References
+
+[1]: [Mahout Scala and Mahout Spark Bindings for Linear Algebra Subroutines](http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf)
+
+[2]: [Halko, Martinsson, Tropp](http://arxiv.org/abs/0909.4061)
+
+[3]: [Mahout Spark and Scala Bindings](http://mahout.apache.org/users/sparkbindings/home.html)
+
+[4]: [Randomized methods for computing low-rank
+approximations of matrices](http://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf)
+
+
Modified: mahout/site/mahout_cms/trunk/templates/standard.html
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/templates/standard.html?rev=1669702&r1=1669701&r2=1669702&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/templates/standard.html (original)
+++ mahout/site/mahout_cms/trunk/templates/standard.html Fri Mar 27 23:01:11 2015
@@ -165,8 +165,9 @@
</li>
<li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown">Algorithms<b class="caret"></b></a>
<ul class="dropdown-menu">
- <li class="nav-header">Matrix Decomposition</li>
+ <li class="nav-header">Distributed Matrix Decomposition</li>
<li><a href="/users/algorithms/d-qr.html">Cholesky QR</a></li>
+ <li><a href="/users/algorithms/d-ssvd.html">SSVD</a></li>
<li class="nav-header">Recommendations</li>
<li><a href="/users/algorithms/recommender-overview.html">Recommender Overview</a></li>
<li><a href="/users/algorithms/intro-cooccurrence-spark.html">Intro to cooccurrence-based<br/> recommendations with Spark</a></li>