You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mahout.apache.org by bu...@apache.org on 2017/02/03 23:10:24 UTC

svn commit: r1006164 - in /websites/staging/mahout/trunk/content: ./ users/algorithms/d-spca.html

Author: buildbot
Date: Fri Feb  3 23:10:24 2017
New Revision: 1006164

Log:
Staging update by buildbot for mahout

Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/algorithms/d-spca.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri Feb  3 23:10:24 2017
@@ -1 +1 @@
-1781612
+1781616

Modified: websites/staging/mahout/trunk/content/users/algorithms/d-spca.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/algorithms/d-spca.html (original)
+++ websites/staging/mahout/trunk/content/users/algorithms/d-spca.html Fri Feb  3 23:10:24 2017
@@ -283,27 +283,27 @@ h2:hover > .headerlink, h3:hover > .head
 <h2 id="intro">Intro<a class="headerlink" href="#intro" title="Permanent link">&para;</a></h2>
 <p>Mahout has a distributed implementation of Stochastic PCA[1].</p>
 <h2 id="algorithm">Algorithm<a class="headerlink" href="#algorithm" title="Permanent link">&para;</a></h2>
-<p>Given an <em>m</em> <code>\(\times\)</code> <em>n</em> matrix <code>\(\mathbf{A}\)</code>, a target rank <em>k</em>, and an oversampling parameter <em>p</em>, this procedure computes a <em>k</em>-rank PCA by finding the unknowns in <code>\(\mathbf{A}−1\mu \ge U\SigmaV\)</code>:
-(1) Create seed for random <em>n</em> <code>\(\times\)</code> <em>(k+p)</em> matrix <code>\(\Omega\)</code>.
-(2) <code>\(s_\Omega \leftarrow \Omega^\top \mu\)</code>.
-(3) <code>\(\mathbf{Y_0} \leftarrow A\Omega − 1(s_\Omega)^\top, Y \in \mathbb{R}^(m\times(k+p))\)</code>.
-(4) Column-orthonormalize <code>\(\mathbf{Y_0} \rightarrow \mathbf{Q}\)</code> by computing thin decomposition <code>\(\mathbf{Y_0} = \mathbf{QR}\)</code>. Also, <code>\(\mathbf{Q}\in\mathbb{R}^(m\times(k+p)), \mathbf{R}\in\mathbb{R}^((k+p)\times(k+p))\)</code>.
-(5) <code>\(s_Q \leftarrow Q^\top 1\)</code>.
-(6) <code>\(\mathbf{B_0} \leftarrow Q^\top A: B \in \mathbb{R}^((k+p)\times n)\)</code>.
-(7) <code>\(s_B \leftarrow (B_0)^\top \mu\)</code>.
-(8) For <em>i</em> in 1..<em>q</em> repeat (power iterations):
-  (a) For <em>j</em> in 1..<em>n</em> <code>\(apply(B_(i−1))_(∗j) \leftarrow (B_(i−1))_(∗j)−\mu_j s_Q\)</code>.
-  (b) <code>\(\mathbf{Y_i) \leftarrow \mathbf{(AB_(i−1)^\top)−1(s_B−\mu^\top \mu s_Q^\top)}\)</code>.
-  (c) Column-orthonormalize <code>\(\mathbf{Y_i} \rightarrow \mathbf{Q}\)</code> by computing thin decomposition <code>\(\mathbf{Y_i = QR}\)</code>.
-  (d) <code>\(\mathbf{s_Q \leftarrow Q^\top 1}\)</code>.
-  (e) <code>\(\mathbf{B_i \leftarrow Q^\top A}\)</code>.
-  (f) <code>\(\mathbf{s_B \leftarrow (B_i)^\top \mu}\)</code>.
-(9) Let <code>\(\mathbf{C \triangleq s_Q (s_B)^\top}\)</code>. <code>\(\mathbf{M \leftarrow B_q (B_q)^\top − C − C^\top + \mu^\top \mu s_Q (s_Q)^\top}\)</code>.
-(10) Compute an eigensolution of the small symmetric <code>\(\mathbf{M = \hat{U} \Lambda \hat{U}^\top: M \in \mathbb{R}^((k+p)\times(k+p))}\)</code>.
-(11) The singular values <code>\(\Sigma = \Lambda^(\circ 0.5)\)</code>, or, in other words, <code>\(\mathbf{\sigma_i= \sqrt{\lambda_i}}\)</code>.
-(12) If needed, compute <code>\(\mathbf{U = Q\hat{U}}\)</code>.
-(13) If needed, compute <code>\(\mathbf{V = B^\top \hat{U} \Sigma^(−1)}\)</code>. Another way is <code>\(\mathbf{V = A^\top U\Sigma^(−1)}\)1.
-(14) If needed, items converted to the PCA space can be computed as</code>(\mathbf{U\Sigma})`.</p>
+<p>Given an <em>m</em> <code>\(\times\)</code> <em>n</em> matrix <code>\(\mathbf{A}\)</code>, a target rank <em>k</em>, and an oversampling parameter <em>p</em>, this procedure computes a <em>k</em>-rank PCA by finding the unknowns in <code>\(\mathbf{A−1\mu^\top \ge U\Sigma V}\)</code>:
+1. Create seed for random <em>n</em> <code>\(\times\)</code> <em>(k+p)</em> matrix <code>\(\Omega\)</code>.
+2. <code>\(s_\Omega \leftarrow \Omega^\top \mu\)</code>.
+3. <code>\(\mathbf{Y_0} \leftarrow A\Omega − 1(s_\Omega)^\top, Y \in \mathbb{R}^(m\times(k+p))\)</code>.
+4. Column-orthonormalize <code>\(\mathbf{Y_0} \rightarrow \mathbf{Q}\)</code> by computing thin decomposition <code>\(\mathbf{Y_0} = \mathbf{QR}\)</code>. Also, <code>\(\mathbf{Q}\in\mathbb{R}^(m\times(k+p)), \mathbf{R}\in\mathbb{R}^((k+p)\times(k+p))\)</code>.
+5. <code>\(s_Q \leftarrow Q^\top 1\)</code>.
+6. <code>\(\mathbf{B_0} \leftarrow Q^\top A: B \in \mathbb{R}^((k+p)\times n)\)</code>.
+7. <code>\(s_B \leftarrow (B_0)^\top \mu\)</code>.
+8. For <em>i</em> in 1..<em>q</em> repeat (power iterations):
+    - For <em>j</em> in 1..<em>n</em> <code>\(apply(B_(i−1))_(∗j) \leftarrow (B_(i−1))_(∗j)−\mu_j s_Q\)</code>.
+    - <code>\(\mathbf{Y_i) \leftarrow \mathbf{(AB_(i−1)^\top)−1(s_B−\mu^\top \mu s_Q^\top)}\)</code>.
+    - Column-orthonormalize <code>\(\mathbf{Y_i} \rightarrow \mathbf{Q}\)</code> by computing thin decomposition <code>\(\mathbf{Y_i = QR}\)</code>.
+    - <code>\(\mathbf{s_Q \leftarrow Q^\top 1}\)</code>.
+    - <code>\(\mathbf{B_i \leftarrow Q^\top A}\)</code>.
+    - <code>\(\mathbf{s_B \leftarrow (B_i)^\top \mu}\)</code>.
+9. Let <code>\(\mathbf{C \triangleq s_Q (s_B)^\top}\)</code>. <code>\(\mathbf{M \leftarrow B_q (B_q)^\top − C − C^\top + \mu^\top \mu s_Q (s_Q)^\top}\)</code>.
+10. Compute an eigensolution of the small symmetric <code>\(\mathbf{M = \hat{U} \Lambda \hat{U}^\top: M \in \mathbb{R}^((k+p)\times(k+p))}\)</code>.
+11. The singular values <code>\(\Sigma = \Lambda^(\circ 0.5)\)</code>, or, in other words, <code>\(\mathbf{\sigma_i= \sqrt{\lambda_i}}\)</code>.
+12. If needed, compute <code>\(\mathbf{U = Q\hat{U}}\)</code>.
+13. If needed, compute <code>\(\mathbf{V = B^\top \hat{U} \Sigma^(−1)}\)</code>. Another way is <code>\(\mathbf{V = A^\top U\Sigma^(−1)}\)1.
+14. If needed, items converted to the PCA space can be computed as</code>(\mathbf{U\Sigma})`.</p>
 <h2 id="implementation">Implementation<a class="headerlink" href="#implementation" title="Permanent link">&para;</a></h2>
 <p>Mahout <code>dspca(...)</code> is implemented in the mahout <code>math-scala</code> algebraic optimizer which translates Mahout's R-like linear algebra operators into a physical plan for both Spark and H2O distributed engines.</p>
 <p>def dspca<a href="drmA: DrmLike[K], k: Int, p: Int = 15, q: Int = 0">K</a>:
@@ -437,7 +437,7 @@ val drmV = drmBt %*% (inCoreUHat %*% dia
 
 <p>Note the parameter is optional and its default value is zero.</p>
 <h2 id="references">References<a class="headerlink" href="#references" title="Permanent link">&para;</a></h2>
-<p>[1]: <a href="https://www.amazon.com/Apache-Mahout-MapReduce-Dmitriy-Lyubimov/dp/1523775785">"Apache Mahout: Beyond MapReduce; Distributed Algorithm Design", Lyubimov and Palumbo</a></p>
+<p>[1]: Lyubimov and Palumbo, <a href="https://www.amazon.com/Apache-Mahout-MapReduce-Dmitriy-Lyubimov/dp/1523775785">"Apache Mahout: Beyond MapReduce; Distributed Algorithm Design"</a></p>
    </div>
   </div>     
 </div>