Posted to commits@mahout.apache.org by bu...@apache.org on 2014/05/02 22:33:15 UTC

svn commit: r907806 - in /websites/staging/mahout/trunk/content: ./ users/clustering/spectral-clustering.html

Author: buildbot
Date: Fri May  2 20:33:15 2014
New Revision: 907806

Log:
Staging update by buildbot for mahout

Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/clustering/spectral-clustering.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri May  2 20:33:15 2014
@@ -1 +1 @@
-1592028
+1592031

Modified: websites/staging/mahout/trunk/content/users/clustering/spectral-clustering.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/clustering/spectral-clustering.html (original)
+++ websites/staging/mahout/trunk/content/users/clustering/spectral-clustering.html Fri May  2 20:33:15 2014
@@ -257,9 +257,10 @@
 <p>As of Mahout 0.3, spectral clustering has been implemented to take advantage of the MapReduce framework. It uses <a href="http://mahout.apache.org/users/dim-reduction/ssvd.html">SSVD</a> for dimensionality reduction of the input data set, and <a href="http://mahout.apache.org/users/clustering/k-means-clustering.html">k-means</a> to perform the final clustering.</p>
 <p><strong>(<a href="https://issues.apache.org/jira/browse/MAHOUT-1538">MAHOUT-1538</a> will port the existing Hadoop MapReduce implementation to Mahout DSL, allowing for one of several distinct distributed back-ends to conduct the computation)</strong></p>
 <h2 id="input">Input</h2>
-<p>The input format for the algorithm currently takes the form of a Hadoop-backed affinity matrix, in text form. Each line of the text file specifies a single element of the affinity matrix: the row index <code>\(i\)</code>, the column index <code>\(j\)</code>, and the value:</p>
+<p>The algorithm currently takes as input a Hadoop-backed affinity matrix stored as text files. Each line of a text file specifies a single element of the affinity matrix: the row index <code>\(i\)</code>, the column index <code>\(j\)</code>, and the value:</p>
 <p><code>i, j, value</code></p>
 <p>The affinity matrix is symmetric, and any unspecified <code>\(i, j\)</code> pairs are assumed to be 0 for sparsity. The row and column indices are 0-indexed. Thus, only the non-zero entries of either the upper or lower triangle need to be specified.</p>
+<p>The matrix elements specified in the text files are collected into a Mahout <code>DistributedRowMatrix</code>.</p>
 <p><strong>(<a href="https://issues.apache.org/jira/browse/MAHOUT-1539">MAHOUT-1539</a> will allow for the creation of the affinity matrix to occur as part of the core spectral clustering algorithm, as opposed to the current requirement that the user create this matrix themselves and provide it, rather than the original data, to the algorithm)</strong></p>
 <h2 id="running-spectral-clustering">Running spectral clustering</h2>
 <p><strong>(<a href="https://issues.apache.org/jira/browse/MAHOUT-1540">MAHOUT-1540</a> will provide a running example of this algorithm and this section will be updated to show how to run the example and what the expected output should be; until then, this section provides a how-to for simply running the algorithm on arbitrary input)</strong></p>
@@ -273,7 +274,7 @@
 </pre></div>
 
 
-<p>The affinity matrix can be contained in a single text file (using the aforementioned one-line-per-entry format) or span many text files (per (MAHOUT-978)[https://issues.apache.org/jira/browse/MAHOUT-978], do not prefix text files with a leading underscore '_' or period '.'). The <code>-d</code> flag is required for the algorithm to know the dimensions of the affinity matrix. <code>-k</code> is the number of top eigenvectors from the normalized graph Laplacian in the SSVD step, and also the number of clusters given to k-means after the SSVD step.</p>
+<p>The affinity matrix can be contained in a single text file (using the aforementioned one-line-per-entry format) or span many text files (per <a href="https://issues.apache.org/jira/browse/MAHOUT-978">MAHOUT-978</a>, do not prefix text files with a leading underscore '_' or period '.'). The <code>-d</code> flag is required for the algorithm to know the dimensions of the affinity matrix. <code>-k</code> is the number of top eigenvectors taken from the normalized graph Laplacian in the SSVD step, and also the number of clusters given to k-means after the SSVD step.</p>
 <h2 id="example">Example</h2>
 <p>To provide a simple example, take the following affinity matrix, contained in a text file called <code>affinity.txt</code>:</p>
 <div class="codehilite"><pre>0<span class="p">,</span> 0<span class="p">,</span> 0