You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Fan Jiang (JIRA)" <ji...@apache.org> on 2014/11/06 01:19:34 UTC

[jira] [Updated] (SPARK-4259) Add Spectral Clustering Algorithm with Gaussian Similarity Function

     [ https://issues.apache.org/jira/browse/SPARK-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fan Jiang updated SPARK-4259:
-----------------------------
    Description: 
In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm.

We implemented the unnormalized graph Laplacian matrix by Gaussian similarity function. A brief design looks like below:

Unnormalized spectral clustering

Input: raw data points, number k of clusters to construct: 
• Comupte Similarity matrix S ∈ Rn×n, .
• Construct a similarity graph. Let W be its weighted adjacency matrix.
• Compute the unnormalized Laplacian L = D - W. where D is the Degree diagonal matrix
• Compute the first k eigenvectors u1, . . . , uk of L.
• Let U ∈ Rn×k be the matrix containing the vectors u1, . . . , uk as columns.
• For i = 1, . . . , n, let yi ∈ Rk be the vector corresponding to the i-th row of U.
• Cluster the points (yi)i=1,...,n in Rk with the k-means algorithm into clusters C1, . . . , Ck.
Output: Clusters A1, . . . , Ak with Ai = {j| yj ∈ Ci}.



  was:
In recent years, spectral clustering has become one of the most popular modern clustering
algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software,
and very often outperforms traditional clustering algorithms such as the k-means algorithm.

We implemented the unnormalized graph Laplacian matrix by Gaussian similarity function. A brief design looks like below:

Unnormalized spectral clustering

Input: raw data points, number k of clusters to construct: 
• Comupte Similarity matrix S ∈ Rn×n, .
• Construct a similarity graph. Let W be its weighted adjacency matrix.
• Compute the unnormalized Laplacian L = D - W. where D is the Degree diagonal matrix
• Compute the first k eigenvectors u1, . . . , uk of L.
• Let U ∈ Rn×k be the matrix containing the vectors u1, . . . , uk as columns.
• For i = 1, . . . , n, let yi ∈ Rk be the vector corresponding to the i-th row of U.
• Cluster the points (yi)i=1,...,n in Rk with the k-means algorithm into clusters C1, . . . , Ck.
Output: Clusters A1, . . . , Ak with Ai = {j| yj ∈ Ci}.




> Add Spectral Clustering Algorithm with Gaussian Similarity Function
> -------------------------------------------------------------------
>
>                 Key: SPARK-4259
>                 URL: https://issues.apache.org/jira/browse/SPARK-4259
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Fan Jiang
>              Labels: features
>
> In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm.
> We implemented the unnormalized graph Laplacian matrix by Gaussian similarity function. A brief design looks like below:
> Unnormalized spectral clustering
> Input: raw data points, number k of clusters to construct: 
> • Comupte Similarity matrix S ∈ Rn×n, .
> • Construct a similarity graph. Let W be its weighted adjacency matrix.
> • Compute the unnormalized Laplacian L = D - W. where D is the Degree diagonal matrix
> • Compute the first k eigenvectors u1, . . . , uk of L.
> • Let U ∈ Rn×k be the matrix containing the vectors u1, . . . , uk as columns.
> • For i = 1, . . . , n, let yi ∈ Rk be the vector corresponding to the i-th row of U.
> • Cluster the points (yi)i=1,...,n in Rk with the k-means algorithm into clusters C1, . . . , Ck.
> Output: Clusters A1, . . . , Ak with Ai = {j| yj ∈ Ci}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org