Posted to user@mahout.apache.org by Shannon Quinn <sq...@gatech.edu> on 2015/05/07 22:59:16 UTC

Spectral clustering

Hi Sugam,

This is in response to your original thread: 
http://mail-archives.apache.org/mod_mbox/mahout-user/201505.mbox/%3C1412053714.1387791.1431020729309.JavaMail.yahoo%40mail.yahoo.com%3E

The first thing you need to do is build the graph affinity matrix 
yourself. That's the input to the map-reduce spectral clustering 
algorithm, and what is described in the documentation (the "i, j, value" 
part). Basically you'll consider each document as a single node in a 
graph, and weight the connections between nodes. "i" and "j" are the 
pair of nodes you're considering, and "value" is the similarity / 
affinity, usually between 0 (completely dissimilar) and 1 (identical). 
Typically you use RBF to compute affinities.
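
As a rough illustration of the "i, j, value" layout described above, here is a minimal Python sketch that flattens a small, already-computed affinity matrix into one line per pair of nodes. The function names, the toy values, and the comma delimiter without spaces are assumptions made for illustration; check the Mahout spectral clustering documentation for the exact file format it expects.

```python
def affinity_to_triples(affinity):
    """Yield (i, j, value) for every ordered pair of nodes in a square matrix."""
    n = len(affinity)
    for i in range(n):
        if len(affinity[i]) != n:
            raise ValueError("affinity matrix must be square")
        for j in range(n):
            yield i, j, affinity[i][j]


def format_triples(affinity):
    """Render the matrix as one 'i,j,value' line per entry (delimiter is an assumption)."""
    return "\n".join(f"{i},{j},{v}" for i, j, v in affinity_to_triples(affinity))


# Toy 3-document example; the affinity values are made up for illustration.
toy = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
lines = format_triples(toy).split("\n")
```

Each document is a node index, so a corpus of n documents produces n * n lines in this dense form.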

Once you have the data in this format, you can feed it to the 
spectral clustering algorithm. Having the Mahout package compute the 
affinities is at the top of my to-do list for the next version (though 
there are still some questions that have to be addressed). The goal is 
that you could just submit the documents as you would to any other 
algorithm in Mahout, but for now you have to compute the affinities 
yourself.

Let me know if anything still isn't clear.

Shannon

Re: Spectral clustering

Posted by Shannon Quinn <sq...@gatech.edu>.
Hi Sugam,

To clarify, the "RBF" I mentioned for computing affinities is the radial 
basis function, linked in Mahout's spectral clustering documentation: 
http://en.wikipedia.org/wiki/RBF_kernel

The basic layout is to compare documents pairwise, use RBF to compute 
their similarity, and set the entries in the affinity matrix 
corresponding to the two documents to the output of the RBF.
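
The pairwise layout above could be sketched in Python roughly as follows. This assumes each document has already been turned into a numeric feature vector of equal length (for example TF-IDF weights, which is my assumption and not something Mahout does for you here), and it uses the Gaussian form of the RBF kernel, exp(-||x - y||^2 / (2 * sigma^2)). The names rbf and affinity_matrix and the sigma values are hypothetical, and sigma needs tuning for real data.

```python
import math


def rbf(x, y, sigma=1.0):
    """Gaussian RBF kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def affinity_matrix(vectors, sigma=1.0):
    """Compare documents pairwise and fill both symmetric entries with the RBF output."""
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # a document is identical to itself
        for j in range(i + 1, n):
            value = rbf(vectors[i], vectors[j], sigma)
            matrix[i][j] = value
            matrix[j][i] = value
    return matrix


# Made-up 2-dimensional document vectors for illustration only.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
A = affinity_matrix(docs, sigma=0.5)
```

Because the kernel is symmetric, only the upper triangle is computed and then mirrored. The outputs fall in (0, 1], with 1 on the diagonal, which matches the 0-to-1 affinity range.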
