You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@mahout.apache.org by "Bae, Jae Hyeon" <me...@gmail.com> on 2011/11/26 22:21:52 UTC

Scalable graph clustering implementation

Hi

I am a graduate student at University of Texas at Austin CS department and
a real fan of Mahout. I've been using mahout library for more than a year.

This semester, I implemented scalable graph clustering without eigenvector
computation with multi-level approach as my data mining term project. Its
main idea is that weighted kernel k-means clustering is mathematically
equivalent to normalized cut graph clustering. So if we can find the
optimal initial clustering, we can find graph clustering without
eigenvector calculation. To find the optimal initial clustering, we coarsen
the graph at multi-level with edge contraction method, and we apply
spectral clustering (we can use any other graph clustering method) to the
smallest graph. This is called *base clustering*. The clustering
information of the previous graph will be the initial clustering of the
next level graph. Using this approach, we can do graph cut without
eigenvector calculation.

Currently, Mahout has a spectral k-means clustering which needs eigenvector
calculation using distributed Lancoz algorithm. When I tried this
implementation with livejournal graph data, I'd waited for more than 15
hours but it didn't finish building matrix laplacian yet but my
implementation was finished within 12 hours. I didn't implement base
clustering. In my experiment, I used the legacy graph clustering package
"Graclus" written in C but we can use Mahout spectral k-means clustering as
the base clustering phase.

I want to contribute my implementation to Mahout if it is available and
allowed. Please let me know how I can follow up.

Thank you

Best, Jae

Re: Scalable graph clustering implementation

Posted by Isabel Drost <is...@apache.org>.
First of all, your findings sound very interesting - thanks for sharing.


On 26.11.2011 Bae, Jae Hyeon wrote: 
> I want to contribute my implementation to Mahout if it is available and
> allowed. Please let me know how I can follow up.

The easiest starting point would be our how to contribute wiki page:

https://cwiki.apache.org/MAHOUT/how-to-contribute.html

Please keep in mind that contributing whole new packages and algorithms may be a 
bit more involved: Apart from the implementation itself including unit tests 
there also is a need for a running example that shows how to use your code and 
at least some sort of quickstart documentation in our wiki.


Isabel