You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@mahout.apache.org by "Bae, Jae Hyeon" <me...@gmail.com> on 2011/10/23 07:16:52 UTC

Help for graph processing implementation

Hi

I am implementing graph clustering algorithm based on hadoop and mahout.
This is my term project of data mining course.

Spectral method of graph clustering needs calculation of eigenvectors,
which is not practically efficient with the large scale graph. Thus, there
exists multi-level graph clustering method without eigenvectors. This
contains graph coarsening, base clustering, refining. Refining stage can be
done with weighted kernel k-means clustering which is not so difficult to
be implemented in MapReduce way, but the problem is graph coarsening.
Pseudocode is on this paper
http://www.cs.utexas.edu/~ddn/papers/sui10.pdfLike any graph
processing algorithm, this algorithm does not look easy to
be intuitively implemented in MapReduce way. So, I need a help from experts
more proficient at converting single thread graph algorithm to MapReduce
way. If this work is done smoothly, I will contribute this graph clustering
algorithm to Mahout if I am allowed to do so.

Thank you!

Best, Jae

Re: Help for graph processing implementation

Posted by Dan Brickley <da...@danbri.org>.

On 23 October 2011 07:16, Bae, Jae Hyeon <me...@gmail.com> wrote:

> I am implementing graph clustering algorithm based on hadoop and mahout.
> This is my term project of data mining course.

Cool! Fun topic...

> Spectral method of graph clustering needs calculation of eigenvectors,
> which is not practically efficient with the large scale graph. Thus, there
> exists multi-level graph clustering method without eigenvectors. This
> contains graph coarsening, base clustering, refining. Refining stage can be
> done with weighted kernel k-means clustering which is not so difficult to
> be implemented in MapReduce way, but the problem is graph coarsening.
> Pseudocode is on this paper
> http://www.cs.utexas.edu/~ddn/papers/sui10.pdf Like any graph
> processing algorithm, this algorithm does not look easy to
> be intuitively implemented in MapReduce way. So, I need a help from experts
> more proficient at converting single thread graph algorithm to MapReduce
> way. If this work is done smoothly, I will contribute this graph clustering
> algorithm to Mahout if I am allowed to do so.

Do take a look at the Spectral Clustering code (and Eigencuts) already
in Mahout.

orig proposal: https://issues.apache.org/jira/browse/MAHOUT-363
http://code.google.com/p/eigencuts/
wiki: https://cwiki.apache.org/MAHOUT/spectral-clustering.html
blog notes: http://spectrallyclustered.wordpress.com/
bugs: https://issues.apache.org/jira/browse/MAHOUT-524
javadoc: (maybe not freshest)
http://javasourcecode.org/html/open-source/mahout/mahout-0.5/org/apache/mahout/clustering/spectral/common/package-summary.html

The Wiki page in particular might be a good start,
https://cwiki.apache.org/MAHOUT/spectral-clustering.html

Code is in public svn:
http://svn.apache.org/repos/asf/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/

Take a look at the run() method in
http://svn.apache.org/repos/asf/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/kmeans/SpectralKMeansDriver.java

...this takes in a textual representation of an affinity matrix,
constructs the laplacian representation of it and uses the existing
DistributedLanczosSolver/SVD (see
https://cwiki.apache.org/MAHOUT/dimensional-reduction.html ) and
K-Means components from elsewhere in Mahout. The code seems fairly
well commented.

Having said that, this code doesn't seem very happy currently. It
seems per https://issues.apache.org/jira/browse/MAHOUT-524 difficult
to get running in current Mahout trunk. I couldn't get it running yet.

If you can find a way in your term project to build on this work, that
would be fantastic...

Hope this helps,

cheers,

Dan

Re: Help for graph processing implementation

Posted by "Bae, Jae Hyeon" <me...@gmail.com>.

I am sorry for using very negative word for eigenvectors. Efficient graph
clustering method is mentioning that without calculating eigenvectors, it
would be more efficient. Please refer to
METIS<http://www.google.com/url?sa=t&rct=j&q=metis%20graph%20clustering&source=web&cd=3&ved=0CD0QFjAC&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.106.8157%26rep%3Drep1%26type%3Dpdf&ei=-yakTo3pJ6risQKgmOiyBQ&usg=AFQjCNHCqF2I50rQlCqstBYojYUNI3ZIjA>or
GRACLUS<http://www.cs.utexas.edu/users/inderjit/public_papers/multilevel_pami.pdf>
.

My goal is graph clustering with the objective function of normalized cut.
I have been using Mahout and I know there are spectral clustering algorithm
but I don't know the running performance of spectral clustering using
eigenvectors calculated by DistributedLancozSolver. Also, I wanted to
implement multi-level approach without calculating eigenvectors mainly due
to this idea is invented by the professor in my university.

Coarsening is a tricky graph processing like minimum spanning tree, so I
need an inventive way for edge coarsening. Actually, parallel version of
METIS is implemented with MPI.

Thank you so much for Dan's comment and yours

Best, Jae

2011/10/23 Ted Dunning <te...@gmail.com>

> Why do you say that eigenvectors are infeasible?
>
> Also, I think your link should have been
> http://www.cs.utexas.edu/~ddn/papers/sui10.pdf
>
> You accidentally glued some extra text to it.
>
> On Sat, Oct 22, 2011 at 10:16 PM, Bae, Jae Hyeon <me...@gmail.com>
> wrote:
>
> > Hi
> >
> > I am implementing graph clustering algorithm based on hadoop and mahout.
> > This is my term project of data mining course.
>
>
> > Spectral method of graph clustering needs calculation of eigenvectors,
> > which is not practically efficient with the large scale graph. Thus,
> there
> > exists multi-level graph clustering method without eigenvectors. This
> > contains graph coarsening, base clustering, refining. Refining stage can
> be
> > done with weighted kernel k-means clustering which is not so difficult to
> > be implemented in MapReduce way, but the problem is graph coarsening.
> > Pseudocode is on this paper
> > http://www.cs.utexas.edu/~ddn/papers/sui10.pdfLike any graph
> > processing algorithm, this algorithm does not look easy to
> > be intuitively implemented in MapReduce way. So, I need a help from
> experts
> > more proficient at converting single thread graph algorithm to MapReduce
> > way. If this work is done smoothly, I will contribute this graph
> clustering
> > algorithm to Mahout if I am allowed to do so.
> >
> > Thank you!
> >
> > Best, Jae
> >
>

Re: Help for graph processing implementation

Posted by Ted Dunning <te...@gmail.com>.

Why do you say that eigenvectors are infeasible?

Also, I think your link should have been
http://www.cs.utexas.edu/~ddn/papers/sui10.pdf

You accidentally glued some extra text to it.

On Sat, Oct 22, 2011 at 10:16 PM, Bae, Jae Hyeon <me...@gmail.com> wrote:

> Hi
>
> I am implementing graph clustering algorithm based on hadoop and mahout.
> This is my term project of data mining course.


> Spectral method of graph clustering needs calculation of eigenvectors,
> which is not practically efficient with the large scale graph. Thus, there
> exists multi-level graph clustering method without eigenvectors. This
> contains graph coarsening, base clustering, refining. Refining stage can be
> done with weighted kernel k-means clustering which is not so difficult to
> be implemented in MapReduce way, but the problem is graph coarsening.
> Pseudocode is on this paper
> http://www.cs.utexas.edu/~ddn/papers/sui10.pdfLike any graph
> processing algorithm, this algorithm does not look easy to
> be intuitively implemented in MapReduce way. So, I need a help from experts
> more proficient at converting single thread graph algorithm to MapReduce
> way. If this work is done smoothly, I will contribute this graph clustering
> algorithm to Mahout if I am allowed to do so.
>
> Thank you!
>
> Best, Jae
>