You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by Tillmann Fiehn <tf...@mailbox.tu-berlin.de> on 2011/05/23 14:44:25 UTC

Contribution of Graph Algorithms to MAHOUT

Hello everybody,

we are Tillmann Fiehn and Sebastian Arnold, IT students from TU Berlin.
As Sebastian Schelter already announced, we are atteding Isabel's and
Sebastian's class "Large scale data analysis and data mining" and picked
an interesting project that we want to implement in Mahout. We are open
for any hints and suggestions and would appreciate if you could share
your thoughts on our proposal.

Our goal is to implement a map/reduce algorithm for finding k-trusses in
a given graph. A k-truss is a nontrivial, single-component maximal
subgraph, such that every edge is contained in at least k-2 triangles in
the subgraph. The algorithm was proposed in the IEEE paper J. Cohen
2009: "Graph Twiddling in a MapReduce World" and involves a number of
graph algorithms that are to our knowledge currently not present in
Mahout:

- simplifyGraph: Edges->RepresentativeEdges
 remove Loops (not cycles)
 eliminate duplicate edges

- augmentGraphWithDegrees: RepresentativeEdges->AugmentedEdges
 augment the edges with degree count for both nodes

- enumerateTriangles: AugmentedEdges->Triangles
 find all triangles in a graph

- findComponents: RepresentativeEdges->ZoneAssignments
 find all components of a graph, each identified as the order number of
 the lowest-order vertex contained
 includes: findAdjacentZones, mergeAdjacentZones

- findKTrusses: RepresentativeEdges->ZoneAssignments(v,z)
 find all k-trusses of the graph
 each returned vertex v is part of a truss z

Can you help us to identify sub-problems or data structures that may be
already implemented in Mahout? Furthermore, we will try to implement all
the graph algorithms as general as possible so they can be re-used for
future development.

We will create a JIRA issue with further details. We want to finish the
implementation in 4-6 weeks.

We are happy to recieve your feedback!

Best regards,
Tillmann & Sebastian

-- 
*FG DIMA*
Einsteinufer 17 / 705
10587 Berlin
phone: +49 / 30 / 31425552