Posted to user@giraph.apache.org by Pascal Jäger <pa...@pascaljaeger.de> on 2014/02/11 10:24:35 UTC

Aggregators as

Hi all,

I have some questions about the use of aggregators. I want to implement algorithms for Community Detection and Community Tracking.
The Community Detection algorithm basically outputs a file where each line represents a community and contains the IDs of the nodes in that community.

For the Community Tracking part I cluster a second graph (i.e. another time step) and then need to compare each community of the first time step with every community of the second time step. For large graphs the number of communities can get quite large as well.
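
To make the comparison step concrete, here is a minimal sketch in plain Java (no Giraph involved). It rests on assumptions: Jaccard similarity stands in for the similarity measure, the community file uses whitespace-separated numeric node IDs, and the names CommunitySimilarity, parseCommunity, jaccard and compareAll are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CommunitySimilarity {

    /** Parses one line of the community file (assumed: whitespace-separated numeric IDs). */
    public static Set<Long> parseCommunity(String line) {
        Set<Long> ids = new HashSet<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                ids.add(Long.parseLong(token));
            }
        }
        return ids;
    }

    /** Jaccard similarity |A n B| / |A u B| (an assumed measure); 0 if both sets are empty. */
    public static double jaccard(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        // Iterate over the smaller set to keep the number of lookups low.
        Set<Long> smaller = a.size() <= b.size() ? a : b;
        Set<Long> larger = smaller == a ? b : a;
        int intersection = 0;
        for (Long id : smaller) {
            if (larger.contains(id)) {
                intersection++;
            }
        }
        int union = a.size() + b.size() - intersection;
        return (double) intersection / union;
    }

    /** Compares every community of step 1 with every community of step 2. */
    public static double[][] compareAll(List<Set<Long>> step1, List<Set<Long>> step2) {
        double[][] sim = new double[step1.size()][step2.size()];
        for (int i = 0; i < step1.size(); i++) {
            for (int j = 0; j < step2.size(); j++) {
                sim[i][j] = jaccard(step1.get(i), step2.get(j));
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        // Two tiny, made-up time steps; each string mimics one line of the output file.
        List<Set<Long>> step1 = new ArrayList<>();
        step1.add(parseCommunity("1 2 3"));
        step1.add(parseCommunity("4 5"));
        List<Set<Long>> step2 = new ArrayList<>();
        step2.add(parseCommunity("1 2"));
        step2.add(parseCommunity("4 5 6"));

        double[][] sim = compareAll(step1, step2);
        for (int i = 0; i < sim.length; i++) {
            for (int j = 0; j < sim[i].length; j++) {
                System.out.println("t1 community " + i + " vs t2 community " + j + ": " + sim[i][j]);
            }
        }
    }
}
```

The all-pairs loop does one comparison per (step 1, step 2) pair, which is the cost that grows quickly when both time steps contain thousands of communities.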

One idea I had was to register an aggregator for each community of the first time step. Then, for each community found in the second time step, one node of that community sends a message to the aggregator containing the nodes of its community. The aggregator then calculates the similarity for each received community of time step 2.
I would end up registering several thousand aggregators that I only need after one superstep.

The other idea was to alter the compute method for the node with the smallest ID in each community and let these nodes do the similarity calculation. This then means I would have to add (and later remove) some thousand edges to the graph.

What do you think would perform better? Or should I do the calculation outside of Giraph?

I appreciate any input.

Thanks

Pascal