Posted to user@giraph.apache.org by Alessio Arleo <in...@icloud.com> on 2015/03/25 00:34:51 UTC

"Local-only" aggregators

Hello everybody

I was wondering whether the concept of an aggregator could be extended from a “global” to a “local-only” perspective.

Normally, aggregators DO cause network traffic because of the cycle: Workers -> AggregatorOwner -> MasterAggregator -> AggregatorOwner -> Workers

What if I’d like to fetch and aggregate values as I would normally do with aggregators but without causing this traffic? Let’s assume this situation:

1 - Define a custom partitioning class and let it partition the graph. This partitioning is what assigns vertices to workers.
2 - In the computation class, every time the compute method is called on a vertex, the data needed for the computation is stored in the vertex's neighbours but also in non-neighbouring vertices (think of a force-directed layout algorithm, for example: computing the forces requires the distance to both neighbouring and non-neighbouring vertices, with a different kind of force applied to each).

Given that the compute class is computing on vertex X:
	a - I pick up information from X's neighbours as I normally would (iterating over its edges or the incoming messages).
	b - For non-neighbouring vertices, I would like to use data from X's worker only.
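
To make steps a and b concrete, here is a rough sketch in plain Java of the two data sources feeding one vertex's force computation. Everything in it is an assumption for illustration: the names `Point`, `ForceSketch`, `netForce` and the parameter `k` (ideal edge length) are hypothetical, and the Fruchterman-Reingold-style magnitudes (attraction d^2/k, repulsion k^2/d) are just one possible choice of force model. It does not use the Giraph API.

```java
import java.util.List;

/** Hypothetical 2D position of a vertex. */
final class Point {
    final double x;
    final double y;

    Point(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

final class ForceSketch {
    /**
     * Net force on `self`.
     * neighbours: positions obtained from edges or incoming messages (step a).
     * localNonNeighbours: positions of non-neighbouring vertices held by the
     * same worker (step b); how they are obtained is the open question.
     * k: assumed ideal edge length.
     */
    static double[] netForce(Point self, List<Point> neighbours,
                             List<Point> localNonNeighbours, double k) {
        double fx = 0.0;
        double fy = 0.0;
        // Step a: attraction towards each neighbour, magnitude d^2 / k.
        for (Point n : neighbours) {
            double dx = n.x - self.x;
            double dy = n.y - self.y;
            double d = Math.hypot(dx, dy);
            if (d == 0.0) {
                continue; // coincident points: direction undefined, skip
            }
            double f = d * d / k;
            fx += dx / d * f;
            fy += dy / d * f;
        }
        // Step b: repulsion away from each local non-neighbour, magnitude k^2 / d.
        for (Point p : localNonNeighbours) {
            double dx = self.x - p.x;
            double dy = self.y - p.y;
            double d = Math.hypot(dx, dy);
            if (d == 0.0) {
                continue;
            }
            double f = k * k / d;
            fx += dx / d * f;
            fy += dy / d * f;
        }
        return new double[] {fx, fy};
    }
}
```

The point of the sketch is only that the second list never has to leave the worker: if it can be read from worker-local state, step b adds no network traffic.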

The first thing I tried to understand before asking this question was: does this make any sense? I may be wrong, but I think it does. If I partition my graph to maximize locality, what I am actually trying to do is reduce the network traffic as much as possible.

My concern is that if I use aggregators to achieve this, the network traffic would be heavy, probably cancelling out the advantages of the initial partitioning. What if I could access and modify an aggregator-like local data structure in the same fashion (i.e. “getAggregatedValue”) but without broadcasting it (assuming that I do not need the aggregator to be accessible to every worker)? Or would it be possible to assign aggregator owners manually in order to minimise network traffic (if I need to aggregate the values from vertices in partition 3, and partition 3 only, I make the worker that holds partition 3 the owner of that aggregator)?
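
As a rough illustration of the aggregator-like local structure I have in mind, here is a minimal sketch in plain Java. `LocalAggregator` and `endSuperstep` are hypothetical names, not part of the Giraph API. The sketch assumes one instance per worker, and that something on the worker calls `endSuperstep()` once between supersteps so that values aggregated in one superstep become readable in the next, as with ordinary aggregators. The methods are synchronized on the assumption that several compute threads may share the same worker.

```java
import java.util.function.DoubleBinaryOperator;

/**
 * Hypothetical worker-local aggregator (NOT a Giraph class).
 * It never leaves the worker, so it causes no network traffic.
 */
final class LocalAggregator {
    private final DoubleBinaryOperator combiner;
    private final double initialValue;
    private double previous; // result of the last finished superstep
    private double current;  // value being accumulated in this superstep

    LocalAggregator(DoubleBinaryOperator combiner, double initialValue) {
        this.combiner = combiner;
        this.initialValue = initialValue;
        this.previous = initialValue;
        this.current = initialValue;
    }

    /** Called from compute() for vertices hosted on this worker. */
    synchronized void aggregate(double value) {
        current = combiner.applyAsDouble(current, value);
    }

    /** Returns what was aggregated during the previous superstep. */
    synchronized double getAggregatedValue() {
        return previous;
    }

    /** Assumed to be called once per worker between supersteps. */
    synchronized void endSuperstep() {
        previous = current;
        current = initialValue;
    }
}
```

For example, a sum over the local vertices would be created as `new LocalAggregator(Double::sum, 0.0)`, and a maximum as `new LocalAggregator(Math::max, Double.NEGATIVE_INFINITY)`.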

I hope for your understanding, and I hope I have caught your attention, even if only for a brief moment. Ask me if something is not clear ;)

Cheers!

~~~~~~~~~~~~~~~~~~~

Ing. Alessio Arleo

PhD student in Industrial and Information Engineering

Master's degree in Computer and Automation Engineering
Bachelor's degree in Computer and Electronic Engineering

Linkedin: it.linkedin.com/in/IngArleo <http://it.linkedin.com/in/IngArleo>
Skype: Ing. Alessio Arleo

Tel: +39 075 5853920
Cell: +39 349 0575782

~~~~~~~~~~~~~~~~~~~