You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hama.apache.org by Apache Wiki <wi...@apache.org> on 2013/06/17 00:00:32 UTC
[Hama Wiki] Update of "DynamicGraphs" by AnastasisAndronidis

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "DynamicGraphs" page has been changed by AnastasisAndronidis:
https://wiki.apache.org/hama/DynamicGraphs

New page:
== Dynamic Addition and Removal of Vertexes in Graph API ==

=== Introduction ===
Nowadays more and more people turn to distributed environments to store and analyze Big Data. For that reason there is a growing need in both achieving efficiency and new features. Hama is an upcoming project that gives to researchers and analysts a way to handle big amounts of data, through the BSP computation model.
 
A very interesting feature of the Hama project is the Graph API for graph analysis. A lot of scientists and companies represent and manage their data with the use of one or more of the many different kinds of graphs (e.g. incremental graphs).
 
This article will present some new features on the Graph API of Hama, on how to create a dynamic graphs on run time. It also will serve as a technical document of the internals of those features.

=== Description ===
At the time being, we will discuss only two different operations (addition and deletion) from both points of the end-user and internal architecture.

Some points that we need to keep in mind, are:
 1. Deletion/Addition is happening after a super step. ''We are providing methods inside a vertex instance that create/delete a vertex''
 2. New vertexes needs to be distributed through partitioner to be placed on right peers.
 3. Keep in mind that a new created vertex and old vertexes, will have the same super step counter. ''Various algorithms change their behavior though time, an example with an internal counter to serve as a state is needed''

=== Implementation Details ===
The `GraphJobRunner` class, contains the major steps for the graph computation. It starts with an initial setup, where the vertexes are loaded to memory, then there is the main computation loop, where the supersteps occur, and in the end there is the cleanup, where of the graph and the results are written to HDFS.

Our main target (concerning the operations +/-) is to enable the functionality of `GraphJobRunner::loadVertices()` function during a superstep so the user will be able to call a function during a superstep (e.g. `addVertex(value, List<> edges)`) to add a vertex. The user will also be able to call a delete method (e.g. `deleteVertex(vertexID)`) to remove vertexes.

An important note here is that adding or removing vertexes is happening in every sync of the peers. This means that during a superstep the additions or deletions will be locally stored and after the end of the superstep will be committed, through messages, to the peers through the partitioner.