Posted to dev@giraph.apache.org by "Mirko Kaempf (JIRA)" <ji...@apache.org> on 2014/06/18 09:07:07 UTC

[jira] [Created] (GIRAPH-920) Dynamic snapshot control via Zookeeper

Mirko Kaempf created GIRAPH-920:
-----------------------------------

             Summary: Dynamic snapshot control via Zookeeper 
                 Key: GIRAPH-920
                 URL: https://issues.apache.org/jira/browse/GIRAPH-920
             Project: Giraph
          Issue Type: Bug
          Components: bsp
    Affects Versions: 1.1.0
            Reporter: Mirko Kaempf
            Priority: Minor


Gephi is great for showing graphs, including time-dependent ones. Using the Gephi-Hadoop-Connector, such time-dependent graphs can be imported into Gephi from Hadoop via a set of node- and edge-list queries against Hive or Impala. This helps a lot with debugging and with showing the properties of an algorithm.

We start with an existing Giraph algorithm in which "snapshots" store the state of the graph from time to time, using Giraph's built-in feature.
To this we add a "hook" that turns snapshotting on or off by switching a flag, which tells the algorithm whether or not to take a snapshot during a superstep. The request goes from a client to Zookeeper, which registers all snapshot requests and all snapshotable jobs.

We use the gctrl tool, which still has to be created.

A command line call looks like this:

gctrl enableSnap $jobID $step0 $stepDist

gctrl : the tool to interact with a Giraph job via Zookeeper

enableSnap : command to turn dynamic snapshotting on
disableSnap : command to turn off dynamic snapshotting
listSnap : shows all running jobs that are registered with the "snapshot feature"

$jobID : the id of a Giraph job
$step0 : first (or next) superstep that finishes with a snapshot
$stepDist : number of supersteps without a snapshot
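
To make the meaning of $step0 and $stepDist concrete, here is a minimal Java sketch of the flag check that such a hook might perform at the end of a superstep. The class SnapshotSchedule and its methods are hypothetical and not part of Giraph. The sketch also assumes that $stepDist counts the snapshot-free supersteps between two snapshots, so a snapshot is taken every $stepDist + 1 supersteps; the description above does not pin this down.

```java
/** Hypothetical schedule built from the gctrl arguments $step0 and $stepDist. */
final class SnapshotSchedule {
    private final long step0;     // first superstep that finishes with a snapshot
    private final long stepDist;  // assumed: snapshot-free supersteps between two snapshots
    private volatile boolean enabled = true;  // flag switched by enableSnap / disableSnap

    SnapshotSchedule(long step0, long stepDist) {
        if (step0 < 0 || stepDist < 0) {
            throw new IllegalArgumentException("step0 and stepDist must be non-negative");
        }
        this.step0 = step0;
        this.stepDist = stepDist;
    }

    /** Switches the flag; the schedule itself is kept. */
    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    /** Returns true if the given superstep should finish with a snapshot. */
    boolean shouldSnapshot(long superstep) {
        if (!enabled || superstep < step0) {
            return false;
        }
        // Under the stated assumption, snapshots repeat every stepDist + 1 supersteps.
        return (superstep - step0) % (stepDist + 1) == 0;
    }

    public static void main(String[] args) {
        // Corresponds to a call like: gctrl enableSnap $jobID 3 2
        SnapshotSchedule schedule = new SnapshotSchedule(3, 2);
        for (long s = 0; s < 10; s++) {
            if (schedule.shouldSnapshot(s)) {
                System.out.println("snapshot after superstep " + s);
            }
        }
    }
}
```

If $stepDist is instead meant as the distance between two snapshots, the modulus would be stepDist rather than stepDist + 1.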


A basic structure for the Zookeeper part is ready (inspired by the Zookeeper book). We have to change GiraphJob slightly: we introduce a helper class to which all snapshot control logic is delegated.
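
As an illustration of what such a helper class could manage, the following Java sketch encodes a snapshot request into the bytes a client might store as znode data, and decodes it again on the job side. Everything here is an assumption made for illustration: the class name SnapshotRequest, the root path /giraph-snapshot-control and the comma-separated payload format are not taken from the patch. The sketch deliberately leaves out the Zookeeper client calls.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical payload of one snapshot request, stored under one znode per job. */
final class SnapshotRequest {
    // Assumed root znode under which snapshotable jobs register themselves.
    static final String ROOT = "/giraph-snapshot-control";

    final boolean enabled;  // enableSnap -> true, disableSnap -> false
    final long step0;
    final long stepDist;

    SnapshotRequest(boolean enabled, long step0, long stepDist) {
        this.enabled = enabled;
        this.step0 = step0;
        this.stepDist = stepDist;
    }

    /** Path of the znode that would hold the request for the given job. */
    static String znodePath(String jobId) {
        return ROOT + "/" + jobId;
    }

    /** Serializes the request as "enabled,step0,stepDist" in UTF-8. */
    byte[] encode() {
        String text = enabled + "," + step0 + "," + stepDist;
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Parses the bytes produced by encode(); rejects malformed payloads. */
    static SnapshotRequest decode(byte[] data) {
        String[] parts = new String(data, StandardCharsets.UTF_8).split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected enabled,step0,stepDist");
        }
        return new SnapshotRequest(
                Boolean.parseBoolean(parts[0]),
                Long.parseLong(parts[1]),
                Long.parseLong(parts[2]));
    }
}
```

With this layout, a listSnap command would only need to list the children of the root znode, and the job side could watch its own znode for changes.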

Use Case:

If snapshots are enabled, the state of the current graph is implicitly dumped to HDFS in a layout that allows Hive / Impala queries.
To that end, the tables are prepared and each snapshot forms a partition within its table. This allows us to show the graph in Gephi and to do
dynamic inspection outside of Giraph. In a long-running job, one can step from one superstep to the next to study the behaviour at a critical point, e.g.
in the range around a phase transition.
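
The mapping from snapshots to table partitions can be sketched as follows. Hive stores each partition in a directory named key=value below the table directory, so a snapshot written to such a directory becomes queryable once the partition is registered. The class SnapshotPartition, the partition keys job_id and superstep, and the table name and paths used in the note below are assumptions for illustration, not names from the patch. The sketch does no quoting or escaping and assumes the job id contains no quote characters.

```java
/** Hypothetical helper that maps one snapshot to one Hive / Impala partition. */
final class SnapshotPartition {

    /** Directory for one snapshot, following the key=value partition layout. */
    static String partitionDir(String tableDir, String jobId, long superstep) {
        return tableDir + "/job_id=" + jobId + "/superstep=" + superstep;
    }

    /** Statement that registers the snapshot directory as a partition. */
    static String addPartitionStatement(String table, String jobId,
                                        long superstep, String location) {
        // No escaping is done here: jobId and location are assumed to be quote-free.
        return "ALTER TABLE " + table
                + " ADD IF NOT EXISTS PARTITION (job_id='" + jobId
                + "', superstep=" + superstep + ")"
                + " LOCATION '" + location + "'";
    }
}
```

For example, partitionDir("/data/graph_nodes", "job_1", 12) yields /data/graph_nodes/job_id=job_1/superstep=12, and a query restricted to superstep = 12 then reads only that snapshot.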

Expected results:

a) a patch containing the code for Giraph
b) a demo presenting the feature, in particular showing how to debug algorithms at scale, using a new algorithm that is still under research.




--
This message was sent by Atlassian JIRA
(v6.2#6252)