Posted to commits@tinkerpop.apache.org by ok...@apache.org on 2015/04/06 17:39:26 UTC

[2/3] incubator-tinkerpop git commit: added Spark algorithm diagram with a detailed paragraph explaining how the SparkGraphComputer engine works.

added Spark algorithm diagram with a detailed paragraph explaining how the SparkGraphComputer engine works.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/c49154f5
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/c49154f5
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/c49154f5

Branch: refs/heads/master
Commit: c49154f5b1a977f5479125f586d30f3a2521dc7a
Parents: 16bf899
Author: Marko A. Rodriguez <ok...@gmail.com>
Authored: Mon Apr 6 09:37:16 2015 -0600
Committer: Marko A. Rodriguez <ok...@gmail.com>
Committed: Mon Apr 6 09:37:24 2015 -0600

----------------------------------------------------------------------
 docs/src/implementations.asciidoc      |    4 +
 docs/static/images/spark-algorithm.png |  Bin 0 -> 286915 bytes
 docs/static/images/tinkerpop3.graffle  | 5024 ++++++++++++++++++++++++++-
 3 files changed, 4958 insertions(+), 70 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/c49154f5/docs/src/implementations.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/implementations.asciidoc b/docs/src/implementations.asciidoc
index 56ec85e..44e3c74 100644
--- a/docs/src/implementations.asciidoc
+++ b/docs/src/implementations.asciidoc
@@ -661,6 +661,10 @@ gremlin> g.V().out().out().values('name')
 
 CAUTION: The HadoopRemoteAcceptor (`:remote`) currently does not support `SparkGraphComputer`. As such, submitting lambda-containing traversals to the Spark cluster is not possible via the Gremlin Console.
 
+The `SparkGraphComputer` algorithm leverages Spark's caching abilities to reduce the amount of data shuffled across the wire on each iteration of the <<vertexprogram,`VertexProgram`>>. When the graph is loaded as a Spark RDD (Resilient Distributed Dataset) it is immediately cached as `graphRDD`. The `graphRDD` is a distributed adjacency list which encodes the vertex, its properties, and all its incident edges. On the first iteration, each vertex (in parallel) is passed through `VertexProgram.execute()`. This yields an output of the vertex's mutated state (i.e. updated compute keys -- `propertyX`) and its outgoing messages. This `viewOutgoingRDD` is then reduced to `viewIncomingRDD` where the outgoing messages are sent to their respective vertices. If a `MessageCombiner` exists for the vertex program, then messages are aggregated locally and globally to ultimately yield one incoming message for the vertex. This reduce sequence is the "message pass." If the vertex program does not terminate on this iteration, then the `viewIncomingRDD` is joined with the cached `graphRDD` and the process continues. When there are no more iterations, there is a final join and the resultant RDD is stripped of its edges and messages. This `mapReduceRDD` is cached and is processed by each <<mapreduce,`MapReduce`>> job in the <<graphcomputer,`GraphComputer`>> computation.
+
+image::spark-algorithm.png[width=775]
+
 [[mapreducegraphcomputer]]
 MapReduceGraphComputer
 ^^^^^^^^^^^^^^^^^^^^^^

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/c49154f5/docs/static/images/spark-algorithm.png
----------------------------------------------------------------------
diff --git a/docs/static/images/spark-algorithm.png b/docs/static/images/spark-algorithm.png
new file mode 100644
index 0000000..a4a40fc
Binary files /dev/null and b/docs/static/images/spark-algorithm.png differ
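
The map/reduce/join loop described in the new paragraph above can be sketched as a single-process simulation. This is not Spark code and not the TinkerPop implementation: `graph_rdd`, `view_outgoing`, `view_incoming`, `execute`, and `combine` are hypothetical stand-ins for the `graphRDD`, `viewOutgoingRDD`, `viewIncomingRDD`, `VertexProgram.execute()`, and `MessageCombiner` roles named in the prose, and the toy rank-propagation vertex program is illustrative only.

```python
# Hedged sketch: simulate the SparkGraphComputer iteration loop in plain
# Python (no Spark dependency). Dicts play the role of keyed RDDs.
from collections import defaultdict

# "graphRDD": a cached adjacency list, vertex id -> (properties, out-edges)
graph_rdd = {
    "a": ({"rank": 1.0}, ["b", "c"]),
    "b": ({"rank": 1.0}, ["c"]),
    "c": ({"rank": 1.0}, []),
}

def execute(props, edges, incoming):
    # Stand-in for VertexProgram.execute(): mutate the vertex's compute
    # keys and emit outgoing (destination, message) pairs.
    new_props = {"rank": props["rank"] + incoming}
    messages = [(dst, new_props["rank"] / len(edges)) for dst in edges]
    return new_props, messages

def combine(m1, m2):
    # Stand-in for MessageCombiner: fold messages bound for one vertex.
    return m1 + m2

view_incoming = {}
for _ in range(2):  # two iterations of the vertex program
    # Map step -> "viewOutgoingRDD": each vertex runs in parallel in Spark;
    # here we just loop. The mutated state is joined back into graph_rdd.
    view_outgoing = {}
    for v, (props, edges) in graph_rdd.items():
        new_props, msgs = execute(props, edges, view_incoming.get(v, 0.0))
        view_outgoing[v] = msgs
        graph_rdd[v] = (new_props, edges)
    # Reduce step (the "message pass") -> "viewIncomingRDD": messages are
    # aggregated per destination vertex via the combiner.
    incoming = defaultdict(float)
    for msgs in view_outgoing.values():
        for dst, m in msgs:
            incoming[dst] = combine(incoming[dst], m)
    view_incoming = dict(incoming)

# After the final join, graph_rdd (stripped of messages) is the analog of
# the cached "mapReduceRDD" handed to each MapReduce job.
print(sorted(view_incoming.items()))  # → [('b', 0.5), ('c', 2.0)]
```

Under these assumptions the loop makes the shuffle boundary visible: only the reduce step moves messages between vertices, while the cached adjacency data stays put and is re-joined with the incoming view each round, which is the stated reason for caching `graphRDD`.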