You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tinkerpop.apache.org by "stephen mallette (JIRA)" <ji...@apache.org> on 2016/04/06 19:38:25 UTC

[jira] [Updated] (TINKERPOP-1250) Support Subgraph-Centric GraphComputer

     [ https://issues.apache.org/jira/browse/TINKERPOP-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stephen mallette updated TINKERPOP-1250:
----------------------------------------
    Issue Type: Improvement  (was: New Feature)

> Support Subgraph-Centric GraphComputer
> --------------------------------------
>
>                 Key: TINKERPOP-1250
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1250
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.2.0-incubating
>            Reporter: Marko A. Rodriguez
>              Labels: breaking
>
> Right now, {{GraphComputer}} and {{VertexPrograms}} are "vertex-centric." That is, the boundary of an atomic unit of computation is a single vertex, its properties, and its incident edges. We should support "subgraph-centric" computing where each worker's partition is loaded in memory (RAM) as a connected subgraph. For those vertices that are not in its partition, a special "reference vertex" is use to reference it. What this would mean is that when a {{Traverser}} is processing, it can continue to evaluate as long as its within the subgraph partition. The moment it references a vertex not in the partition (a "reference vertex"), it serializes itself as a message.
> This would greatly increase the speed of Gremlin OLAP at the cost of requiring a large amount of memory to store all the worker partition subgraphs in RAM. There might even be a way to have a hybrid-model where some of the partition is held in RAM and the other is (even though still in the same partition) is stored as "star vertices."
> How would this be added to {{GraphComputer}} in a backwards compatible way?
> {code}
> GraphComputer.supportsVertexCentricComputing)() // currently true for all implementations
> GraphComputer.supportsSubgraphCentricComputing()
> {code}
> A {{VertexProgram}} could then have the following method:
> {code}
> boolean VertexProgram.withinCentricity(final M message)
> {code}
> If the message is NOT within "centricity", then it is serialized and distributed, else it continues to execute.
> I haven't thought through all the API and implementation considerations. Though it would be good to make this backwards compatible and, moving forward, able to support "edge-centric computing" and thus, have a very memory limited OLAP system.
> How to think of the different models:
> 1. Vertex-centric: medium speed, medium expressivity, medium memory costs.
> 2. Subgraph-centric: high speed, high expressivity, high memory costs.
> 3. Edge-centric: high speed, low expressivity, low memory costs.
> What is "expressivity"? Well, subgraph-centric computations can have "local traversals" that move beyond the "star graph." Edge-centric would not support any `by()`-modulators as the computation is bound to a single edge.  Thus, low expressivity. It really depends on how you represent the edge-centric edge list. Do you have the vertices on each side of an edge duplicated with all their properties? This wold still be low memory costs, but you could get more expressivity.
> Anywho, in general it would be nice if the underlying execution engine can handle the three common distributed graph computing paradigms. Subgraph-centric seems the easiest to support at this point in time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)