Posted to dev@tinkerpop.apache.org by "Marko A. Rodriguez (JIRA)" <ji...@apache.org> on 2016/01/07 17:40:39 UTC

[jira] [Created] (TINKERPOP-1074) More contractual testing/specifications around Persist and ResultGraph.

Marko A. Rodriguez created TINKERPOP-1074:
---------------------------------------------

             Summary: More contractual testing/specifications around Persist and ResultGraph.
                 Key: TINKERPOP-1074
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1074
             Project: TinkerPop
          Issue Type: Improvement
          Components: process
    Affects Versions: 3.1.0-incubating
            Reporter: Marko A. Rodriguez
             Fix For: 3.2.0-incubating


A {{ComputerResult}} references two objects: a graph and a memory. The graph is the resultant computed graph and the memory contains all the sideEffect data from the computation (if any).

Right now, we have the following {{Persist}} options: {{NOTHING}}, {{VERTEX_PROPERTIES}}, {{EDGES}}. We also have the following {{ResultGraph}} options: {{ORIGINAL}}, {{NEW}}.

* NOTHING + ORIGINAL = ComputerResult contains original graph reference.
* NOTHING + NEW = ?? No test to force what this means! Should be {{EmptyGraph.instance()}}.
* VERTEX_PROPERTIES + ORIGINAL = ComputerResult contains original graph, but the computed vertex properties have been "saved" to it. (no contractual test cases here either!)
* VERTEX_PROPERTIES + NEW = ComputerResult contains new graph with only vertices and their properties.
* EDGES + NEW = ComputerResult contains new graph with vertices, edges, and their properties.
* EDGES + ORIGINAL = ComputerResult contains original graph, but the computed vertex properties and edges have been "saved" to it. (no contractual test cases here either!)
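The six combinations above can be written down as a small lookup, which is roughly what a contractual test would iterate over. This is only a sketch: the enums below are local stand-ins named after the options in this issue (not the TinkerPop types), the class and method names are hypothetical, and the NOTHING + NEW row encodes the "should be an empty graph" suggestion rather than any tested behavior.

```java
final class PersistResultGraphMatrix {
    // Local stand-ins named after the options discussed in this issue.
    enum Persist { NOTHING, VERTEX_PROPERTIES, EDGES }
    enum ResultGraph { ORIGINAL, NEW }

    // Describes what the graph in the ComputerResult is expected to contain.
    static String expectedGraph(Persist persist, ResultGraph resultGraph) {
        if (resultGraph == ResultGraph.ORIGINAL) {
            switch (persist) {
                case NOTHING:
                    return "original graph, unchanged";
                case VERTEX_PROPERTIES:
                    return "original graph + computed vertex properties";
                default: // EDGES
                    return "original graph + computed vertex properties and edges";
            }
        }
        switch (persist) {
            case NOTHING:
                return "empty graph"; // proposed above, not yet enforced by a test
            case VERTEX_PROPERTIES:
                return "new graph: vertices and their properties";
            default: // EDGES
                return "new graph: vertices, edges, and their properties";
        }
    }

    public static void main(String[] args) {
        // Print every combination so that none is left unspecified.
        for (Persist p : Persist.values()) {
            for (ResultGraph r : ResultGraph.values()) {
                System.out.println(p + " + " + r + " = " + expectedGraph(p, r));
            }
        }
    }
}
```

Looping over both enums like this makes a missing contract (such as NOTHING + NEW) show up as an explicit row instead of an untested gap.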

{{TinkerGraphComputer}} is the only system that supports all the above configuration combinations. Add test cases to {{GraphComputerTest}} that verify the behavior of all combinations.

HOWEVER !!!! ------ should we really respect ORIGINAL+PERSIST? Most providers will use {{BulkLoaderVertexProgram}} to write the computed graph back to the original graph. Having TWO ways of doing this seems bad. In fact, the way that {{TinkerGraphComputer}} writes the computed graph back to the original graph is nearly identical to how {{BulkLoaderVertexProgram}} works. Thus, I'm wondering if we should simply get rid of the concept of {{ResultGraph}} and ONLY have {{Persist}}.

* Persist.NOTHING: Returns the original graph in {{ComputerResult}}.
* Persist.VERTEX_PROPERTIES: Returns a new graph with only vertices and properties.
* Persist.EDGES: Returns a new graph with vertices, edges, and their properties.

For in-memory graphs like {{TinkerGraph}}, "new graph" can mean the original graph with the {{GraphView}} overlay. Thus, it's not really a full copy of the original graph. Moreover, Persist.NOTHING just garbage collects the {{GraphView}} and thus leaves only the original graph.
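To make the overlay idea concrete, here is a minimal sketch of a view that layers computed values over untouched original data. It is not how {{GraphView}} is actually implemented; the class {{OverlaySketch}} and its methods are hypothetical, and a single property map stands in for a whole graph.

```java
import java.util.HashMap;
import java.util.Map;

final class OverlaySketch {
    private final Map<String, Object> original;                   // read-only here, never mutated
    private final Map<String, Object> computed = new HashMap<>(); // lives only as long as the view

    OverlaySketch(Map<String, Object> original) {
        this.original = original;
    }

    // Computed values are written to the overlay, not to the original data.
    void setComputed(String key, Object value) {
        computed.put(key, value);
    }

    // Reads prefer the overlay and fall back to the original data.
    Object get(String key) {
        return computed.containsKey(key) ? computed.get(key) : original.get(key);
    }

    public static void main(String[] args) {
        Map<String, Object> vertex = new HashMap<>();
        vertex.put("name", "marko");

        OverlaySketch view = new OverlaySketch(vertex);
        view.setComputed("rank", 0.5d);

        System.out.println(view.get("name"));           // read through to the original
        System.out.println(view.get("rank"));           // read from the overlay
        System.out.println(vertex.containsKey("rank")); // original is untouched
    }
}
```

Under this reading, Persist.NOTHING amounts to dropping the reference to the view: the overlay becomes garbage and the original data is exactly as it was before the job ran.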

------------------

Next, what does {{Persist}} mean for memory? Remember, {{ComputerResult}} also has a reference to sideEffect memory. What if you want to run a job and NOT persist the graph, but persist the memory only? I think we should ALWAYS assume memory persistence. For TinkerGraph, that means the ComputerResult.memory() has a HashMap of memory values. For Giraph/Spark, that means that the {{Storage}} will always have resultant sideEffect data in the output directory even if there is no graph.

* {{NOTHING}}: persist memory and return the original graph.
* {{VERTEX_PROPERTIES}}: persist memory and return new graph of just vertex properties.
* {{EDGES}}: persist memory and return new graph of vertex properties and edges.
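The {{Persist}}-only proposal with always-persisted memory could look something like the following sketch. Everything here is illustrative: {{PersistOnlySketch}}, {{Result}}, and {{finish}} are hypothetical names, the graph is represented by a description string, and a plain map stands in for the sideEffect memory.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class PersistOnlySketch {
    // Local stand-in for the Persist options discussed in this issue.
    enum Persist { NOTHING, VERTEX_PROPERTIES, EDGES }

    // Hypothetical result holder: a graph description plus the sideEffect memory.
    static final class Result {
        final String graph;
        final Map<String, Object> memory;

        Result(String graph, Map<String, Object> memory) {
            this.graph = graph;
            this.memory = memory;
        }
    }

    // Memory is copied into the result for every Persist option;
    // only the graph part varies.
    static Result finish(Persist persist, Map<String, Object> sideEffects) {
        final String graph;
        switch (persist) {
            case NOTHING:
                graph = "original graph";
                break;
            case VERTEX_PROPERTIES:
                graph = "new graph: vertex properties only";
                break;
            default: // EDGES
                graph = "new graph: vertex properties and edges";
                break;
        }
        return new Result(graph, Collections.unmodifiableMap(new HashMap<>(sideEffects)));
    }

    public static void main(String[] args) {
        Map<String, Object> sideEffects = new HashMap<>();
        sideEffects.put("iterations", 30);

        for (Persist p : Persist.values()) {
            Result r = finish(p, sideEffects);
            System.out.println(p + " -> " + r.graph + ", memory keys = " + r.memory.keySet());
        }
    }
}
```

The point of the shape is that the memory is never conditional on {{Persist}}, so a job run with {{NOTHING}} still yields its sideEffect data.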

Decisions, decisions, decisions....


