Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2016/04/12 19:57:20 UTC

[DISCUSS] OLAP Mutations

Hello,

One of the big outstanding features in TinkerPop3 is OLAP-based graph mutations. That is:

	graph database --read--> graph processor --write--> graph database

Daniel Kuppitz has provided BulkLoaderVertexProgram with "from scratch" or incremental loading capabilities. It would be nice to be able to piggyback on this work for OLAP mutations. Here is an idea that borrows from the OLAP mutation model developed by Matthias Broecheler in Faunus.

1. We create a MutatingProgram interface.
2. That interface will have a collection of public static final String compute keys.
	- gremlin.mutatingProgram.mutation
	- gremlin.mutatingProgram.droppedProperties
	- gremlin.mutatingProgram.addedProperties
3. Any VertexProgram can implement MutatingProgram and in doing so, it will use the respective compute keys.
4. If that VertexProgram deletes a vertex, it does not actually delete the vertex; it simply adds the property "gremlin.mutatingProgram.mutation=dropped."
5. If that VertexProgram adds an edge, it adds the edge with the property "gremlin.mutatingProgram.mutation=added."
6. If that VertexProgram adds a vertex property, it adds the vertex property with the property "gremlin.mutatingProgram.mutation=added."
7. If that VertexProgram deletes a property, it adds the property's key to the element's dropped list: "gremlin.mutatingProgram.droppedProperties=[x,y,z]"
8. ...
9. It is up to the VertexProgram to be smart about consistency on mutations:
	* If an edge is added, the next iteration should copy that edge to the incident edge set of the incoming/outgoing vertex.
	* If an edge property is added, the next iteration should update that property on the corresponding incident edge of the incoming/outgoing vertex.
	* We can provide various static helper methods in MutatingProgram to make this easy.
10. When the VertexProgram has completed its computation (terminated), BulkLoaderVertexProgram will be able to read the resultant graph and use the MutatingProgram compute keys as necessary to do the respective updates to the source graph (i.e. the graph database).
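
To make steps 2 through 7 concrete, here is a minimal sketch of what the tagging helpers might look like. It is only an illustration under stated assumptions: the class name MutatingProgramSketch and its methods are hypothetical, elements are modeled as plain property maps rather than TinkerPop elements, and only the three compute key strings come from the proposal. Since the proposal does not say how addedProperties is populated, the sketch declares that key and nothing more.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not TinkerPop code: an element is modeled as a plain
// property map so the example runs without any TinkerPop dependency.
final class MutatingProgramSketch {

    // The three compute keys named in step 2.
    static final String MUTATION = "gremlin.mutatingProgram.mutation";
    static final String DROPPED_PROPERTIES = "gremlin.mutatingProgram.droppedProperties";
    static final String ADDED_PROPERTIES = "gremlin.mutatingProgram.addedProperties";

    private MutatingProgramSketch() {
    }

    // Step 4: the element stays in the graph and is only tagged as dropped.
    static void markDropped(Map<String, Object> element) {
        element.put(MUTATION, "dropped");
    }

    // Steps 5 and 6: a newly created edge or vertex property is tagged as added.
    static void markAdded(Map<String, Object> element) {
        element.put(MUTATION, "added");
    }

    // Step 7: record the key in the dropped list. Assumption: the value itself
    // is left in place until the write-back phase.
    @SuppressWarnings("unchecked")
    static void dropProperty(Map<String, Object> element, String key) {
        List<String> dropped = (List<String>) element.computeIfAbsent(
                DROPPED_PROPERTIES, k -> new ArrayList<String>());
        if (!dropped.contains(key)) {
            dropped.add(key);
        }
    }

    // Step 10: the kind of check a loader could make before touching the source graph.
    static boolean isDropped(Map<String, Object> element) {
        return "dropped".equals(element.get(MUTATION));
    }

    public static void main(String[] args) {
        Map<String, Object> vertex = new HashMap<>();
        vertex.put("name", "marko");
        vertex.put("age", 29);
        dropProperty(vertex, "age");
        System.out.println(vertex.get(DROPPED_PROPERTIES));
        markDropped(vertex);
        System.out.println(isDropped(vertex));
    }
}
```

The point of the design is that nothing is physically removed during the computation, so the write-back phase can read the tags as a ready-made diff.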

---------------------

Problems:

	1. How do we deal with ID generation?
	2. How do we add vertices in OLAP?
	3. How do we deal with updates that already occurred at the graph database while OLAP was processing?

@kuppitz -- would this notion of "mutation tags" be useful in BulkLoaderVertexProgram? I assume it would make your life much easier as you don't have to do a "diff" -- the diff is provided to you.

Thoughts on this matter would be much appreciated.

Thank you,
Marko.

http://markorodriguez.com


Re: [DISCUSS] OLAP Mutations

Posted by Marko Rodriguez <ok...@gmail.com>.
Hello,

I've been telling people that OLAP mutations are something we would work on for the TinkerPop 3.2.x line, but after some more thought lately, I don't think we can put this into 3.2.x and must instead wait for 3.3.0. Why? Backwards compatibility issues. Let me articulate the problems as I see them, and perhaps someone can argue me out of my line of reasoning.

	1. GraphComputers would need to support edge and vertex addition/deletion.
		* We do have these as features right now, so presumably, GraphComputers that don't support those features would have them set to false.
			https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/GraphComputer.java#L160
		* However, what about those that do?! What are the semantics they are using? Nothing has been specified.
		* Not a show stopper, but a bit of a "hmmm…k."
	
	2. GraphComputers that support edge and vertex deletion will need to support "special" mutation messages.
		* When you delete an edge, that edge deletion will need to be messaged to the adjacent vertex so it can delete the corresponding edge from its star graph.
		* When you delete a vertex, that vertex deletion will need to be messaged to all adjacent vertices so they can delete the corresponding edges to that vertex from their star graphs.
		* Similarly for edge property mutations...
		* We could require the VertexProgram to handle all that synchronization, but I think it's best to leave that logic to GraphComputer.
			* Users who write VertexPrograms that do mutations will be overwhelmed by handling the semantics, and their programs will be very error prone.
		* If this is core to GraphComputer (which I think it should be), then we need "special messages" (which means changing the GraphComputer contract).
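
To illustrate the synchronization described in point 2, here is a rough sketch of how a drop could be propagated between star graphs by messages. Everything in it is an assumption made for illustration: the names MutationMessageSketch, Star, and DropEdge are hypothetical, a star graph is reduced to a map from edge id to the id of the vertex at the other end, and message delivery is a plain method call rather than a GraphComputer iteration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not TinkerPop code.
final class MutationMessageSketch {

    // A "special" message telling the target vertex to drop one edge from its own star.
    static final class DropEdge {
        final String targetVertex;
        final String edgeId;

        DropEdge(String targetVertex, String edgeId) {
            this.targetVertex = targetVertex;
            this.edgeId = edgeId;
        }
    }

    // A vertex-centric view of the graph: edge id -> id of the vertex at the other end.
    static final class Star {
        final String vertexId;
        final Map<String, String> edges = new HashMap<>();

        Star(String vertexId) {
            this.vertexId = vertexId;
        }
    }

    // Each edge is stored twice, once in the star of each endpoint.
    static void connect(Star a, Star b, String edgeId) {
        a.edges.put(edgeId, b.vertexId);
        b.edges.put(edgeId, a.vertexId);
    }

    // Drop an edge locally and return the message owed to the adjacent vertex.
    static List<DropEdge> dropEdge(Star star, String edgeId) {
        List<DropEdge> out = new ArrayList<>();
        String other = star.edges.remove(edgeId);
        if (other != null) {
            out.add(new DropEdge(other, edgeId));
        }
        return out;
    }

    // Drop a vertex: every neighbor must be told to drop its copy of each incident edge.
    static List<DropEdge> dropVertex(Star star) {
        List<DropEdge> out = new ArrayList<>();
        for (Map.Entry<String, String> e : star.edges.entrySet()) {
            out.add(new DropEdge(e.getValue(), e.getKey()));
        }
        star.edges.clear();
        return out;
    }

    // Stands in for the next iteration: each target applies the messages sent to it.
    static void deliver(Map<String, Star> stars, List<DropEdge> messages) {
        for (DropEdge m : messages) {
            Star target = stars.get(m.targetVertex);
            if (target != null) {
                target.edges.remove(m.edgeId);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Star> stars = new HashMap<>();
        for (String id : new String[] {"a", "b", "c"}) {
            stars.put(id, new Star(id));
        }
        connect(stars.get("a"), stars.get("b"), "e1");
        connect(stars.get("a"), stars.get("c"), "e2");
        deliver(stars, dropVertex(stars.get("a")));
        System.out.println(stars.get("b").edges.size() + " " + stars.get("c").edges.size());
    }
}
```

Between the drop and the delivery the two copies of an edge disagree, which is exactly the window a VertexProgram author would otherwise have to reason about by hand.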

	3. GraphComputers support ResultGraph.NEW and ResultGraph.ORIGINAL.
		* This was a bad idea of mine from a long time ago, and we should have deprecated it for 3.2.0.
		* This really should be cleared up and handled as we will need to provide semantics for this when it comes to graph mutations.
		* Thus, deprecate it when OLAP mutations are provided.

	4. VertexProgram API updates.
		* VertexProgram.getVertexComputeKeys() (exists)
		* VertexProgram.getEdgeComputeKeys() ?
	
	5. I believe that mutation metadata needs to be core to the GraphComputer star graphs rather than expressed through namespaced properties.
		* I think global mutations should be core to the GraphComputer infrastructure.
		* As such, I think that computed elements should have explicit metadata for their status.
			* That is, ComputerElement.getStatus() -> Status.ADDED, Status.REMOVED, Status.UPDATED, Status.UNCHANGED …
			* public interface ComputerElement …
		* If we do this, then the StarGraph representation changes for serialization and thus, a breaking Gryo change.
			* Or, we will need to make it so that StarGraph serializes differently for GryoReader/Writer than it does for GraphComputers that use StarGraph :|
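
As a rough illustration of point 5, explicit status metadata on a computed element might look something like the sketch below. Only the names ComputerElement, getStatus(), and the four Status values come from the text above; the wrapper class, SimpleElement, and the transition rules (an ADDED element stays ADDED when updated, a REMOVED element rejects further updates) are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not TinkerPop code.
final class ComputerElementSketch {

    enum Status { ADDED, REMOVED, UPDATED, UNCHANGED }

    interface ComputerElement {
        Status getStatus();
    }

    // A toy element that tracks its own mutation status instead of
    // carrying a namespaced marker property.
    static final class SimpleElement implements ComputerElement {
        private final Map<String, Object> properties = new HashMap<>();
        private Status status;

        SimpleElement(Status initial) {
            this.status = initial;
        }

        @Override
        public Status getStatus() {
            return status;
        }

        // Assumed rule: only an UNCHANGED element becomes UPDATED;
        // an ADDED element is still reported as ADDED.
        void property(String key, Object value) {
            if (status == Status.REMOVED) {
                throw new IllegalStateException("element was removed");
            }
            properties.put(key, value);
            if (status == Status.UNCHANGED) {
                status = Status.UPDATED;
            }
        }

        void remove() {
            status = Status.REMOVED;
        }
    }

    public static void main(String[] args) {
        SimpleElement existing = new SimpleElement(Status.UNCHANGED);
        existing.property("rank", 0.5);
        System.out.println(existing.getStatus());
    }
}
```

Because the status lives in a field rather than in a property, it would have to be part of the serialized form, which is where the Gryo concern above comes from.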

All of these issues lead me to conclude that this can NOT happen in the 3.2.x line, but instead needs to come with the 3.3.x line. Moreover, with the 3.3.x line we would also handle the following issues in conjunction.

	https://issues.apache.org/jira/browse/TINKERPOP-1128
	https://issues.apache.org/jira/browse/TINKERPOP-1122
	https://issues.apache.org/jira/browse/TINKERPOP-1118
	https://issues.apache.org/jira/browse/TINKERPOP-1074
	https://issues.apache.org/jira/browse/TINKERPOP-942
	https://issues.apache.org/jira/browse/TINKERPOP-1070

In short, save the cheerleader, save the world.

Thoughts?,
Marko.	
		
http://markorodriguez.com

On Apr 12, 2016, at 10:57 PM, Daniel Kuppitz <me...@gremlin.guru> wrote:

>> would this notion of "mutation tags" be useful in
>> BulkLoaderVertexProgram?
>
> Yes. I've used this approach in DSEG's BulkUpdateVertexProgram, which
> allows adding properties and edges. IMO we should also create a new VP in
> TinkerPop, one that is optimized to handle all update scenarios.
>
>> 1. How do we deal with ID generation?
>
> Do we need to care about IDs for edges and properties? I don't think so.
> Vertices would be tricky though, but...
>
>> 2. How do we add vertices in OLAP?
>
> We never allowed that in the past iirc, but that would be a "great to
> have". The question is, do we then have to care about IDs? The user could
> provide an ID or the underlying Graph DB could provide one at write time.
>
>> 3. How do we deal with updates that already occurred at the graph database
>> while OLAP was processing?
>
> It depends. Some scenarios would probably be solvable, others would lead to
> failure. Maybe we can even make some of the behaviors configurable through
> the BulkLoaderVP.
> 
> Cheers,
> Daniel


Re: [DISCUSS] OLAP Mutations

Posted by Daniel Kuppitz <me...@gremlin.guru>.
> would this notion of "mutation tags" be useful in
> BulkLoaderVertexProgram?

Yes. I've used this approach in DSEG's BulkUpdateVertexProgram, which
allows adding properties and edges. IMO we should also create a new VP in
TinkerPop, one that is optimized to handle all update scenarios.

> 1. How do we deal with ID generation?

Do we need to care about IDs for edges and properties? I don't think so.
Vertices would be tricky though, but...

> 2. How do we add vertices in OLAP?

We never allowed that in the past iirc, but that would be a "great to
have". The question is, do we then have to care about IDs? The user could
provide an ID or the underlying Graph DB could provide one at write time.

> 3. How do we deal with updates that already occurred at the graph database
> while OLAP was processing?

It depends. Some scenarios would probably be solvable, others would lead to
failure. Maybe we can even make some of the behaviors configurable through
the BulkLoaderVP.
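
One way to picture "configurable behaviors" for concurrent updates is a per-element version check at write time. This is purely a sketch under assumptions: neither the Policy enum nor the idea of a version counter comes from BulkLoaderVertexProgram or from this thread, and a real graph database may not expose anything like it.

```java
// Hypothetical sketch, not TinkerPop code.
final class WriteBackConflictSketch {

    // Assumed policies for an element that changed in the database while OLAP was running.
    enum Policy { FAIL, SKIP, OVERWRITE }

    private WriteBackConflictSketch() {
    }

    // Returns true if the OLAP mutation should be written to the source graph.
    // versionAtRead is the version seen when the graph was loaded,
    // versionAtWrite is the version found when writing back.
    static boolean shouldWrite(long versionAtRead, long versionAtWrite, Policy policy) {
        if (versionAtRead == versionAtWrite) {
            return true; // untouched since it was read, no conflict
        }
        switch (policy) {
            case OVERWRITE:
                return true;
            case SKIP:
                return false;
            default:
                throw new IllegalStateException("element changed during OLAP processing");
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldWrite(1L, 2L, Policy.SKIP));
    }
}
```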

Cheers,
Daniel


