Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2016/02/18 23:57:12 UTC

[DISCUSS] A New GraphComputer Memory API for TinkerPop 3.2.0? (BREAKING CHANGE)

Hi people,

Here is a ticket that I think we should strongly consider.

https://issues.apache.org/jira/browse/TINKERPOP-1166 (in particular read my last comment for a clean breakdown)

This would be an API breaking change for both users (who write VertexPrograms) and providers (who have their own GraphComputer implementation).

* If you are a user and don't have any VertexProgram implementations, this will not affect you, save for performance gains.
* If you are a graph system provider that does not have a custom GraphComputer (e.g. you rely on SparkGraphComputer), this will not affect you either.

If you do write VertexPrograms, it will require you to go through your VertexProgram and change all your memory.xxx() calls. Here are the stats on the main VertexPrograms TinkerPop has:
PageRankVertexProgram -- 0 memory calls.
PeerPressureVertexProgram -- 3 memory calls. (could be 2 if I had organized my code more carefully)
TraversalVertexProgram -- 3 memory calls.
Thus, it's not a big rewrite. It will simply mean changing, for example, memory.and("vote",true) to memory.add("vote",true) .. that's it. No more incr(), sum(), etc. methods. You just .add().
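To make the "you just .add()" idea concrete, here is a minimal, self-contained Java sketch. It assumes each memory key is declared once together with the operator that merges values, so a single add() can stand in for and(), incr(), sum(), and so on. SketchMemory, register(), and get() are hypothetical names invented for this illustration and are not the TinkerPop API; the ticket has the actual proposal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical stand-in for a GraphComputer Memory; NOT the TinkerPop API.
class SketchMemory {
    private final Map<String, BinaryOperator<Object>> reducers = new HashMap<>();
    private final Map<String, Object> values = new HashMap<>();

    // Declare a key once, together with the operator that merges values (assumed design).
    @SuppressWarnings("unchecked")
    <A> void register(String key, BinaryOperator<A> reducer, A initial) {
        reducers.put(key, (BinaryOperator<Object>) (BinaryOperator<?>) reducer);
        values.put(key, initial);
    }

    // The single mutation method: fold the new value into the stored one.
    void add(String key, Object value) {
        BinaryOperator<Object> reducer = reducers.get(key);
        if (reducer == null) {
            throw new IllegalArgumentException("Unregistered key: " + key);
        }
        values.put(key, reducer.apply(values.get(key), value));
    }

    @SuppressWarnings("unchecked")
    <A> A get(String key) {
        return (A) values.get(key);
    }

    public static void main(String[] args) {
        SketchMemory memory = new SketchMemory();
        memory.register("vote", Boolean::logicalAnd, true); // was memory.and(...)
        memory.register("count", Long::sum, 0L);            // was memory.incr(...)

        memory.add("vote", true);
        memory.add("vote", false);
        memory.add("count", 1L);
        memory.add("count", 2L);

        Boolean vote = memory.get("vote");
        Long count = memory.get("count");
        System.out.println("vote=" + vote);   // prints vote=false
        System.out.println("count=" + count); // prints count=3
    }
}
```

In this sketch the call site no longer names the operation; the merge behavior lives with the key declaration, which is why the method count on the memory itself shrinks.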

If you do have a custom GraphComputer, it will require you to rewrite your Memory implementation. The logic is basically the same (nothing you can't already express with your system now), but the API will be different, with fewer methods required. Finally, I will add a GraphComputerTest.shouldRespectTransientKeys() that will make sure transient memory and compute keys are purged prior to returning the ComputerResult.

Please review the proposed changes and provide your feedback. I don't think we will be able to make this a backwards-compatible change, so please think hard.

Thanks,
Marko.

http://markorodriguez.com

Re: [DISCUSS] A New GraphComputer Memory API for TinkerPop 3.2.0? (BREAKING CHANGE)

Posted by Marko Rodriguez <ok...@gmail.com>.

Hello,

This work has been completed and merged into master/.

	https://github.com/apache/incubator-tinkerpop/pull/243

This work was well worth it -- even though we have breaking changes to deal with.

GraphComputer providers (non-trivial changes)
	http://tinkerpop.apache.org/docs/3.2.0-SNAPSHOT/upgrade/#_graphcomputer_semantics_and_api
Users (trivial changes if any)
	http://tinkerpop.apache.org/docs/3.2.0-SNAPSHOT/upgrade/#_vertexprogram_and_memorycomputekey_and_vertexcomputekey

Here is what we have gained:

	1. MemoryComputeKeys and VertexComputeKeys can be transient.
		- e.g. No more EDGE_COUNT properties left on the vertices after executing PageRankVertexProgram.
	2. MemoryComputeKeys can be set to NOT broadcast.
		- e.g. If the workers never need to read a memory value (only add to it), then broadcasting can be turned off.
	3. Gremlin OLAP now fully supports OLTP->OLAP->OLTP->OLAP->etc.
		- When barriers are reached (e.g. groupCount(), count(), sum(), etc.), processing becomes local to the master traversal.
		- When the master traversal starts to touch elements again (vertices/edges/properties), it sends the traversers back to the workers.
		- This process of parallel->sequential->parallel->… can go on indefinitely.
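
Items 1 and 2 can be pictured with a small, self-contained Java sketch. It assumes every key carries two flags, one saying whether workers are sent the value and one saying whether the value is dropped before the result is returned. KeyedMemorySketch, declare(), workerView(), finalResult(), and the key names are all hypothetical and invented for this illustration; they are not TinkerPop classes or methods.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of transient and non-broadcast keys; NOT the TinkerPop API.
class KeyedMemorySketch {
    // Per-key flags (assumed): is the value sent to workers, and is it purged at the end?
    static final class KeySpec {
        final boolean broadcast;
        final boolean isTransient;

        KeySpec(boolean broadcast, boolean isTransient) {
            this.broadcast = broadcast;
            this.isTransient = isTransient;
        }
    }

    private final Map<String, KeySpec> specs = new LinkedHashMap<>();
    private final Map<String, Object> values = new LinkedHashMap<>();

    void declare(String key, boolean broadcast, boolean isTransient, Object initial) {
        specs.put(key, new KeySpec(broadcast, isTransient));
        values.put(key, initial);
    }

    // What a worker would be sent for an iteration: broadcast keys only.
    Map<String, Object> workerView() {
        Map<String, Object> view = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : values.entrySet()) {
            if (specs.get(e.getKey()).broadcast) {
                view.put(e.getKey(), e.getValue());
            }
        }
        return view;
    }

    // What the caller gets back when the job finishes: transient keys purged.
    Map<String, Object> finalResult() {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : values.entrySet()) {
            if (!specs.get(e.getKey()).isTransient) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        KeyedMemorySketch memory = new KeyedMemorySketch();
        memory.declare("scratch", true, true, 6L);   // broadcast, but purged at the end
        memory.declare("total", false, false, 42L);  // write-only for workers, kept in the result
        memory.declare("vote", true, false, true);   // broadcast and kept

        System.out.println("workers see: " + memory.workerView());
        System.out.println("result has:  " + memory.finalResult());
    }
}
```

The point of the second flag is that a value workers only ever add to never has to be shipped back out to them, which is where the saving in item 2 comes from.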
	
#3 is the biggest boon. Gremlin OLTP and Gremlin OLAP can now execute all the same traversals -- save for the following exceptions:

	1. by()-modulators in OLAP cannot leave the local star graph. (as before)
	2. by()-modulators of path processors (e.g. path(), select()) can only touch element ids. (as before)
	3. There are a couple more that are currently not allowed because of semantics issues in OLTP, even though they are valid in OLAP :)

Now you can do complex, nested, multi-barrier, etc. OLAP traversals.

gremlin> g = TinkerFactory.createModern().traversal().withComputer() 
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], tinkergraphcomputer]
gremlin> g.V().group().by(label).
                 select("person").unfold().
               groupCount().by(bothE().count()).
                 select(keys).sum(local) 
==>4
gremlin>

And, yes, this is one big TraversalVertexProgram!

gremlin> g.V().group().by(label).select("person").unfold().groupCount().by(bothE().count()).select(keys).sum(local).iterate().toString()
==>[TraversalVertexProgramStep([GraphStep([],vertex), GroupStep(label,[FoldStep]), SelectOneStep(person), UnfoldStep, GroupCountStep([VertexStep(BOTH,edge), CountGlobalStep]), LambdaMapStep(keys), SumLocalStep]), ComputerResultStep]
gremlin>

Even though we have multiple barriers -- group() and groupCount() -- TraversalVertexProgram is smart about how to converge barriers into "OLTP streams" and back again. It's actually all very clean and simple.
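
For anyone tracing the ==>4 result by hand, the following plain-Java sketch recomputes the same answer on the "modern" toy graph without any TinkerPop dependency. It only mirrors what each step of the traversal produces; it says nothing about how TraversalVertexProgram distributes the work. The class and method names are hypothetical, and vertex names stand in for vertex ids as a simplification.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java walkthrough of the traversal's result; NOT how OLAP executes it.
class ModernGraphWalkthrough {
    // Vertex name -> label for the six vertices of the modern toy graph.
    static final Map<String, String> LABELS = new LinkedHashMap<>();
    static {
        LABELS.put("marko", "person");
        LABELS.put("vadas", "person");
        LABELS.put("lop", "software");
        LABELS.put("josh", "person");
        LABELS.put("ripple", "software");
        LABELS.put("peter", "person");
    }

    // The six edges as {out vertex, in vertex}; edge labels are irrelevant here.
    static final String[][] EDGES = {
        {"marko", "vadas"}, {"marko", "josh"}, {"marko", "lop"},
        {"josh", "ripple"}, {"josh", "lop"}, {"peter", "lop"}
    };

    // bothE().count(): number of edges incident to the vertex.
    static long degree(String vertex) {
        return Arrays.stream(EDGES)
                .filter(e -> e[0].equals(vertex) || e[1].equals(vertex))
                .count();
    }

    static long run() {
        // group().by(label).select("person").unfold()
        List<String> persons = LABELS.entrySet().stream()
                .filter(e -> e.getValue().equals("person"))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());

        // groupCount().by(bothE().count()) yields {3=2, 1=2}
        Map<Long, Long> byDegree = persons.stream()
                .collect(Collectors.groupingBy(v -> degree(v), Collectors.counting()));

        // select(keys).sum(local): add up the distinct degrees, 3 + 1
        return byDegree.keySet().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 4
    }
}
```

marko and josh each touch three edges while vadas and peter each touch one, so the keys of the group count are 3 and 1, and their sum is 4.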

Enjoy,
Marko.

http://markorodriguez.com
