Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2016/01/13 19:02:25 UTC

[DISCUSS] Big ideas for Traversals/DSLs/OLAP in TinkerPop 3.2.0

Hello everyone,

There is currently no active development on TinkerPop 3.2.0. However, in my spare time I've been developing (on paper) some new ideas that should make traversals, DSLs, and OLAP even better.

---------------------------------------------------------------------------------------

Problem #1: The Builder pattern for TraversalSources is lame.
 	[https://issues.apache.org/jira/browse/TINKERPOP-971]

The first proposal is to use a fluent API to construct a TraversalSource and then, ultimately, spawn a Traversal. For instance:

	g = graph.traversal().withComputer(SparkGraphComputer).withStrategy(MyPersonalStrategy.instance());

And, something we can't currently do in 3.1.x:

	g = graph.traversal().withComputer(graph -> graph.compute(SparkGraphComputer).workers(10)).withStrategy(MyPersonalStrategy.instance());

In essence, like Traversal, a TraversalSource is constructed in a fluent manner. The methods for TraversalSource would be:

	withComputer()
	withStrategy()
	withoutStrategy() // remove default strategies
	withBulk() // 3.2.0 will provide generalized bulking [https://issues.apache.org/jira/browse/TINKERPOP-960]
	withSack() // you can declare the sack once and reuse its definition with each traversal spawned
	withSideEffect() // like withSack() (and don't worry, immutable thread-safe..you'll see)
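
To make the "immutable, thread-safe" point concrete, here is a rough Java sketch of how a fluent source could return a new instance from every with*() call, so a base source can be shared and specialized freely. SketchSource and its String-based arguments are hypothetical stand-ins invented for illustration; this is not the TinkerPop TraversalSource API, just the shape of the idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a TraversalSource; NOT the TinkerPop class.
final class SketchSource {
    private final String computer;          // graph computer name, or null for plain OLTP
    private final List<String> strategies;  // strategy names, in the order they were added

    private SketchSource(String computer, List<String> strategies) {
        this.computer = computer;
        this.strategies = Collections.unmodifiableList(strategies);
    }

    // Starting point: no computer, no extra strategies.
    static SketchSource create() {
        return new SketchSource(null, new ArrayList<>());
    }

    // Each with*() method copies the current state instead of mutating it.
    SketchSource withComputer(String computerName) {
        return new SketchSource(computerName, new ArrayList<>(strategies));
    }

    SketchSource withStrategy(String strategy) {
        List<String> copy = new ArrayList<>(strategies);
        copy.add(strategy);
        return new SketchSource(computer, copy);
    }

    SketchSource withoutStrategy(String strategy) {
        List<String> copy = new ArrayList<>(strategies);
        copy.remove(strategy); // removes the first matching name, if any
        return new SketchSource(computer, copy);
    }

    String computer() { return computer; }

    List<String> strategies() { return strategies; }
}
```

Because nothing is mutated, a source built once (say, with a strategy attached) can be handed to many threads, and each can derive its own OLAP or OLTP variant without affecting the others.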

Finally, for custom DSLs with respective TraversalSources, users will be able to do:
	[https://issues.apache.org/jira/browse/TINKERPOP-786]

	social = graph.traversal(SocialTraversalSource.class).withComputer(…).withStrategy(…).withBulk(…)

---------------------------------------------------------------------------------------

Problem #2: It is not natural going from OLTP to OLAP to OLTP to OLAP.
	[https://issues.apache.org/jira/browse/TINKERPOP-570]

For this problem, I think we can go far with an AbstractVertexProgramStep<ComputerResult,ComputerResult> in the core step library with, for example, TraversalVertexProgramStep and PageRankVertexProgramStep being subclasses. What does this get us?

	g = graph.traversal().withComputer(SparkGraphComputer)
	g.V().values("name")

The above traversal would compile to:

	[TraversalVertexProgramStep([GraphStep,PropertiesStep(values,name)]),ComputerResultStep]

Thus, TraversalVertexProgramStep would simply pass its ComputerResult to ComputerResultStep<ComputerResult,E> which would know to flatMap-out computerResult.memory().get("~traversers"). Okay, so this is all fine and good and we currently do something analogous to this today. However, watch when we do an OLAP chain.
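
As a rough illustration of the flatMap-out behaviour described above, here is a small Java sketch. It assumes the job's memory can be modelled as a plain Map and that the halted traversers sit in a List under the "~traversers" key; ResultStepSketch and flatMapOut are hypothetical names, not TinkerPop classes or methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ComputerResultStep; NOT the TinkerPop class.
final class ResultStepSketch {
    static final String TRAVERSERS_KEY = "~traversers"; // memory key named in the text

    // Turns one job memory into the individual results stored under the key.
    // Assumption: the value under the key, when present, is a List of results.
    @SuppressWarnings("unchecked")
    static <E> List<E> flatMapOut(Map<String, Object> memory) {
        List<E> out = new ArrayList<>();
        Object stored = memory.get(TRAVERSERS_KEY);
        if (stored != null) {
            out.addAll((List<E>) stored);
        }
        return out;
    }
}
```

One ComputerResult goes in and many results come out, which is what lets the OLTP steps that follow treat the OLAP job like any other upstream step.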

	g.V().hasLabel("person").pageRank(out("knows")).by("page.rank").valueMap("name","page.rank")

The above traversal would compile to:

	[TraversalVertexProgramStep([GraphStep,HasStep(label,person)]),PageRankVertexProgramStep(0.85,[VertexStep(knows)]),TraversalVertexProgramStep([PropertyMapStep(name,page.rank)]),ComputerResultStep]

The first TraversalVertexProgramStep will give its ComputerResult to PageRankVertexProgramStep, which will use computerResult.graph() for its computation. Note that after the completion of this PageRank computation, the subsequent ComputerResult.graph() will have both HALTED_TRAVERSERS and PAGE_RANK properties on the vertices. Thus, when the final TraversalVertexProgramStep takes over, it simply brings the HALTED_TRAVERSERS (which are at person vertices) back to life and then executes its traversal, which will be able to read the PAGE_RANK values! Tada!
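
The hand-off between jobs can be sketched in Java along these lines. Everything here is a toy model made up for illustration: the result graph is a map of vertex name to properties, the "haltedTraversers" key is an assumed name, and uniformRank() writes a constant placeholder value instead of running PageRank. The only point is that each job receives the previous job's graph and leaves its own properties behind for the next one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of jobs passing a result graph along; NOT TinkerPop code.
final class JobChainSketch {
    static final String HALTED = "haltedTraversers"; // assumed property key
    static final String RANK = "page.rank";

    // Builds a toy vertex as a property map.
    static Map<String, Object> vertex(String name, String label) {
        Map<String, Object> v = new LinkedHashMap<>();
        v.put("name", name);
        v.put("label", label);
        return v;
    }

    // Job 1: park a traverser on every vertex labelled "person".
    static Map<String, Map<String, Object>> haltAtPersons(Map<String, Map<String, Object>> graph) {
        Map<String, Map<String, Object>> next = copy(graph);
        for (Map<String, Object> v : next.values()) {
            if ("person".equals(v.get("label"))) {
                v.put(HALTED, Boolean.TRUE);
            }
        }
        return next;
    }

    // Job 2: placeholder for the rank job; writes 1/n everywhere, NOT real PageRank.
    static Map<String, Map<String, Object>> uniformRank(Map<String, Map<String, Object>> graph) {
        Map<String, Map<String, Object>> next = copy(graph);
        for (Map<String, Object> v : next.values()) {
            v.put(RANK, 1.0 / next.size());
        }
        return next;
    }

    // Job 3: revive the halted traversers and read what job 2 wrote.
    static List<String> readBack(Map<String, Map<String, Object>> graph) {
        List<String> out = new ArrayList<>();
        for (Map<String, Object> v : graph.values()) {
            if (Boolean.TRUE.equals(v.get(HALTED))) {
                out.add(v.get("name") + "=" + v.get(RANK));
            }
        }
        return out;
    }

    // Copies the graph so each job returns a new result instead of mutating its input.
    private static Map<String, Map<String, Object>> copy(Map<String, Map<String, Object>> graph) {
        Map<String, Map<String, Object>> next = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Object>> e : graph.entrySet()) {
            next.put(e.getKey(), new LinkedHashMap<>(e.getValue()));
        }
        return next;
    }
}
```

Calling readBack(uniformRank(haltAtPersons(graph))) only reports person vertices, yet each report includes the rank written by the middle job, which mirrors how the final TraversalVertexProgramStep would see both sets of properties.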

To give even more street-cred to this idea, check this traversal:

	g.V().hasLabel("person").pageRank(out("knows")).by("page.rank").order().by("page.rank",decr).limit(10).values("name")

This will compile to:

	[TraversalVertexProgramStep([GraphStep,HasStep(label,person)]),				// OLAP 
	 PageRankVertexProgramStep(0.85,[VertexStep(knows)]),					// OLAP 
	 TraversalVertexProgramStep([OrderGlobalStep(page.rank,decr)]),ComputerResultStep, 	// OLAP
	 RangeGlobalStep(0,10),PropertiesStep(values,name)] 					// OLTP

The ability to compile arbitrary segments of a traversal into an OLAP job and then pass the ComputerResult.graph() between jobs will enable us to easily (w/o user awareness) move between OLTP/OLAP within a single traversal. Moreover, there are numerous traversal patterns that currently will not execute in OLAP (e.g. you may sometimes see exceptions like "mid-traversal barriers are not allowed"), but with this model, they will work as we will be able to go OLAP->OLTP->OLAP.

---------------------------------------------------------------------------------------

Conclusion

In conclusion, configuring a TraversalSource will be much more elegant and we will be able to incorporate VertexPrograms into the Traversal API. Moreover, much like we have "lambda steps" for user-defined step functions (e.g. map{it.get() + Math.sqrt(10)}), we will have a program()-step.

	g.V().program(MyVertexProgram.instance()).values("my.vertex.program.property")

For all the VertexPrograms that TinkerPop provides, we will have respective steps in the GraphTraversal API that will have all the nice by()-modulations/etc.

I hope everyone sees the beauty of this new model and perhaps has some thoughts/recommendations regarding its design. Finally, as a parting note, contemplate this:

	g.V().hasLabel("person").pageRank(out("knows")).by("page.rank").bulkLoad(graph,"page.rank") 
		/// needs thought, but you get the direction.

Take care,
Marko.

http://markorodriguez.com