Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2019/03/29 17:34:01 UTC

TinkerPop4 Status Report #2

Hello,

This is an update of what I’ve been up to on the tp4/ branch since the last report 2 weeks ago.

	1. Arguments
		TP4 brings the concept of an Argument to the front and center. An argument can either be a constant (e.g. 2) or a dynamically determined value (e.g. out().count()). This means that users will be able to do things such as:
			* has('name',out('father').value('name')) // is he a jr?
			* is(eq(out('manager'))) // is he his own boss?
		This flexibility is starting to make the steps bleed into each other.
			is(eq(select('a'))) == where(eq('a'))
		One Gremlin-C# guy on Twitter was saying that Gremlin has too many ways to do things. It will be nice if we can reduce the number of steps we have with Arguments.
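
A minimal Java sketch of the Argument idea, for illustration only. The names Argument, resolve, constant, dynamic, Person, and hasName are assumptions made up for this example and are not taken from the tp4/ branch. The point is that a step resolves its argument against the current object, so a constant and a nested traversal go through the same code path.

```java
import java.util.function.Function;

// Hypothetical sketch, not the tp4/ API: an Argument is resolved against the
// current object, so a constant and a nested traversal look the same to a step.
interface Argument<S, E> {
    E resolve(S current);

    // A constant ignores the current object, e.g. the 2 in is(eq(2)).
    static <S, E> Argument<S, E> constant(E value) {
        return current -> value;
    }

    // A dynamic argument is computed from the current object,
    // e.g. out('father').value('name').
    static <S, E> Argument<S, E> dynamic(Function<S, E> traversal) {
        return traversal::apply;
    }
}

// Toy element standing in for a vertex with a name and a father.
final class Person {
    final String name;
    final Person father;

    Person(String name, Person father) {
        this.name = name;
        this.father = father;
    }
}

final class ArgumentDemo {
    // has('name', arg): one filter implementation covers both argument kinds.
    static boolean hasName(Person p, Argument<Person, String> arg) {
        return p.name.equals(arg.resolve(p));
    }
}
```

With this shape, has('name','marko') and has('name',out('father').value('name')) differ only in which Argument is passed in.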

	2. Console
		Java 9+ brings with it JShell. I posed the question on dev@: do we need GremlinConsole?
			https://lists.apache.org/thread.html/b9083cf992b01bcfe4b82d14b9aa2d30c90707c4c134c6cfefade4ae@%3Cdev.tinkerpop.apache.org%3E
		It is possible to configure JShell to look (and feel?) like the GremlinConsole with a short startup script.
		I would like to shoot for TP4 being as small and compact as possible — less to build, less to document, less to maintain, …
		Gremlin-Java -> JShell, Gremlin-Groovy -> GroovySh, Gremlin-Python -> Python CLI, … why not reuse?
		The most beautiful code is the code that was never written. The greatest programmers are those that coded themselves out of a job. Let us be great and beautiful.

	3. Data Structures
		I’m still trying to figure out how to generalize Gremlin out of graph. Limited luck.
		Worked with Kuppitz a bit on how to represent all steps using only map, flatmap, reduce, filter, and branch! (It’s a little too nutz for my tastes, but maybe…)
			https://twitter.com/twarko/status/1109491874333515778
		Ryan Wisnesky was kind enough to provide a demo of his Category Query Language (CQL) on Monday. Cool stuff indeed.
		Ryan pointed me to this paper which I found worthwhile: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.3252&rep=rep1&type=pdf
		This is the big unknown for me and I want to solve it. If we can do this right, TinkerPop will permeate all things Apache…all things data.
			https://twitter.com/twarko/status/1109540859442163712
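
To make the "map, flatmap, reduce, filter, branch" idea from this section concrete, here is a rough sketch using plain Java streams over a toy adjacency map. Everything in it (the FiveFunctions class, the OUT map, the vertex names) is a made-up assumption for illustration and has nothing to do with the actual tp4/ code. It shows out() as a flatmap, has() as a filter, and count() as a map followed by a reduce; branch is not covered.

```java
import java.util.List;
import java.util.Map;

// Hypothetical toy graph: vertex name -> names of its outgoing neighbours.
final class FiveFunctions {
    static final Map<String, List<String>> OUT = Map.of(
        "marko", List.of("josh", "vadas"),
        "josh", List.of("lop", "ripple"),
        "vadas", List.<String>of(),
        "lop", List.<String>of(),
        "ripple", List.<String>of());

    // Roughly: start at one vertex, out().out(), keep names with a prefix, count().
    static long countGrandchildrenStartingWith(String start, String prefix) {
        return List.of(start).stream()
            .flatMap(v -> OUT.get(v).stream())  // out() as flatmap
            .flatMap(v -> OUT.get(v).stream())  // out() as flatmap
            .filter(v -> v.startsWith(prefix))  // has(...) as filter
            .map(v -> 1L)                       // each traverser becomes a unit count
            .reduce(0L, Long::sum);             // count() as reduce
    }
}
```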

	4. The Machine
		I introduced the Machine interface.
			https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/Machine.java
		This interface encompasses both TraversalSource and RemoteConnection functionality.
		The general use is g = Gremlin.traversal(machine).withProcess(...).withStrategy(...)
		This move turned Gremlin into basically “nothing”: Gremlin is just the “builder-pattern” applied to Bytecode. Check out how small Gremlin is!
			https://github.com/apache/tinkerpop/tree/tp4/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin
			That’s it?! … Gremlin is trivial. Much less to consider for Gremlin-JS, Gremlin-C#, Gremlin-?? …
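
A rough sketch of what "the builder pattern applied to Bytecode" could look like. The class shapes here (Instruction, a Bytecode holding a list, a Machine with a single submit method, a Traversal with out/count/toList) are assumptions for illustration; the real interfaces are in the tp4/ links above and may differ. The language object only appends instructions, and all execution is delegated to whatever Machine it was given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins; the real classes live in the tp4/ branch and may differ.
final class Instruction {
    final String op;
    final Object[] args;

    Instruction(String op, Object... args) {
        this.op = op;
        this.args = args;
    }

    @Override
    public String toString() {
        return op + Arrays.toString(args);
    }
}

final class Bytecode {
    final List<Instruction> instructions = new ArrayList<>();
}

// The machine is the thing that actually runs bytecode (locally or remotely).
interface Machine {
    List<Object> submit(Bytecode bytecode);
}

// The "language" only appends instructions; it holds no execution logic.
final class Traversal {
    private final Machine machine;
    private final Bytecode bytecode = new Bytecode();

    Traversal(Machine machine) {
        this.machine = machine;
    }

    Traversal out(String label) {
        return add("out", label);
    }

    Traversal count() {
        return add("count");
    }

    private Traversal add(String op, Object... args) {
        bytecode.instructions.add(new Instruction(op, args));
        return this;
    }

    // Terminal method: hand the accumulated bytecode to the machine.
    List<Object> toList() {
        return machine.submit(bytecode);
    }
}
```

Porting such a language to another host language would mean porting only the builder, since the machine sits behind one interface.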

	5. RemoteMachine, TraverserServer, and MachineServer
		https://twitter.com/twarko/status/1110612168968265729
		“GremlinServer” is too serial in concept. Receive bytecode, execute bytecode, aggregate traversers, return traversers.
			- This is bad. We need to start thinking distributed execution and aggregation from the start. We need to blur the concept of a “server.”
		https://github.com/apache/tinkerpop/tree/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/remote
			MachineServer: sits somewhere and accepts Bytecode. (multi-threaded server)
			RemoteMachine: can talk to a MachineServer to submit Bytecode. (single socket client)
			Processor: exists throughout the cluster and executes bytecode. (parallel/distributed execution engine)
			TraverserServer: can sit somewhere and parallellily (?is that a word?) accept traverser results. (multi-threaded server)
		The thing which accepts bytecode, the thing which executes bytecode, and the thing which aggregates results are all different things and the entailments are worthy.
		Much like how the Machine interface killed the complexity of Gremlin, I believe this server architecture will kill the complexity of GremlinServer.
			- The biggest part of our I/O will be the binary protocol (for now I’m just using Object[Input/Output]Stream).
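
As a small illustration of the Object[Input/Output]Stream stopgap mentioned above, the sketch below round-trips a serializable bytecode object through Java's built-in object streams. WireBytecode and WireDemo are hypothetical names, and in-memory byte arrays stand in for the socket streams between a client and a server; this is not the tp4/ wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

// Hypothetical serializable bytecode: just a list of instruction names.
final class WireBytecode implements Serializable {
    private static final long serialVersionUID = 1L;
    final ArrayList<String> ops = new ArrayList<>();
}

final class WireDemo {
    // What a client would write to a socket's OutputStream.
    static byte[] write(WireBytecode bytecode) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(bytecode);
        }
        return bytes.toByteArray();
    }

    // What a server would read from a socket's InputStream.
    static WireBytecode read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (WireBytecode) in.readObject();
        }
    }

    // Convenience wrapper that converts checked exceptions to unchecked ones.
    static WireBytecode roundTrip(WireBytecode bytecode) {
        try {
            return read(write(bytecode));
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Java serialization ties both ends to the JVM, which is one reason a dedicated binary protocol would eventually be needed for non-Java clients.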
	
	6. Implementing Instructions
		I’m trying not to crank out the full language yet; I just want to implement one instruction from each “class” of instruction.
		This way, if an insight comes, large amounts of code don’t need to be rewritten.
		My latest achievement was the implementation of order().by().by(). [from the barrier class of instructions]
			- Along with match() and repeat(), this is arguably one of the more difficult steps to implement.
			- The TP4 implementation is 1/3 the size of the TP3 implementation and it just worked right out of the box on Apache Beam.
			- The abstract VM model we have in TP4 is simple and consistent. Complex operations are just working.
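
For readers unfamiliar with why order().by().by() belongs to the barrier class, here is a minimal sketch in plain Java. The OrderBarrier class and its Person element with name and age are hypothetical and not the TP4 implementation. A barrier has to see every incoming traverser before it can emit the first one, and each by() contributes one more comparison key, applied in order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class OrderBarrier {
    // Toy element; name and age are hypothetical properties.
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Roughly order().by('age').by('name'): collect everything, then sort
    // by the first key and break ties with the second.
    static List<Person> orderByAgeThenName(List<Person> incoming) {
        List<Person> barrier = new ArrayList<>(incoming); // wait for all input
        barrier.sort(Comparator.comparingInt((Person p) -> p.age)
                               .thenComparing(p -> p.name));
        return barrier;
    }
}
```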

There you have it. That is a review of the tp4/ branch over the last two weeks. Moving forward, I hope to make headway on the following:

	* AkkaProcessor
		- Unlike Pipes and Beam, where Function is the thread of execution, in Akka the Traverser is the thread of execution.
		- Will the TP4 architecture be able to naturally support this conceptual tweak? TP3 couldn’t.
	* A data structure breakthrough.
		- Contrary to popular belief, everything is not a graph. 
		- The only time I think “graph” is when I talk to a graphdb. 
		- For the most part I think in lists, maps, sets, primitives — don’t you?
	* A better understanding of the TP4 instruction set.
		- What is truly needed? What is our core instruction set?
	* A documentation infrastructure stub.
		- With Gremlin-Groovy going away… how do we do documentation?
	* Traverser species
		- I’m currently copying the TP3 model. I didn’t like it before and I still don’t like it.
	* Strategies
		- I haven’t worked on this much, but I believe we might have “strategies” all wrong (these are our compiler optimizations).
		- The TP3 model worked well enough for TP3, but for TP4, I think we might need a major conceptual overhaul.
		- Just a feeling at this point…

Thanks for reading. As always, I’m more than happy to receive any questions or comments.

Take care,
Marko.

http://rredux.com