You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tinkerpop.apache.org by "Marko A. Rodriguez (JIRA)" <ji...@apache.org> on 2015/08/27 19:21:46 UTC

[jira] [Created] (TINKERPOP3-813) [Proposal] Make the Gremlin Graph Traversal Machine and Instruction Set Explicit

Marko A. Rodriguez created TINKERPOP3-813:
---------------------------------------------

             Summary: [Proposal] Make the Gremlin Graph Traversal Machine and Instruction Set Explicit
                 Key: TINKERPOP3-813
                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-813
             Project: TinkerPop 3
          Issue Type: Improvement
          Components: process
            Reporter: Marko A. Rodriguez
            Assignee: Marko A. Rodriguez


There is a lot of confusion as to "what is Gremlin?" Is it Groovy? Whats Gremlin-Scala? Does it only work for the JVM? Is it a language or is it an execution engine? Is Gremlin OLAP different from Gremlin OLTP? How is Cypher related to Gremlin? Who is John Galt?

For TinkerPop 3.2.0, we need to make explicit *The Gremlin Graph Traversal Machine and Instruction Set*. Here is a direct mapping to the Java world so people can get a grasp of where we are going with this.

Gremlin-Java8 == Java programming language
Gremlin-Scala == Scala programming language
Gremlin-Groovy == Groovy programming language
...
Gremlin steps == Java instruction set
Gremlin machine == Java virtual machine

Lets go through each one of these to flush out the concepts.

# Gremlin as a Language.

When you see:

{code}
g.V().as("a").out("knows").as("b").
  select("a","b").
    by("name").
    by("age")
{code}

You are bearing witness to Gremlin-Java8. This is a fluent-style graph traversal API that ultimately writes out Gremlin steps. The Gremlin language (i.e. the fluent-style for doing graph traversing) can be embedded in any host language that supports function composition and function nesting. Because of this simple requirement, there exists Gremlin-Groovy, Gremlin-Scala, Gremlin-Clojure, etc. However, if the host language is NOT on the JVM, then it doesn't compile directly to Gremlin steps. How do we support other languages outside the JVM? There are two ways currently, via GremlinServer, but also, via the step-syntax described next.

# Gremlin as an Instruction Set.

{code}
g.V().as("a").out("knows").as("b").
  select("a","b").
    by("name").
    by("age")
{code}

The above Gremlin-Java8 query gets compiled down to a step sequence called a "traversal." Here is the *unique* string representation of the query above.

{code}
[GraphStep([],vertex)@[a], VertexStep(OUT,[knows],vertex)@[b], SelectStep([a, b],[value(name), value(age)])]
{code}

The steps are the primitives of the Gremlin graph traversal machine. They are the parameterized instructions that the machine ultimately executes. The Gremlin instruction set is approximately 30 steps. These are sufficient to provide general purpose computing and what is typically required to express the common motifs of any user query. Moreover, with Gremlin3 there is no need for lambdas so the step syntax above is sufficient to completely express the traversal computation.

Note that ANY one can create a "Gremlin program" (traversal). You can imagine a compiler that takes SPARQL and writes out Gremlin steps. Neo4j's Cypher could be compiled to write out Gremlin steps. Realize that SPARQL is NOT a JVM language. Thus, PHP/Python/Go/Rust/Lisp can all have Gremlin "languages" that simply write out traversal steps in the step-syntax presented previously. Why would anyone want to do this? -- because of the Gremlin graph traversal machine.

# Gremlin as a Virtual Machine.

The Gremlin graph traversal machine can be executed in OLTP or OLAP. What does that mean? It means Gremlin can be executed single machine or distributed across a multi-machine compute cluster. Best of all, NOTHING changes about the language or the instruction set. That is the promise of the Gremlin machine. Most importantly, the Gremlin machine is agnostic to the underlying graph system. It works over Titan, Neo4j, OrientDB, Giraph, Hadoop, Spark, BlazeGraph, InfiniteGraph, SQLg, Bitsy, IBM Graph Store, AmazonDynamoDB, ArangoDB, etc. The Gremlin machine simply executes the steps and a result set is returned. It doesn't care if the steps were created from Gremlin-Java8, Cypher, SPARQL, MyGraphLanguage, R:Statistics, NetLogo, punch cards, etc. It works at the step level, not the human-writable level. This is the power of Gremlin. Gremlin is to the graph world as Java is the programming world. And best of all, Gremlin is owned by the Apache Software Foundation and thus, free to use for any reason commercial or otherwise.

----

So, whats next?

  1. Re-read http://arxiv.org/abs/1508.03843.
  2. Hone the Gremlin step library for TinkerPop 3.2.0
  3. Write a blog explaining this concept for developers to understand.
  4. Write a SPARQL(--) (toy) compiler that generates Gremlin steps for 3.

Hopefully that answers the question -- "What is Gremlin?" and will allow vendors to realize how they can leverage the execution engine of Gremlin for their graph product (and if they want their own query language, simply have it compile to a Gremlin traversal). 

Finally, note that I did not discuss {{TraversalStrategies}} (i.e. compiler optimizers). These are important, but does not effect the conceptual distinctions made above or their practical application.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)