Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2015/10/30 23:26:57 UTC

Does system X require Clojure? No, JVM.

Hello,

While these ideas are not new to people on TinkerPop3, I had a nice revelation that I expressed in the following tweet series.

	https://twitter.com/twarko/status/660215611117535232

Why is there no "standard programming language?" Different programming languages are good at different things.
What makes more languages emerge and grow? A virtual machine abstraction.
The JVM is the breeding ground for programming languages.
Java projects can have many programming languages in them. No worries.

There should be no "standard graph language?" Different graph languages are good at different things.
What makes more graph languages emerge and grow? A traversal machine abstraction.
The Gremlin traversal machine can be the breeding ground for traversal languages.
TinkerPop projects can have many graph languages in them. No worries.

Take care,
Marko.

http://markorodriguez.com


Re: [TinkerPop] Does system X require Clojure? No, JVM.

Posted by Marko Rodriguez <ok...@gmail.com>.
Hello,

There are two ways of going about "more information about the database."

	1. The provider has access to the Traversal and can rewrite as they need.
		- e.g. XXXGraphStepStrategy implementations selecting the appropriate indices.
	2. The provider provides more information to TinkerPop to allow TinkerPop to do the work.
		- e.g. MatchStep (sorta) where we infer the graph statistics from runtime performance.
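
As a rough illustration of option (1), here is a minimal Python sketch of a provider strategy that folds a leading graph step and a property filter into a single index-backed step. The step classes, the function name, and the indexed_keys parameter are hypothetical stand-ins for this example, not TinkerPop or Titan classes.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-ins for traversal steps (not TinkerPop classes).
@dataclass(frozen=True)
class GraphStep:
    """Start step: emit all vertices."""


@dataclass(frozen=True)
class HasStep:
    """Filter step: keep elements whose property `key` equals `value`."""
    key: str
    value: object


@dataclass(frozen=True)
class IndexedGraphStep:
    """Provider-specific start step that answers directly from an index."""
    key: str
    value: object


def provider_graph_step_strategy(steps, indexed_keys):
    """Fold a leading GraphStep + HasStep pair into one index lookup
    when the provider has an index on the filtered key."""
    if (len(steps) >= 2
            and isinstance(steps[0], GraphStep)
            and isinstance(steps[1], HasStep)
            and steps[1].key in indexed_keys):
        has = steps[1]
        return [IndexedGraphStep(has.key, has.value)] + list(steps[2:])
    # Nothing to optimize: return the traversal unchanged.
    return list(steps)


traversal = [GraphStep(), HasStep("name", "marko"), HasStep("age", 29)]
rewritten = provider_graph_step_strategy(traversal, indexed_keys={"name"})
unchanged = provider_graph_step_strategy(traversal, indexed_keys=set())
print(rewritten)
```

The point of the sketch is only that the provider sees the whole step list before execution and can swap in whatever it knows how to execute best.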

There have been various discussions on this list (primarily raised by Pieter Martin) about getting schema information to TinkerPop. However, as with indices, do we want to make that explicit, given every provider's differences in how such matters are handled? Thus, it's a tradeoff between the provider doing the heavy lifting (1) and TinkerPop doing it (2). I think there will always be a balance, in that providers will always have to write their own XXXGraphStep implementations where they can determine the selectivity of various indices internally. For (2), one of the big pushes for 3.2.0 is the development of "RuntimeTraversalStrategy", which will generalize the "MatchAlgorithm" package (and thus kill it) to support runtime traversal ordering for other areas of Gremlin such as OR, AND, linear reversal, etc.
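
To make the idea behind (2) concrete, here is a small Python sketch of runtime ordering: a conjunction of predicates that reorders itself by the pass rate it has observed so far, so the most selective predicate is tried first. This is only an illustration of inferring statistics from runtime behavior; the class and method names are made up for the example, and it is not the actual MatchStep or MatchAlgorithm code.

```python
class RuntimeOrderedFilters:
    """Applies an AND of predicates, reordering them by observed pass rate."""

    def __init__(self, predicates):
        # One mutable record per predicate: [predicate, times_tested, times_passed]
        self.stats = [[p, 0, 0] for p in predicates]

    @staticmethod
    def _pass_rate(entry):
        _, tested, passed = entry
        # An untested predicate is assumed non-selective until observed.
        return passed / tested if tested else 1.0

    def accept(self, item):
        # Try the most selective predicate (lowest observed pass rate) first.
        self.stats.sort(key=self._pass_rate)
        for entry in self.stats:
            entry[1] += 1
            if entry[0](item):
                entry[2] += 1
            else:
                return False  # short-circuit: later predicates are never evaluated
        return True

    def order(self):
        """Current evaluation order, as predicate names."""
        return [e[0].__name__ for e in sorted(self.stats, key=self._pass_rate)]


def is_even(n):
    return n % 2 == 0


def is_large(n):
    return n > 90


filters = RuntimeOrderedFilters([is_even, is_large])
kept = [n for n in range(100) if filters.accept(n)]
print(kept, filters.order())
```

The result of the conjunction does not depend on the order; only the amount of work does, which is why the ordering can safely be learned while the query runs.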

Marko.

http://markorodriguez.com

On Nov 2, 2015, at 11:12 AM, Matthias Broecheler <me...@matthiasb.com> wrote:

> I think this is a compelling argument, however, it has one major flaw: Gremlin is currently not aware of the schema or any statistics of the underlying graph database.
> For "simple" optimizations that's not too bad - the underlying graph database can simply replace the respective step in the traversal with an optimized step. That's what Titan does for TitanGraphStep or TitanVertexStep. Those are also interesting as there is quite a bit of logic you need to put in there to understand what you can reorder and pull into a step.
> 
> However, it gets pretty complicated when you look at a thing like MatchStep which is very crucial for most of the arguments that make Gremlin a general "traversal machine". Both SPARQL->Gremlin and SQL->Gremlin rely heavily on MatchStep.
> Now, looking more closely at MatchStep there seems to be no way for Titan to instill its knowledge of the schema or statistics or indexes or anything into the algorithm that executes MatchStep.
> So, for Titan to get a "good" implementation of MatchStep Titan will need to effectively reimplement it. And, arguably, that's a big part of a query language (i.e. the entire declarative piece of Gremlin).
> 
> So, the question then becomes: Does the argument of Gremlin being a universal traversal machine only hold for the imperative parts or can it be extended to the declarative aspects as well?


Re: [TinkerPop] Does system X require Clojure? No, JVM.

Posted by Matthias Broecheler <me...@matthiasb.com>.
I think this is a compelling argument, however, it has one major flaw:
Gremlin is currently not aware of the schema or any statistics of the
underlying graph database.
For "simple" optimizations that's not too bad - the underlying graph
database can simply replace the respective step in the traversal with an
optimized step. That's what Titan does for TitanGraphStep or
TitanVertexStep. Those are also interesting as there is quite a bit of
logic you need to put in there to understand what you can reorder and pull
into a step.

However, it gets pretty complicated when you look at a thing like MatchStep
which is very crucial for most of the arguments that make Gremlin a general
"traversal machine". Both SPARQL->Gremlin and SQL->Gremlin rely heavily on
MatchStep.
Now, looking more closely at MatchStep there seems to be no way for Titan
to instill its knowledge of the schema or statistics or indexes or anything
into the algorithm that executes MatchStep.
So, for Titan to get a "good" implementation of MatchStep Titan will need
to effectively reimplement it. And, arguably, that's a big part of a query
language (i.e. the entire declarative piece of Gremlin).

So, the question then becomes: Does the argument of Gremlin being a
universal traversal machine only hold for the imperative parts or can it be
extended to the declarative aspects as well?


Re: [TinkerPop] Does system X require Clojure? No, JVM.

Posted by Marko Rodriguez <ok...@gmail.com>.
Hi,

Yesterday I was HipChatting with Alex Popescu (cc:d) about "there is no need for a standard query language" as there is no need for a "standard programming language." He said something to the effect of "that is a strong argument, however there will then be discussions of virtual machine execution vs. native execution."

Last night I was thinking -- "hmmm, that will be a bad argument to make." Why?

Gremlin shouldn't be touted as a "virtual machine" but as a "traversal machine" (an execution engine). When Gremlin talks to an underlying graph system, it's talking to TinkerPop ("Blueprints") and then to the native API of the graph system. For systems that have TinkerPop as their native API (Titan/Bitsy/etc.), Gremlin is not a "virtual machine." For systems that don't (OrientDB/Neo4j/etc.), the cost of the indirection from the TinkerPop API to the graph system's native API is trivial, as it's typically just object wrapping on the short-lived object heap (we will amortize this cost later -- watch).

Next, all graph systems maintain an "execution engine" for their respective query language. That is, OrientSQL, Cypher, and SPARQL ultimately talk to their graph system's API: the OrientDB Java API, the Neo4j Java API, and Sesame or Jena, respectively. Gremlin does the same thing; it just talks to TinkerPop ("Blueprints") first, which then talks to those APIs. What makes Gremlin neat is that the execution engine and the language are not strongly coupled, as it's very easy for any graph language to compile to the Gremlin machine. So there is no relative cost in the language->machine translation; the cost (though minor -- wait for it) is in the machine->API translation.

However, given the conceptual simplicity (and engineering) of the Gremlin machine, those costs are quickly subsumed. With MatchStep's runtime optimizer, traverser bulking, LazyBarriers, and (most importantly) provider-specific compiler strategies (see Titan's beautiful use of these), Gremlin can be faster than the provider's "native query" language. In fact, some internal benchmarking I've done has shown that Gremlin is indeed equal to or faster than the native language of the graph system, where sometimes those speed differences range from 5x to the life of the universe. Thus, the cost of TinkerPopAPI->NativeAPI is so trivial at that point that it's not even worth discussing the "cost of virtualization." I suspect (though this is complete speculation at this point) that X-Language->GremlinMachine->Y-System could be faster than X-Language->Y-System given Gremlin's current (and future) compiler/engine design and evolution.
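
For readers unfamiliar with traverser bulking, the following Python sketch shows the basic idea under simplifying assumptions: traversers sitting at the same vertex are merged into one entry with a count (the "bulk"), so each step does one adjacency lookup per distinct location rather than one per traverser. The toy graph, the out_step function, and the calls list are invented for the illustration and are not TinkerPop internals.

```python
from collections import Counter

# Toy adjacency list; the vertex names are illustrative only.
GRAPH = {
    "a": ["c", "d"],
    "b": ["c", "d"],
    "c": ["e"],
    "d": ["e"],
    "e": [],
}


def out_step(bulked, calls):
    """Move every traverser along its outgoing edges.

    `bulked` maps a vertex to the number of traversers currently there.
    `calls` records each adjacency lookup so the work done can be counted.
    """
    result = Counter()
    for vertex, bulk in bulked.items():
        calls.append(vertex)  # one lookup per distinct location
        for neighbor in GRAPH[vertex]:
            result[neighbor] += bulk  # the whole bulk moves together
    return result


calls = []
start = Counter({"a": 1, "b": 1})
hop1 = out_step(start, calls)  # traversers merge at c and d
hop2 = out_step(hop1, calls)   # and merge again at e
print(hop1, hop2, len(calls))
```

In this toy graph the two hops need four lookups with bulking; moving each traverser individually would need six (two on the first hop, four on the second), and the gap widens as paths converge.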

Thus, Gremlin shouldn't be seen as a "virtual machine," but as a "traversal machine" that anyone can connect to their graph system. It supports any graph language that compiles to it. It is an efficient/simple OLTP/OLAP execution engine pre-written for you.
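
A toy Python sketch of "any graph language that compiles to it": two different front-ends (a fluent builder and a tiny path-style text syntax) emit the same list of machine steps, and a single executor runs them. The instruction names, the Fluent class, the path syntax, and the sample data are all assumptions made up for this example; none of it is Gremlin's real instruction set.

```python
# Toy graph data, for illustration only.
VERTICES = {1: {"name": "marko"}, 2: {"name": "vadas"}, 3: {"name": "josh"}}
EDGES = [(1, "knows", 2), (1, "knows", 3)]


def execute(steps):
    """The 'machine': run a list of (op, *args) steps over the toy graph."""
    current = list(VERTICES)
    for op, *args in steps:
        if op == "has":
            key, value = args
            current = [v for v in current if VERTICES[v].get(key) == value]
        elif op == "out":
            (label,) = args
            current = [dst for v in current
                       for (src, lbl, dst) in EDGES if src == v and lbl == label]
        elif op == "values":
            (key,) = args
            current = [VERTICES[v][key] for v in current]
        else:
            raise ValueError(f"unknown step: {op}")
    return current


class Fluent:
    """Front-end 1: a fluent builder that records steps."""

    def __init__(self):
        self.steps = []

    def has(self, key, value):
        self.steps.append(("has", key, value))
        return self

    def out(self, label):
        self.steps.append(("out", label))
        return self

    def values(self, key):
        self.steps.append(("values", key))
        return self


def compile_path(text):
    """Front-end 2: a path-style syntax such as 'name=marko/knows/@name'."""
    steps = []
    for part in text.split("/"):
        if "=" in part:
            key, value = part.split("=", 1)
            steps.append(("has", key, value))
        elif part.startswith("@"):
            steps.append(("values", part[1:]))
        else:
            steps.append(("out", part))
    return steps


fluent_steps = Fluent().has("name", "marko").out("knows").values("name").steps
path_steps = compile_path("name=marko/knows/@name")
result = execute(path_steps)
print(fluent_steps == path_steps, result)
```

Because both front-ends target the same step list, any optimization applied to that list benefits every language that compiles to it.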

Thanks,
Marko.

http://markorodriguez.com

On Oct 31, 2015, at 12:37 AM, pieter <pi...@gmail.com> wrote:

> Yeah, Cypher/SPARQL/OrientQL, whatever, does not compete with Gremlin.
> Gremlin enables all of them.
> 
> Cheers
> Pieter
> 
> -- 
> You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/563461AC.1070605%40gmail.com.
> For more options, visit https://groups.google.com/d/optout.