
Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2019/04/15 18:43:49 UTC

[DISCUSS] The Two Protocols of TP4

Hello,

I believe there will only be two protocols in TP4.

	1. The VM communication protocol. (Rexster)
	2. The data serialization protocol. (Frames)

[VM COMMUNICATION PROTOCOL]

	1. Register bytecode —returns—> bytecode.
	2. Submit bytecode —returns—> iterator of traversers.
	3. Unregister bytecode source —returns—> void

Here is a trippy idea. These operations are simply bytecode.

	1. [[register,[bytecode]]] —returns—> single traverser referencing bytecode.
	2. [[submit, [bytecode]]] —returns—> many traversers referencing primitives.
	3. [[unregister, [bytecode]]] —returns —> no traversers.

Thus, THE ONLY THING YOU SEND TO THE TP4 VM IS BYTECODE and THE ONLY THING RETURNED IS ZERO OR MORE TRAVERSERS!
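
A minimal sketch of this idea, assuming nested lists as the in-memory form of bytecode and a plain dict as the form of a traverser. ToyMachine, the "source", "inject", and "count" instructions, and the "source-N" identifiers are all invented for illustration and are not taken from the TP4 codebase:

```python
from typing import Any, Dict, List

Instruction = List[Any]       # [op, arg, ...]
Bytecode = List[Instruction]
Traverser = Dict[str, Any]    # hypothetical wire form: {"value": ...}


class ToyMachine:
    """Takes bytecode in, hands zero or more traversers back."""

    def __init__(self) -> None:
        self._sources: Dict[str, Bytecode] = {}
        self._counter = 0

    def execute(self, bytecode: Bytecode) -> List[Traverser]:
        protocol = {"register": self._register,
                    "submit": self._submit,
                    "unregister": self._unregister}
        traversers: List[Traverser] = []
        for op, *args in bytecode:
            if op not in protocol:
                raise ValueError("unknown protocol instruction: %r" % (op,))
            traversers.extend(protocol[op](args[0]))
        return traversers

    def _register(self, source: Bytecode) -> List[Traverser]:
        # Cache the source bytecode; return ONE traverser referencing bytecode.
        self._counter += 1
        source_id = "source-%d" % self._counter
        self._sources[source_id] = source
        return [{"value": [["source", source_id]]}]

    def _submit(self, bytecode: Bytecode) -> List[Traverser]:
        # Toy evaluation of three made-up instructions.
        values: List[Any] = []
        for op, *args in bytecode:
            if op == "source":
                if args[0] not in self._sources:
                    raise KeyError("source not registered: %r" % (args[0],))
            elif op == "inject":
                values.extend(args)
            elif op == "count":
                values = [len(values)]
            else:
                raise ValueError("unknown instruction: %r" % (op,))
        return [{"value": v} for v in values]

    def _unregister(self, bytecode: Bytecode) -> List[Traverser]:
        # Drop the cached source; NO traversers come back.
        for op, *args in bytecode:
            if op == "source":
                self._sources.pop(args[0], None)
        return []


vm = ToyMachine()
handle = vm.execute([["register", [["withStructure", "toy"]]]])[0]["value"]
results = vm.execute([["submit", handle + [["inject", 7, 8, 9], ["count"]]]])
```

The single execute() entry point is the part that matters here: the client never calls anything except "send bytecode", and every reply is a (possibly empty) list of traversers.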

Now, think about JanusGraph. It has database operations such as create index, create schema, drop graph, etc. These are just custom instructions in the bytecode of submit.

	[[submit, [[jg:createIndex,people-idx,person]]]]

A JanusGraph strategy will know what to do with that instruction, and a traverser can be returned: Traverser.of(“SUCCESS”). And there you have it: just as processing instructions are extended via namespaced instructions and strategies, so are server instructions. Providers have an extensible framework to support all their custom operations because, in the end, it’s just bytecode, strategies, and resultant traversers! (Everything is the same.)
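
A rough sketch of how a provider might hook a namespaced instruction into the submit path. ProviderStrategies and ToyIndexProvider are hypothetical names, the handler only records the request instead of touching a database, and none of this reflects the real JanusGraph or TP4 APIs:

```python
from typing import Any, Callable, Dict, List

Instruction = List[Any]
Handler = Callable[[Instruction], List[Any]]


class ProviderStrategies:
    """Maps namespaced instruction names to provider-supplied handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def add(self, name: str, handler: Handler) -> None:
        self._handlers[name] = handler

    def submit(self, bytecode: List[Instruction]) -> List[Any]:
        traversers: List[Any] = []
        for instruction in bytecode:
            handler = self._handlers.get(instruction[0])
            if handler is None:
                raise ValueError("no strategy for %r" % (instruction[0],))
            traversers.extend(handler(instruction))
        return traversers


class ToyIndexProvider:
    """Stands in for a provider; it only records the requested indexes."""

    def __init__(self) -> None:
        self.indexes: Dict[str, str] = {}

    def create_index(self, instruction: Instruction) -> List[Any]:
        _, index_name, label = instruction
        self.indexes[index_name] = label
        return ["SUCCESS"]   # plays the role of Traverser.of("SUCCESS")


provider = ToyIndexProvider()
strategies = ProviderStrategies()
strategies.add("jg:createIndex", provider.create_index)
result = strategies.submit([["jg:createIndex", "people-idx", "person"]])
```

The namespace prefix ("jg:") is what keeps provider instructions from colliding with the core instruction set; the machine itself needs no knowledge of what an index is.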
 
Next, in order to send bytecode and get back traversers ‘over the wire’, there needs to be a serialization specification.

[DATA SERIALIZATION PROTOCOL]

	1. I don’t know much about GraphBinary, but I believe it’s this without complex types.
		- Why?
			- bytecode is primitive.
			- traversers are primitive (as they can’t reference complex types — see other [DISCUSS] from today).
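
To illustrate why primitive-only payloads keep the serialization layer small, here is a sketch that uses JSON purely as a stand-in wire format. It is not GraphBinary, and the function names are invented for this example:

```python
import json
from typing import Any


def check_primitive(value: Any) -> None:
    """Allow only nested lists of strings, numbers, booleans and None."""
    if isinstance(value, list):
        for item in value:
            check_primitive(item)
    elif not (value is None or isinstance(value, (str, int, float, bool))):
        raise TypeError("not a primitive: %r" % (value,))


def encode(value: Any) -> bytes:
    # Reject complex types before they ever reach the wire.
    check_primitive(value)
    return json.dumps(value, separators=(",", ":")).encode("utf-8")


def decode(payload: bytes) -> Any:
    value = json.loads(payload.decode("utf-8"))
    check_primitive(value)
    return value


bytecode = [["submit", [["V"], ["values", "name"]]]]
wire = encode(bytecode)
```

Because both bytecode and traversers fit in nested lists of primitives, the same two functions cover the request and the response; no per-type serializers are needed in this toy setting.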


Thoughts?,
Marko.

http://rredux.com <http://rredux.com/>

Re: [DISCUSS] The Two Protocols of TP4

Posted by Marko Rodriguez <ok...@gmail.com>.

Hi,

> Currently users can send either bytecode or groovy scripts to be executed
> on the server. I'm saying we replace "groovy scripts evaluation" with
> "gremlin groovy traversal execution”.

I concur. But why even send Gremlin-Groovy traversals? Just send bytecode.
	- assuming we can get rid of lambdas

> In TP3, it's possible for the user to submit to the script engine something
> like "Thread.sleep(4000)" that will be executed inside a sandboxed vm.
> I'm proposing we get rid of this approach in TP4 and, as gremlin groovy
> scripts are still useful (for example, you can store a bunch of traversals
> to execute in a text file), we replace it with a language recognition
> engine that will parse what is sent and evaluate it, using a restricted
> grammar set. The variant for gremlin strings would still be groovy/java but
> the user won't be able to submit arbitrary groovy instructions.

Understood. Again, I would make this super simple by just sending bytecode.

One thing I’m pushing for is a “reference implementation server.” No more monolithic GremlinServer. The reference server has the following features:

	- Sits on a socket waiting for bytecode.
	- Executes bytecode and returns traversers.
	- For distributed processors, can send traversers back to client from any machine in the cluster.

From this reference server, providers can extend it as they see fit. Perhaps someone wants to execute Groovy scripts!

	- ScriptEngineStrategy
	- ScriptEngineFlatMap
	- [ex:script,groovy,Thread.sleep(1000)]

In other words, our reference implementation server is bare bones, rock solid, speedy, and safe. How the pieces are reassembled by the provider is up to them.

Thoughts?,
Marko.

http://rredux.com <http://rredux.com/>

Re: [DISCUSS] The Two Protocols of TP4

Posted by Jorge Bay Gondra <jo...@gmail.com>.

> are you saying that we should write an ANTLR parser that compiles
> Gremlin-XXX into Bytecode directly?

Not exactly.

Currently users can send either bytecode or groovy scripts to be executed
on the server. I'm saying we replace "groovy scripts evaluation" with
"gremlin groovy traversal execution".

In TP3, it's possible for the user to submit to the script engine something
like "Thread.sleep(4000)" that will be executed inside a sandboxed vm.
I'm proposing we get rid of this approach in TP4 and, as gremlin groovy
scripts are still useful (for example, you can store a bunch of traversals
to execute in a text file), we replace it with a language recognition
engine that will parse what is sent and evaluate it, using a restricted
grammar set. The variant for gremlin strings would still be groovy/java but
the user won't be able to submit arbitrary groovy instructions.

I think this is not directly related to this thread (sorry!), do you think
I should start a new one to discuss this?

Jorge

On Tue, Apr 23, 2019 at 1:14 PM Marko Rodriguez <ok...@gmail.com>
wrote:

> Whoa! — are you saying that we should write an ANTLR parser that compiles
> Gremlin-XXX into Bytecode directly?
>
> Thus, for every Gremlin language variant, we will have an ANTLR parser.
>
> Marko.
>
> http://rredux.com <http://rredux.com/>

Re: [DISCUSS] The Two Protocols of TP4

Posted by Marko Rodriguez <ok...@gmail.com>.

Whoa! — are you saying that we should write an ANTLR parser that compiles Gremlin-XXX into Bytecode directly?

Thus, for every Gremlin language variant, we will have an ANTLR parser.

Marko.

http://rredux.com <http://rredux.com/>




> On Apr 23, 2019, at 5:01 AM, Jorge Bay Gondra <jo...@gmail.com> wrote:
> 
> Hi,
> Language recognition engines will give us a set of tokens, usually in some
> sort of tree but the result can be thought of nested collections, for
> example:
> 
> The following string "g.V().values('name')" could be parsed into something
> like [["g"], ["V"], ["values", "name"]].
> 
> Then, we would have to create some sort of "evaluator", that translates
> these string tokens into a traversal, similar to bytecode parsing and
> execution. This evaluator can use static evaluation of the tokens (like, do
> the tokens evaluate into something meaningful?), can be optimized with
> caching techniques (like preparing traversals) and more importantly, will
> only execute class methods that are whitelisted, i.e., users can't use it
> to execute arbitrary groovy code.
> 
> Best,
> Jorge
> 

Re: [DISCUSS] The Two Protocols of TP4

Posted by Jorge Bay Gondra <jo...@gmail.com>.

Hi,
Language recognition engines will give us a set of tokens, usually in some
sort of tree, but the result can be thought of as nested collections, for
example:

The following string "g.V().values('name')" could be parsed into something
like [["g"], ["V"], ["values", "name"]].

Then, we would have to create some sort of "evaluator", that translates
these string tokens into a traversal, similar to bytecode parsing and
execution. This evaluator can use static evaluation of the tokens (like, do
the tokens evaluate into something meaningful?), can be optimized with
caching techniques (like preparing traversals) and more importantly, will
only execute class methods that are whitelisted, i.e., users can't use it
to execute arbitrary groovy code.
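
A small sketch of the parse-then-evaluate idea. It uses a hand-rolled regular-expression scanner as a stand-in for a real ANTLR grammar, accepts only single-quoted strings and integers as arguments, and the whitelist contents are an assumption made for the example:

```python
import re
from typing import Any, List

_STEP = re.compile(r"([A-Za-z_]\w*)(?:\(([^()]*)\))?")
_ARG = re.compile(r"\s*(?:'([^']*)'|(-?\d+))\s*")

ALLOWED_STEPS = {"V", "out", "values", "count"}   # hypothetical whitelist


def _parse_args(raw: str) -> List[Any]:
    if not raw.strip():
        return []
    args: List[Any] = []
    for piece in raw.split(","):   # limitation: no commas inside strings
        match = _ARG.fullmatch(piece)
        if match is None:
            raise ValueError("unsupported argument: %r" % (piece,))
        text, number = match.group(1), match.group(2)
        args.append(text if text is not None else int(number))
    return args


def parse(text: str) -> List[List[Any]]:
    """Turn "g.V().values('name')" into nested token lists; runs no code."""
    steps: List[List[Any]] = []
    pos = 0
    while True:
        match = _STEP.match(text, pos)
        if match is None:
            raise ValueError("cannot parse at offset %d" % pos)
        steps.append([match.group(1)] + _parse_args(match.group(2) or ""))
        pos = match.end()
        if pos == len(text):
            return steps
        if text[pos] != ".":
            raise ValueError("expected '.' at offset %d" % pos)
        pos += 1


def to_bytecode(tokens: List[List[Any]]) -> List[List[Any]]:
    """Accept only traversals rooted at g that use whitelisted steps."""
    if not tokens or tokens[0] != ["g"]:
        raise PermissionError("traversals must start at g")
    for step in tokens[1:]:
        if step[0] not in ALLOWED_STEPS:
            raise PermissionError("step not allowed: %r" % (step[0],))
    return tokens[1:]


tokens = parse("g.V().values('name')")
bytecode = to_bytecode(tokens)
```

Something like "Thread.sleep(4000)" still parses into tokens, but to_bytecode() rejects it because it is never evaluated as Groovy; only whitelisted step names can become instructions.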

Best,
Jorge

On Tue, Apr 23, 2019 at 12:36 PM Marko Rodriguez <ok...@gmail.com>
wrote:

> Hi Jorge,
>
> > Instead of supporting a ScriptEngine or enable providers to implement
> one,
> > TP4 could be a good opportunity to ditch script engines while continue
> > supporting gremlin-groovy string literals using language recognition
> > engines like ANTLR.
>
> Huh…….. Can you explain how you think of using ANTLR vs
> ScriptEngine.submit(String)
>
> > Language recognition and parsing engines have several benefits over the
> > current approach, most notably that it's safe to parse text using
> language
> > recognition as it results in string tokens, opposed to let users run code
> > in a sandboxed vm.
>
> How would the ANTLR-parsed text ultimately be executed?
>
> Thanks,
> Marko.
>
> http://rredux.com <http://rredux.com/>
>
>
>

Re: [DISCUSS] The Two Protocols of TP4

Posted by Marko Rodriguez <ok...@gmail.com>.

Hi Jorge,

> Instead of supporting a ScriptEngine or enable providers to implement one,
> TP4 could be a good opportunity to ditch script engines while continue
> supporting gremlin-groovy string literals using language recognition
> engines like ANTLR.

Huh… Can you explain how you would use ANTLR vs. ScriptEngine.submit(String)?

> Language recognition and parsing engines have several benefits over the
> current approach, most notably that it's safe to parse text using language
> recognition as it results in string tokens, opposed to let users run code
> in a sandboxed vm.

How would the ANTLR-parsed text ultimately be executed?

Thanks,
Marko.

http://rredux.com <http://rredux.com/>

Re: [DISCUSS] The Two Protocols of TP4

Posted by Jorge Bay Gondra <jo...@gmail.com>.

Hi,
I'm still trying to catch up with TP4 topics.

I agree that we can reuse bytecode to submit gremlin string literals,
like [[submit, [ex:script, gremlin-groovy, g.V.out.name]]]

Instead of supporting a ScriptEngine or enabling providers to implement one,
TP4 could be a good opportunity to ditch script engines while continuing to
support gremlin-groovy string literals using language recognition
engines like ANTLR.

Language recognition and parsing engines have several benefits over the
current approach, most notably that it's safe to parse text using language
recognition as it results in string tokens, as opposed to letting users run
code in a sandboxed vm.

Jorge



On Tue, Apr 16, 2019 at 8:43 PM Marko Rodriguez <ok...@gmail.com>
wrote:

> Hi,
>
>
> > hmm - it sounds like supporting the vm protocol requires a session. like
> > each "g" from a client needs to hold state on the server between
> requests.
> > or am i thinking about it too concretely and this protocol is more of an
> > abstraction of what's happening?
>
> No, you are right. Its pretty analogous to TP3. The server holds a bunch
> of “g” instances. “g” instances are thread-safe and immutable. Submitted
> bytecode can have a source instruction that references a cached “g” on the
> server (e.g. via a UUID — though this is up to the Machine implementation).
> If it does, then that cached “g” is used to spawn the traversal via the
> operation instructions. Also, this is not just for “over the wire”
> communication. Its not specific to server behavior. The Machine interface
> can be a LocalMachine and still you have this notion of pre-compiled source
> instructions that were machine.registered().
>
>
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/LocalMachine.java#L41
> <
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/LocalMachine.java#L41
> >
>
> Finally, if you want to build a Machine that doesn’t pre-compile the
> source instructions, well, this is what your Machine implementation looks
> like:
>
>
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/BasicMachine.java
> <
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/BasicMachine.java
> >
>
> Marko.

Re: [DISCUSS] The Two Protocols of TP4

Posted by Marko Rodriguez <ok...@gmail.com>.

Hi,


> hmm - it sounds like supporting the vm protocol requires a session. like
> each "g" from a client needs to hold state on the server between requests.
> or am i thinking about it too concretely and this protocol is more of an
> abstraction of what's happening?

No, you are right. It’s pretty analogous to TP3. The server holds a bunch of “g” instances. “g” instances are thread-safe and immutable. Submitted bytecode can have a source instruction that references a cached “g” on the server (e.g. via a UUID, though this is up to the Machine implementation). If it does, then that cached “g” is used to spawn the traversal via the operation instructions. Also, this is not just for “over the wire” communication. It’s not specific to server behavior. The Machine interface can be a LocalMachine and still you have this notion of pre-compiled source instructions that were registered via machine.register().

	https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/LocalMachine.java#L41

Finally, if you want to build a Machine that doesn’t pre-compile the source instructions, well, this is what your Machine implementation looks like:

	https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/BasicMachine.java
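
A sketch of the caching behaviour described above, under stated assumptions: CachingMachine, the "withSource" instruction, and the toy "inject" and "count" operations are invented for illustration and do not mirror the linked LocalMachine or BasicMachine classes. The compilations counter exists only to make the cost of re-processing source instructions visible:

```python
import uuid
from typing import Any, Dict, List, Tuple

Instruction = List[Any]
Bytecode = List[Instruction]

SOURCE_OPS = {"withProcessor", "withStructure", "withStrategy", "withSource"}


def split(bytecode: Bytecode) -> Tuple[Bytecode, Bytecode]:
    """Separate source instructions from operation instructions."""
    source = [i for i in bytecode if i[0] in SOURCE_OPS]
    operations = [i for i in bytecode if i[0] not in SOURCE_OPS]
    return source, operations


class CachingMachine:
    def __init__(self) -> None:
        self._compiled: Dict[str, tuple] = {}
        self.compilations = 0   # how often the expensive setup ran

    def _compile(self, source: Bytecode) -> tuple:
        # Stand-in for opening connections, sorting strategies, etc.
        self.compilations += 1
        return tuple(tuple(i) for i in source)

    def register(self, bytecode: Bytecode) -> Bytecode:
        source, _ = split(bytecode)
        source_id = str(uuid.uuid4())
        self._compiled[source_id] = self._compile(source)
        return [["withSource", source_id]]   # reference to the cached "g"

    def submit(self, bytecode: Bytecode) -> List[Any]:
        source, operations = split(bytecode)
        if len(source) == 1 and source[0][0] == "withSource":
            compiled = self._compiled[source[0][1]]   # KeyError if unknown
        else:
            compiled = self._compile(source)          # no pre-compilation
        return self._run(compiled, operations)

    def unregister(self, bytecode: Bytecode) -> None:
        for op, *args in bytecode:
            if op == "withSource":
                self._compiled.pop(args[0], None)

    @staticmethod
    def _run(compiled: tuple, operations: Bytecode) -> List[Any]:
        values: List[Any] = []
        for op, *args in operations:
            if op == "inject":
                values.extend(args)
            elif op == "count":
                values = [len(values)]
            else:
                raise ValueError("unknown instruction: %r" % (op,))
        return values


machine = CachingMachine()
g = machine.register([["withStructure", "toy"], ["withStrategy", "noop"]])
answers = [machine.submit(g + [["inject", 1, 2], ["count"]]) for _ in range(3)]
```

Submitting the full source instructions each time, without register(), takes the else branch in submit() and pays the compilation cost on every call, which is roughly the non-caching behaviour mentioned above.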

Marko.


Re: [DISCUSS] The Two Protocols of TP4

Posted by Stephen Mallette <sp...@gmail.com>.

> Thus, when you Machine.register(Bytecode) you are sending over the
> source instructions, having them processed at the TP4 VM and then all
> subsequent submits() with the same source instruction header will use the
> “pre-compiled” source bytecode cached in the TP4 VM. g.close() basically
> does Machine.unregister().

hmm - it sounds like supporting the vm protocol requires a session. like
each "g" from a client needs to hold state on the server between requests.
or am i thinking about it too concretely and this protocol is more of an
abstraction of what's happening?


On Tue, Apr 16, 2019 at 1:58 PM Marko Rodriguez <ok...@gmail.com>
wrote:

> Hi,
>
> > i get the "submit" part but could you explain the "register" and
> > "unregister" parts (referenced in another post somewhere perhaps)?
>
> These three methods are from the Machine API.
>
>
> https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/Machine.java
>
> Bytecode is composed of two sets of instructions.
>         - source instructions
>         - operation instructions
>
> source instructions are withProcessor(), withStructure(), withStrategy(),
> etc.
> operation instructions are out(), in(), count(), where(), etc.
>
> The source instructions are expensive to execute. Why? When you evaluate
> a withStructure(), you are creating a connection to the database. When you
> evaluate a withStrategy(), you are sorting strategies. It is for this
> reason that we have the concept of a TraversalSource in TP3 that does all
> that “setup stuff” once and only once for each g. It is also the reason we
> tell people not to do graph.traversal().V(), but instead g = graph.traversal().
> Once you have ‘g’, you can then spawn as many traversals as you want off of
> it without incurring the cost of re-processing the source instructions.
>
> In TP4, there is no state in Gremlin’s TraversalSource. Gremlin doesn’t
> know about databases, processors, strategy compilation, etc. Thus, when you
> Machine.register(Bytecode) you are sending over the source instructions,
> having them processed at the TP4 VM and then all subsequent submits() with
> the same source instruction header will use the “pre-compiled” source
> bytecode cached in the TP4 VM. g.close() basically does
> Machine.unregister().
>
>
> https://github.com/apache/tinkerpop/blob/tp4/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin/TraversalSource.java#L107-L112
>
> https://github.com/apache/tinkerpop/blob/tp4/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin/TraversalSource.java#L114-L116
>
> In short, we have just offloaded the TP3 TraversalSource work to TP4
> Machine.
>
> HTH,
> Marko.
>
> P.S. I don’t like the term “source instructions.” I’m thinking of calling
> them “meta instructions” or “setup instructions” or “staging instructions”
> … ?

Re: [DISCUSS] The Two Protocols of TP4

Posted by Marko Rodriguez <ok...@gmail.com>.

Hi,

> i get the "submit" part but could you explain the "register" and
> "unregister" parts (referenced in another post somewhere perhaps)?

These three methods are from the Machine API.

	https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/Machine.java

Bytecode is composed of two sets of instructions.
	- source instructions
	- operation instructions

source instructions are withProcessor(), withStructure(), withStrategy(), etc.
operation instructions are out(), in(), count(), where(), etc.
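
A small sketch of that split, in Java. This is not the TP4 Bytecode class: the class name `BytecodeSplitSketch` and the rule "a name starting with `with` is a source instruction" are assumptions made only for illustration, based on the examples listed above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class BytecodeSplitSketch {

    // Assumption for this sketch only: source instructions are the ones whose
    // name starts with "with" (withProcessor, withStructure, withStrategy, ...).
    static boolean isSourceInstruction(String op) {
        return op.startsWith("with");
    }

    // Partition instruction names: key true = source, key false = operation.
    static Map<Boolean, List<String>> split(List<String> ops) {
        return ops.stream()
                .collect(Collectors.partitioningBy(BytecodeSplitSketch::isSourceInstruction));
    }

    public static void main(String[] args) {
        List<String> ops = Arrays.asList(
                "withStructure", "withProcessor", "withStrategy", "out", "in", "count");
        Map<Boolean, List<String>> parts = split(ops);
        System.out.println("source:    " + parts.get(true));
        System.out.println("operation: " + parts.get(false));
    }
}
```

Only the first group needs the expensive setup; the second group is what changes from one traversal to the next.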

The source instructions are expensive to execute. Why? When you evaluate a withStructure(), you are creating a connection to the database. When you evaluate a withStrategy(), you are sorting strategies. It is for this reason that we have the concept of a TraversalSource in TP3 that does all that “setup stuff” once and only once for each g. It is also the reason we tell people not to do graph.traversal().V(), but instead g = graph.traversal(). Once you have ‘g’, you can spawn as many traversals as you want off of it without incurring the cost of re-processing the source instructions.

In TP4, there is no state in Gremlin’s TraversalSource. Gremlin doesn’t know about databases, processors, strategy compilation, etc. Thus, when you Machine.register(Bytecode) you are sending over the source instructions, having them processed at the TP4 VM and then all subsequent submits() with the same source instruction header will use the “pre-compiled” source bytecode cached in the TP4 VM. g.close() basically does Machine.unregister().
	
	https://github.com/apache/tinkerpop/blob/tp4/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin/TraversalSource.java#L107-L112
	https://github.com/apache/tinkerpop/blob/tp4/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin/TraversalSource.java#L114-L116

In short, we have just offloaded the TP3 TraversalSource work to TP4 Machine.
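
The register/submit/unregister flow could be sketched roughly as follows. This is a toy, not the Machine interface linked above: `CachingMachineSketch`, `CompiledSource`, `compileCount`, and the use of plain string lists for bytecode and results are all hypothetical, and falling back to a one-off compile for unregistered sources is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class CachingMachineSketch {

    // Stand-in for the expensive result of processing source instructions
    // (opened connections, sorted strategies, ...).
    static final class CompiledSource {
        final List<String> sourceInstructions;

        CompiledSource(List<String> sourceInstructions) {
            this.sourceInstructions = sourceInstructions;
        }
    }

    // Cache keyed by the source instructions themselves (hypothetical choice).
    private final Map<List<String>, CompiledSource> cache = new HashMap<>();
    int compileCount = 0; // how often the expensive step ran

    private CompiledSource compile(List<String> source) {
        compileCount++;
        return new CompiledSource(new ArrayList<>(source));
    }

    // Process the source instructions once and remember the result.
    List<String> register(List<String> source) {
        cache.computeIfAbsent(new ArrayList<>(source), this::compile);
        return source;
    }

    // Reuse the cached source if registered; otherwise compile for this call only.
    Iterator<String> submit(List<String> source, List<String> operations) {
        CompiledSource compiled = cache.get(source);
        if (compiled == null) {
            compiled = compile(source);
        }
        List<String> results = new ArrayList<>();
        for (String op : operations) {
            results.add("executed " + op); // placeholder for a traverser
        }
        return results.iterator();
    }

    // Drop the cached source, as g.close() is described as doing.
    void unregister(List<String> source) {
        cache.remove(source);
    }

    public static void main(String[] args) {
        CachingMachineSketch machine = new CachingMachineSketch();
        List<String> source = Arrays.asList("withStructure", "withProcessor");
        machine.register(source);
        machine.submit(source, Arrays.asList("out", "count"));
        machine.submit(source, Arrays.asList("in", "count"));
        System.out.println("compiled " + machine.compileCount + " time(s)");
        machine.unregister(source);
    }
}
```

Keying the cache on the source instructions, rather than on a client session id, follows the "same source instruction header" wording above. Whether the real Machine implementations work this way is not something this sketch establishes.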

HTH,
Marko.

P.S. I don’t like the term “source instructions.” I’m thinking of calling them “meta instructions” or “setup instructions” or “staging instructions” … ?






Re: [DISCUSS] The Two Protocols of TP4

Posted by Marko Rodriguez <ok...@gmail.com>.

Hi,

> i was thinking that an extensible bytecode model would be the solution for
> these kinds of things. without the scriptengine anymore (stoked to see that
> go away) graph providers with schema languages and other admin functions
> will need something to replace that. what's neat about that option is that
> such features would no longer need to be bound to just the JVM. Python
> users could use the JanusGraph clean utility to drop a database or use
> JavaScript to create a graph in DSE Graph. pretty cool.

Exactly!

However, let's say some provider decides they want to support ScriptEngine.

[[submit, [ex:script, gremlin-groovy, g.V.out.name]]] 

As you note, extensible bytecode will make it so that seemingly disparate operations all use the same “bytecode protocol” pattern. And you just made me realize the benefit of that for all the language drivers. Not only is our serialization protocol going to be dead simple (always primitives), but so is our communication protocol (always bytecode->traversers). Gremlin-Brainfuck might just be a reality! [https://en.wikipedia.org/wiki/Brainfuck]

Operations that could all go through this one pattern:

	- processor execution
	- database operations
	- server status inquiry
	- HDFS file system management
	- ...

For the last one:

	[[submit, [hadoop:hdfs, head -10 /data.txt]]]

That returns Iterator<Traverser<String>>.

It's as if Strategies are “server plugins.” If you make namespaced instructions with a corresponding Strategy that can handle those instructions, then you are basically communicating with a server-side “plugin,” RPC-style.
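
As a rough illustration of the “plugin” idea, here is a sketch in Java. Nothing here is TP4 API: `NamespaceDispatchSketch`, `addHandler`, and the string-list stand-ins for instructions and traversers are hypothetical, and a plain function stands in for a provider Strategy. The `jg:createIndex` instruction and the “SUCCESS” result are taken from the original post.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class NamespaceDispatchSketch {

    // namespace -> handler; the handler plays the role of a provider Strategy.
    private final Map<String, Function<List<String>, List<String>>> handlers = new HashMap<>();

    void addHandler(String namespace, Function<List<String>, List<String>> handler) {
        handlers.put(namespace, handler);
    }

    // instruction.get(0) is the op, e.g. "jg:createIndex"; the rest are arguments.
    Iterator<String> submit(List<String> instruction) {
        String op = instruction.get(0);
        int colon = op.indexOf(':');
        String namespace = colon < 0 ? "" : op.substring(0, colon);
        Function<List<String>, List<String>> handler = handlers.get(namespace);
        if (handler == null) {
            throw new IllegalArgumentException("no handler for instruction: " + op);
        }
        return handler.apply(instruction).iterator();
    }

    public static void main(String[] args) {
        NamespaceDispatchSketch vm = new NamespaceDispatchSketch();
        // Pretend provider strategy: acknowledges any jg: instruction.
        vm.addHandler("jg", instruction -> Collections.singletonList("SUCCESS"));
        Iterator<String> result =
                vm.submit(Arrays.asList("jg:createIndex", "people-idx", "person"));
        while (result.hasNext()) {
            System.out.println(result.next());
        }
    }
}
```

The client side never changes: it sends an instruction and iterates over whatever comes back. Only the set of registered handlers differs from provider to provider.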

Sky's the limit,
Marko.

http://rredux.com

Re: [DISCUSS] The Two Protocols of TP4

Posted by Stephen Mallette <sp...@gmail.com>.

sorry - i don't follow the nature of the vm communication protocol:

        1. Register bytecode —returns—> bytecode.
        2. Submit bytecode —returns—> iterator of traversers.
        3. Unregister bytecode source —returns—> void

i get the "submit" part but could you explain the "register" and
"unregister" parts (referenced in another post somewhere perhaps)?

regarding this:

> just like processing instructions are extended via namespaced
> instructions and strategies, so are server instructions

i was thinking that an extensible bytecode model would be the solution for
these kinds of things. without the scriptengine anymore (stoked to see that
go away) graph providers with schema languages and other admin functions
will need something to replace that. what's neat about that option is that
such features would no longer need to be bound to just the JVM. Python
users could use the JanusGraph clean utility to drop a database or use
JavaScript to create a graph in DSE Graph. pretty cool.
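
One way to picture why this is language-neutral: if bytecode is only nested lists of primitives, any client can build it. A minimal sketch in Java, assuming a textual rendering in the bracket notation used in this thread. The actual wire format is not specified here, and `BytecodeRenderSketch` and `render` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class BytecodeRenderSketch {

    // Render nested lists of primitives in the bracket notation used in this thread.
    static String render(Object value) {
        if (value instanceof List) {
            return ((List<?>) value).stream()
                    .map(BytecodeRenderSketch::render)
                    .collect(Collectors.joining(", ", "[", "]"));
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        // The jg:createIndex instruction is the example from the original post.
        List<Object> instruction = Arrays.asList("jg:createIndex", "people-idx", "person");
        List<Object> submit = Arrays.asList("submit", Arrays.asList(instruction));
        System.out.println(render(Arrays.asList(submit)));
    }
}
```

A Python or JavaScript client would need the same few lines of list handling and nothing JVM-specific.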

