Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2019/03/23 16:25:19 UTC

What is the fundamental bytecode for TP4?

Hello,

As you know, one of the major objectives of TP4 is to generalize the virtual machine in order to support any data structure (not just graph).

Here is an idea that Kuppitz and I batted around yesterday and I spent this morning implementing on the tp4/ branch. 

From the Stream Ring Theory paper [https://zenodo.org/record/2565243], we know that universal computation is possible with the stream-based functions branch, initial, map, flatmap, filter, and reduce. If this is the case, why not make those instructions the TP4 VM instruction set?

If 

arg = constant | bytecode | method call, 

then the general pattern for each instruction type is:

[branch, (arg, bytecode)*]
[initial, arg]
[map, arg]
[flatmap, arg]
[filter, ?predicate, arg]
[reduce, operator, arg]
	
Let this be called the “core instruction set.”

Now check this out:

g.inject(7L).choose(is(7L), incr()).sum()
[initial(7), branch([filter(eq,7)],[map(number::add,1)]), reduce(sum,0)]

g.inject(Map.of("name", "marko", "age", 29)).hasKey(regex("[a].*[e]")).has("name", "marko").value("age");
[initial({age=29, name=marko}), filter([flatmap(map::keys), filter(regex,[a].*[e])]), filter([map(map::get,name), filter(eq,marko)]), map(map::get,age)]

These core bytecode chunks currently execute on Pipes and Beam processors as expected.
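
To make the two bytecode chunks above concrete, here is a small Python sketch of an interpreter for the six core instruction types. It is an illustration under stated assumptions, not the Pipes or Beam processor: the tuple encoding, the METHODS/PREDICATES/OPERATORS tables, and the run function are all hypothetical. It also assumes that a nested-bytecode filter keeps an item when the nested bytecode yields at least one result, that regex means a full match, that map and flatmap take only method calls, and that branch drops items matching no condition.

```python
import re
from functools import reduce as fold

# Hypothetical tables: the names mirror the thread, the bodies are assumptions.
METHODS = {
    "number::add": lambda x, y: x + y,
    "map::keys": lambda m: list(m.keys()),
    "map::get": lambda m, k: m[k],
}
PREDICATES = {
    "eq": lambda x, v: x == v,
    "regex": lambda x, p: re.fullmatch(p, x) is not None,
}
OPERATORS = {"sum": lambda a, b: a + b}


def run(bytecode, stream=()):
    """Evaluate bytecode eagerly over a list-based stream and return the results."""
    stream = list(stream)
    for op, *args in bytecode:
        if op == "initial":
            stream = [args[0]]
        elif op == "map":
            stream = [METHODS[args[0]](x, *args[1:]) for x in stream]
        elif op == "flatmap":
            stream = [y for x in stream for y in METHODS[args[0]](x, *args[1:])]
        elif op == "filter":
            if isinstance(args[0], list):
                # Nested bytecode: keep x if it produces any result (assumption).
                stream = [x for x in stream if run(args[0], [x])]
            else:
                stream = [x for x in stream if PREDICATES[args[0]](x, args[1])]
        elif op == "branch":
            # Arguments are (condition, body) pairs; the first matching condition wins.
            pairs = list(zip(args[0::2], args[1::2]))
            out = []
            for x in stream:
                for cond, body in pairs:
                    if run(cond, [x]):
                        out.extend(run(body, [x]))
                        break
            stream = out
        elif op == "reduce":
            stream = [fold(OPERATORS[args[0]], stream, args[1])]
        else:
            raise ValueError(f"not a core instruction: {op}")
    return stream


ex1 = [
    ("initial", 7),
    ("branch", [("filter", "eq", 7)], [("map", "number::add", 1)]),
    ("reduce", "sum", 0),
]
ex2 = [
    ("initial", {"name": "marko", "age": 29}),
    ("filter", [("flatmap", "map::keys"), ("filter", "regex", "[a].*[e]")]),
    ("filter", [("map", "map::get", "name"), ("filter", "eq", "marko")]),
    ("map", "map::get", "age"),
]
```

Under these assumptions, run(ex1) yields [8] and run(ex2) yields [29].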

Pretty trippy eh? 

Now the beautiful thing about this is:

	1. Implementing a TP4 VM is trivial. All you have to do is support 6 instruction types.
		- You could rip out a TP4 VM implementation in 1-2 days.
		- We can create a foundational C#, Python, C/C++, etc. TP4 VM implementation.
			- This foundation can then be evolved over time at our leisure (see next point).
	2. More advanced TP4 VMs will compile the core bytecode to a TP4 VM-native bytecode.
		- This is just like Java’s JIT compiler. For example, the core instruction:
	  filter([map(dictionary::get,name), filter(eq,marko)])
		is compiled to the TP4-Java instruction:
	  has(name,marko)
		- Every processor must be able to work with core bytecode, but can support VM native instructions such as has(), is(), path(), loops(), groupCount(), etc.
		- These instructions automatically work for all integrating processors (e.g. Pipes, Beam, Akka — on the TP4-Java VM).
			- these higher-level instructions don’t require any updates to the processors as these are still (abstractly) filter, flatmap, reduce, etc. functions.
	3. Core bytecode is as data agnostic as you can possibly get.
		- Data structures are accessed via method call references — e.g. map::keys, list::get, vertex::outEdges, etc.
			- Adding new data structures is simply a matter of adding new datatypes.
		- The TP4 VM can be used as a general purpose, universal stream-based VM.
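
The core-to-native compilation in point 2 can be sketched as a pattern match over instruction tuples. This Python sketch is only an illustration: the tuple encoding, the GETTERS set, and the to_native function are hypothetical, and it handles just the single has() rewrite given in the example above.

```python
# Hypothetical: method-call names treated as "get a value by key".
GETTERS = {"map::get", "dictionary::get"}


def to_native(bytecode):
    """Rewrite filter([map(<type>::get, k), filter(eq, v)]) to has(k, v).

    Every other instruction is passed through unchanged.
    """
    out = []
    for inst in bytecode:
        op, *args = inst
        if (op == "filter" and len(args) == 1
                and isinstance(args[0], list) and len(args[0]) == 2):
            first, second = args[0]
            if (len(first) == 3 and first[0] == "map" and first[1] in GETTERS
                    and len(second) == 3 and second[:2] == ("filter", "eq")):
                out.append(("has", first[2], second[2]))
                continue
        out.append(inst)
    return out


core = [
    ("initial", {"name": "marko", "age": 29}),
    ("filter", [("map", "dictionary::get", "name"), ("filter", "eq", "marko")]),
    ("map", "map::get", "age"),
]
native = to_native(core)
```

Here native keeps the initial and map instructions as they are and replaces the nested filter with ("has", "name", "marko"). A processor that only understands core bytecode could simply skip this pass.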

Here is the conceptual mapping between Java and TP4 terminology:

Java sourcecode <=> Gremlin traversal
Java bytecode <=> Core bytecode
JIT trees <=> TP4-Java-native bytecode
Machine code <=> Processor execution plan

It’s a pretty intense move and all the kinks haven’t been fully worked out, but it’s definitely something to consider.

Your questions and comments are welcome.

Take care,
Marko.

http://rredux.com

Re: [TinkerPop] What is the fundamental bytecode for TP4?

Posted by Marko Rodriguez <ok...@gmail.com>.

Hello,

> (As in SQL to "guide" (force?) you, then PL/SQL or TSQL or UDFs, etc.)  The core should be simple, but not too simple, and avoid redundancy.


If you look at how I currently have it set up, we have a “core instruction set” and a “common instruction set.” Common is your standard count, group, sum, repeat, etc. (~20 instructions). Core is only 6 instructions: branch, initial, map, flatmap, filter, and reduce. Every time an instruction is added to common, the corresponding core form is also added. The test suite for Pipes uses common and the test suite for Beam uses core. By just riding out these two instruction set branches, I hope to see a pattern emerge and perhaps converge on a single “common/core instruction set.”
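
The relationship between the two sets can be sketched as a lowering function from higher-level instructions to core ones. The Python below is a hypothetical illustration: the mappings are read off the two Gremlin/bytecode pairs in the original post, whether is, incr, has, and value actually belong to the common set is an assumption, and the tuple encoding and to_core name are invented for this sketch.

```python
def to_core(bytecode):
    """Lower a handful of higher-level instructions to core instructions."""
    core = []
    for op, *args in bytecode:
        if op == "inject":
            core.append(("initial", args[0]))
        elif op == "is":
            core.append(("filter", "eq", args[0]))
        elif op == "incr":
            core.append(("map", "number::add", 1))
        elif op == "sum":
            core.append(("reduce", "sum", 0))
        elif op == "value":
            core.append(("map", "map::get", args[0]))
        elif op == "has":
            core.append(("filter", [("map", "map::get", args[0]),
                                    ("filter", "eq", args[1])]))
        elif op == "choose":
            # Each argument is nested bytecode and is lowered recursively.
            core.append(("branch", *[to_core(arg) for arg in args]))
        else:
            raise ValueError(f"no core form known for: {op}")
    return core


common = [("inject", 7), ("choose", [("is", 7)], [("incr",)]), ("sum",)]
lowered = to_core(common)
```

With this table, lowered matches the first bytecode example from the original post: initial(7), then a branch over filter(eq,7) and map(number::add,1), then reduce(sum,0).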

And yes, redundancy is a big flaw in the TP3 instruction set. This popped up on the radar early due to the stream ring theory article and initial stabs on TP4 development. I suspect that the common instruction set will have 1/3 of the instructions of TP3.

> I kinda wonder if it just shoves the complexity down into analyzing the arguments of the instructions themselves or other contexts associated with the instructions.........maybe too early to tell. I'm just really hoping that TP4 can offer what TP3 didn't, which was an easy way to reason about complex query patterns. we promised that with "tools" in TP3 but those never really materialized (attempts were made, but nothing seemed to stick really).

The problem with TP3 reasoning is that you are reasoning at the “step” level, not at the “instruction” level. In TP3, after bytecode, the compilation goes to Pipes. This was a wrong move. It meant that we had to embed one execution engine (Pipes) into another (Spark, e.g.). In TP4, we compile from bytecode to CFunctions (coefficient functions). CFunctions do not assume an execution engine. They are simply Map, FlatMap, Reduce, Initial, Branch, and Filter functions (stateless functions). It is then up to the execution engine to coordinate these functions accordingly. Thus, the strategy reasoning in TP3 was awkward because you had to work at manipulating methods/fields on Pipe steps (i.e. object reasoning). In TP4, you manipulate [op,arg*]-instructions (i.e. primitive array reasoning).
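
As an illustration of what reasoning over [op,arg*]-instructions could look like, here is a hypothetical strategy written as a plain function over a list of tuples. Nothing here comes from the TP4 codebase: the fuse_adds name, the tuple encoding, and the optimization itself (merging adjacent map(number::add, n) instructions with constant arguments) are assumptions chosen to keep the sketch small.

```python
def fuse_adds(bytecode):
    """Merge consecutive map(number::add, n) instructions into one."""
    out = []
    for inst in bytecode:
        prev = out[-1] if out else None
        is_add = len(inst) == 3 and inst[:2] == ("map", "number::add")
        prev_is_add = (prev is not None and len(prev) == 3
                       and prev[:2] == ("map", "number::add"))
        if is_add and prev_is_add:
            # Adding a then b is the same as adding a + b once.
            out[-1] = ("map", "number::add", prev[2] + inst[2])
        else:
            out.append(inst)
    return out


before = [
    ("initial", 7),
    ("map", "number::add", 1),
    ("map", "number::add", 2),
    ("reduce", "sum", 0),
]
after = fuse_adds(before)
```

The strategy never touches an execution engine or a step object. It reads and writes instruction tuples, so after contains a single map(number::add, 3) between the initial and reduce instructions.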

I have not fleshed out strategies to any great extent in TP4, but I believe they will be easier to write than in TP3. However, I sorta don’t think strategies are going to go the same direction as they did in TP3. I’m having some inklings that we are not thinking about bytecode optimization in the most elegant way…

Take care,
Marko.

http://rredux.com

> On Mar 30, 2019, at 10:00 AM, Ben Krug <be...@datastax.com> wrote:
> 
> As an outsider of sorts, this was my thought, too.  Supposedly 'mov' is Turing-complete, but I wouldn't want to program with just that.
> (https://www.cl.cam.ac.uk/~sd601/papers/mov.pdf)
> 
> Ideally, you have a core language that guides you in how to think, model, and approach, then probably extensions for greater flexibility.
> 
> Hopefully that's the goal.
> 
> On Sat, Mar 30, 2019 at 6:15 AM Stephen Mallette <spmallette@gmail.com> wrote:
> Do you/kuppitz think that the reduced/core instruction set means that complex strategy development is simplified? on the surface, less instructions sounds like it will be easier to reason about patterns when providers go to build strategies, but I'm not sure. I kinda wonder if it just shoves the complexity down into analyzing the arguments of the instructions themselves or other contexts associated with the instructions.........maybe too early to tell. I'm just really hoping that TP4 can offer what TP3 didn't, which was an easy way to reason about complex query patterns. we promised that with "tools" in TP3 but those never really materialized (attempts were made, but nothing seemed to stick really). 

Re: [TinkerPop] What is the fundamental bytecode for TP4?

Posted by Ben Krug <be...@datastax.com>.

As an outsider of sorts, this was my thought, too.  Supposedly 'mov' is
Turing-complete, but I wouldn't want to program with just that.
(https://www.cl.cam.ac.uk/~sd601/papers/mov.pdf)

Ideally, you have a core language that guides you in how to think, model,
and approach, then probably extensions for greater flexibility.
(As in SQL to "guide" (force?) you, then PL/SQL or TSQL or UDFs, etc.)  The
core should be simple, but not too simple, and avoid redundancy.
Hopefully that's the goal.

On Sat, Mar 30, 2019 at 6:15 AM Stephen Mallette <sp...@gmail.com>
wrote:

> Do you/kuppitz think that the reduced/core instruction set means that
> complex strategy development is simplified? on the surface, less
> instructions sounds like it will be easier to reason about patterns when
> providers go to build strategies, but I'm not sure. I kinda wonder if it
> just shoves the complexity down into analyzing the arguments of the
> instructions themselves or other contexts associated with the
> instructions.........maybe too early to tell. I'm just really hoping that
> TP4 can offer what TP3 didn't, which was an easy way to reason about
> complex query patterns. we promised that with "tools" in TP3 but those
> never really materialized (attempts were made, but nothing seemed to stick
> really).

Re: [TinkerPop] What is the fundamental bytecode for TP4?

Posted by Stephen Mallette <sp...@gmail.com>.

Do you/kuppitz think that the reduced/core instruction set means that
complex strategy development is simplified? On the surface, fewer
instructions sounds like it will be easier to reason about patterns when
providers go to build strategies, but I'm not sure. I kinda wonder if it
just shoves the complexity down into analyzing the arguments of the
instructions themselves or other contexts associated with the
instructions... maybe too early to tell. I'm just really hoping that
TP4 can offer what TP3 didn't, which was an easy way to reason about
complex query patterns. We promised that with "tools" in TP3 but those
never really materialized (attempts were made, but nothing seemed to stick
really).
