Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2019/04/15 12:06:30 UTC

[DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

Hello,

I have a consolidated approach to handling data structures in TP4. I would appreciate any feedback you may have.

	1. Every object processed by TinkerPop has a TinkerPop-specific type.
		- TLong, TInteger, TString, TMap, TVertex, TEdge, TPath, TList, …
		- BENEFIT #1: A universal type system will protect us from language platform peculiarities (e.g. Python long vs Java long).
		- BENEFIT #2: The serialization format is constrained and consistent across all language platforms (no more coming across a MySpecialClass).
	2. All primitive T-type data can be directly accessed via get().
		- TBoolean.get() -> java.lang.Boolean | System.Boolean | ...
		- TLong.get() -> java.lang.Long | System.Int64 | ...
		- TString.get() -> java.lang.String | System.String | …
		- TList.get() -> java.util.ArrayList | .. // can only contain primitives
		- TMap.get() -> java.util.LinkedHashMap | .. // can only contain primitives
		- ...
	3. All complex T-types have no methods! (except those afforded by Object)
		- TVertex: no accessible methods.
		- TEdge: no accessible methods.
		- TRow: no accessible methods.
		- TDocument: no accessible methods.
		- TDocumentArray: no accessible methods. // a document list field that can contain complex objects
		- ...
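
The three rules above can be sketched in Java roughly as follows. This is only an illustration: every name here (TType, TPrimitive, TLongSketch, TStringSketch, TListSketch, TComplex, TVertexSketch) is hypothetical and not a confirmed TP4 class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are confirmed TP4 APIs.
interface TType {}                         // root of every TinkerPop-specific type

// Primitive T-types expose their platform value through get().
interface TPrimitive<V> extends TType {
    V get();
}

final class TLongSketch implements TPrimitive<Long> {
    private final long value;
    TLongSketch(long value) { this.value = value; }
    @Override public Long get() { return value; }
}

final class TStringSketch implements TPrimitive<String> {
    private final String value;
    TStringSketch(String value) { this.value = value; }
    @Override public String get() { return value; }
}

// A TList may only contain primitives; the element type enforces that.
final class TListSketch implements TPrimitive<List<Object>> {
    private final List<TPrimitive<?>> elements;
    TListSketch(List<TPrimitive<?>> elements) { this.elements = new ArrayList<>(elements); }
    @Override public List<Object> get() {
        List<Object> out = new ArrayList<>();
        for (TPrimitive<?> element : elements) out.add(element.get());
        return out;
    }
}

// Complex T-types are marker interfaces: no accessors at all.
interface TComplex extends TType {}
interface TVertexSketch extends TComplex {}
```

Because TVertexSketch declares nothing, client code holding one has nothing to call on it, which is the point of rule 3.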

REQUIREMENT #1: We need to be able to support multiple graphdbs in the same query.
		- e.g., read from JanusGraph and write to Neo4j.
REQUIREMENT #2: We need to make sure complex objects can not be queried client-side for properties/edges/etc. data.
		- e.g., vertices are universally assumed to be “detached.”
REQUIREMENT #3: We no longer want to maintain a structure test suite. Operational semantics should be verified via Bytecode -> Processor/Structure.
		- i.e., the only way to read/write vertices is via Bytecode as complex T-types don’t have APIs.
REQUIREMENT #4: We should support other database data structures besides graph.
		- e.g., reading from MySQL and writing to JanusGraph.

———

Assume the following TraversalSource:

g.withStructure(JanusGraphStructure.class, config1).
  withStructure(Neo4jStructure.class, config2)

Now, assume the following traversal fragment:

	outE('knows').has('stars',5).inV()

This would initially be written to Bytecode as:

	[[outE,knows],[has,stars,5],[inV]]

A decoration strategy realizes that there are two structures registered in the Bytecode source instructions and would rewrite the above as:

	[choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]]]

A JanusGraph strategy would rewrite this as:

	[choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]]]

A Neo4j strategy would rewrite this as:

	[choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]
	
A finalization strategy would rewrite this as:

	[choose,[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]

Now, when a TVertex gets to this CFunction, the function checks the vertex's type. If it is a JanusVertex, it goes down the JanusGraph-specific instruction branch. If it is a Neo4jVertex, it goes down the Neo4j-specific instruction branch.

	REQUIREMENT #1 SOLVED
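
The rewrite chain above can be sketched with bytecode modelled as nested Java lists. The class and method names (ChooseRewriteSketch, decorate, addBranch, finalizeChoose) are made up for illustration, and the assumption that choose takes alternating predicate/branch pairs is read off the listings, not off any TP4 source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the strategy chain; bytecode is modelled as nested lists.
final class ChooseRewriteSketch {

    // Decoration: wrap the fragment in a choose with one generic TVertex branch.
    static List<Object> decorate(List<Object> fragment) {
        List<Object> choose = new ArrayList<>();
        choose.add("choose");
        choose.add(predicate("TVertex"));
        choose.add(fragment);
        return choose;
    }

    // Provider strategy: append a (type predicate, provider-specific branch) pair.
    static List<Object> addBranch(List<Object> choose, String type, List<Object> branch) {
        List<Object> out = new ArrayList<>(choose);
        out.add(predicate(type));
        out.add(branch);
        return out;
    }

    // Finalization: drop the generic TVertex pair, keeping only provider branches.
    static List<Object> finalizeChoose(List<Object> choose) {
        List<Object> out = new ArrayList<>();
        out.add(choose.get(0));
        for (int i = 1; i + 1 < choose.size(); i += 2) {
            if (!choose.get(i).equals(predicate("TVertex"))) {
                out.add(choose.get(i));
                out.add(choose.get(i + 1));
            }
        }
        return out;
    }

    // Builds [[type,<name>]], the predicate form used in the listings above.
    static List<Object> predicate(String typeName) {
        return Arrays.<Object>asList(Arrays.<Object>asList("type", typeName));
    }

    // Replays the example from the text: decorate, add both providers, finalize.
    static List<Object> example() {
        List<Object> fragment = Arrays.<Object>asList(
            Arrays.<Object>asList("outE", "knows"),
            Arrays.<Object>asList("has", "stars", 5),
            Arrays.<Object>asList("inV"));
        List<Object> janus = Arrays.<Object>asList(
            Arrays.<Object>asList("jg:vertexCentric", "out", "knows", "stars", 5));
        List<Object> neo = Arrays.<Object>asList(
            Arrays.<Object>asList("neo:outE", "knows"),
            Arrays.<Object>asList("neo:has", "stars", 5),
            Arrays.<Object>asList("neo:inV"));
        List<Object> choose = decorate(fragment);
        choose = addBranch(choose, "JanusVertex", janus);
        choose = addBranch(choose, "Neo4jVertex", neo);
        return finalizeChoose(choose);
    }
}
```

Calling ChooseRewriteSketch.example() yields a choose with only the JanusVertex and Neo4jVertex branches, matching the finalized bytecode shown above.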

The last instruction of the root bytecode cannot return a complex object. If it would, an exception is thrown. g.V() is illegal. g.V().id() is legal. Complex objects do not exist outside the TP4-VM. Only primitives can cross the VM-client barrier. If you want, for example, vertex property data, you have to access it and return it within the traversal, e.g., g.V().valueMap().
	BENEFIT #1: Language variant implementations are simple. Just primitives.
	BENEFIT #2: The serialization specification is simple. Just primitives. (also, note that Bytecode is just a TList of primitives! — though TBytecode will exist.)
	BENEFIT #3: The concept of a “DetachedVertex” is universally assumed.

	REQUIREMENT #2 SOLVED
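
One way to picture the barrier is a guard that inspects results before they are serialized. The sketch below checks at runtime, which is an assumption on my part (the check could equally be done statically on the bytecode), and the names ResultBarrierSketch, isPrimitive and checkResults are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical guard at the VM-client barrier; the allowed set is an assumption.
final class ResultBarrierSketch {

    // True when the value is built only from booleans, numbers, strings, lists and maps.
    static boolean isPrimitive(Object value) {
        if (value instanceof Boolean || value instanceof Number || value instanceof String) {
            return true;
        }
        if (value instanceof List) {
            for (Object element : (List<?>) value) {
                if (!isPrimitive(element)) return false;
            }
            return true;
        }
        if (value instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (!isPrimitive(entry.getKey()) || !isPrimitive(entry.getValue())) return false;
            }
            return true;
        }
        return false; // anything else is treated as a complex object
    }

    // Rejects a result list that would leak a complex object to the client.
    static List<Object> checkResults(List<Object> results) {
        for (Object result : results) {
            if (!isPrimitive(result)) {
                throw new IllegalStateException("complex object cannot leave the VM: " + result);
            }
        }
        return results;
    }
}
```

Under this sketch the results of g.V().id() or g.V().valueMap() would pass, while a bare vertex object would trigger the exception.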

It is completely up to the structure provider to use structure-specific instructions for dealing with their particular TVertex. They will have to provide CFunction implementations for out, in, both, has, outE, inE, bothE, drop, property, value, id, label … (seems like a lot, but out/in/both could be one parameterized CFunction).
	BENEFIT #1: No more structure/ API and structure/ test suite.
	BENEFIT #2: The structure provider has full control of where the vertex data is stored (cached in memory or fetched from the db or a cut vertex or …). No assumptions are made by the TP4-VM.
	BENEFIT #3: The structure provider can safely assume their vertices will not be accessed outside the TP4-VM (outside the processor).

	REQUIREMENT #3 SOLVED
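
To make the "one parameterized CFunction" remark concrete, here is a rough sketch of a single provider-side function that covers out, in, and both. AdjacentSketch, its Direction enum, and the in-memory Edge list are all invented for the example; a real provider would back this with its own storage.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one provider-side function covering out, in and both.
final class AdjacentSketch {
    enum Direction { OUT, IN, BOTH }

    // A provider-private edge record; the VM never sees its fields.
    static final class Edge {
        final long from;
        final String label;
        final long to;
        Edge(long from, String label, long to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    private final List<Edge> edges;
    private final Direction direction;
    private final String label;

    AdjacentSketch(List<Edge> edges, Direction direction, String label) {
        this.edges = edges;
        this.direction = direction;
        this.label = label;
    }

    // Maps one vertex id to the ids of its adjacent vertices for the configured direction.
    List<Long> apply(long vertexId) {
        List<Long> result = new ArrayList<>();
        for (Edge edge : edges) {
            if (!edge.label.equals(label)) continue;
            if (direction != Direction.IN && edge.from == vertexId) result.add(edge.to);
            if (direction != Direction.OUT && edge.to == vertexId) result.add(edge.from);
        }
        return result;
    }
}
```

The direction is a constructor parameter, so out('knows'), in('knows') and both('knows') become three configurations of the same class instead of three implementations.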

We can support TRow for relational databases. A TRow’s data is accessible via the instructions has, hasKey, value, property, id, ... The location of the data in TRow is completely up to the structure provider and its strategy analysis (if only 'name' is accessed, then SELECT name FROM...). We can easily support TDocument for document databases. A TDocument’s data is accessible via the instructions has, hasKey, value, property, id, … A value() could return yet another TDocument (or a TDocumentArray containing TDocuments).

Supporting a new complex type is simply a function of asking: 

	“Does the TP4 VM instruction set have the requisite instruction-types (semantically) to manipulate this structure?”

We are no longer playing the language-specific object API game. We are playing the language-agnostic VM instruction game. The TP4-VM instruction set is the sole determiner of what complex objects can be processed (i.e., what data structures can be processed without impedance mismatch).

	REQUIREMENT #4 SOLVED
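
The TRow "strategy analysis" idea (only fetch the columns the traversal touches) might look something like this. RowProjectionSketch and the table name passed in are hypothetical, and the sketch assumes that the first argument of has and value instructions is the key being read.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical strategy analysis: derive a SQL projection from the bytecode.
final class RowProjectionSketch {

    // Collects every key touched by a value or has instruction, in first-seen order.
    static Set<String> accessedKeys(List<List<Object>> bytecode) {
        Set<String> keys = new LinkedHashSet<>();
        for (List<Object> instruction : bytecode) {
            Object op = instruction.get(0);
            if (("value".equals(op) || "has".equals(op)) && instruction.size() > 1) {
                keys.add(String.valueOf(instruction.get(1)));
            }
        }
        return keys;
    }

    // Builds the narrowest SELECT covering those keys; falls back to * if none are known.
    // Identifiers are not escaped here; a real provider would have to do that.
    static String toSelect(String table, List<List<Object>> bytecode) {
        Set<String> keys = accessedKeys(bytecode);
        String columns = keys.isEmpty() ? "*" : String.join(", ", keys);
        return "SELECT " + columns + " FROM " + table;
    }
}
```

For bytecode [[has,age,29],[value,name]] against a table called people, this produces SELECT age, name FROM people.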

———

The TP4-VM (and, in turn, Gremlin) can naturally support:

	1. Property graphs: as currently supported in TP3.
	2. RDF graphs: id() is a URI | Literal. g.V(1).value('foaf:name') returns multi/meta-properties *or* g.V(1).out('foaf:name') returns vertices whose id()s are xsd:string literals.
	3. Hypergraphs: inV() can return more than one vertex.
	4. Undirected graphs: in() and out() throw exceptions. Only both() works.
	5. Meta-properties: value('name') can return a TVertexProperty (a special complex object that is structure provider specific, and that is okay!).
	6. Multi-properties: value('name') can return a TPropertyArray of TVertexProperty objects.

This means that the same instruction can behave differently for different structures. This is okay as there can be property graph, RDF, hypergraph, etc. test suites.

Since complex objects don’t leave the TP4-VM barrier, providers can create any complex objects they want — they just have to have corresponding strategies to create provider-unique bytecode instructions (and thus, CFunctions) for those complex objects.

Finally, there are a few problems to work out:
	- There is no way to yield a “v[1]” or “e[3][v[1]-knows->v[2]]” representation. Is that bad? Perhaps not.
	- What is the nature of a TPath? It’s complex, but we want to return it.
	- g.V().id() on an RDF graph can return a URI. Is a URI “simple”? No, the set of simple types should never grow! Thus, URI => String. Is that wack?
	- Do we add g.R() and g.D() to Gremlin to type-support TRow and TDocument objects? g.V() would be weird :( … Hmmmm?
		- However, there are only so many data structures… or are there? TMatrix, TXML, … whoa.

Thanks for reading,
Marko.

http://rredux.com

Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

Posted by Stephen Mallette <sp...@gmail.com>.

> > I'd also wonder about how we treat subgraph() and tree()? could those be a
> > List<TPath> somehow??
>
> Yes, Tree is List<TPath>. Subgraph….hmmmm….shooting from the hip: you
> don’t get back a graph, it’s stored in:
>
> g.withProcessor(TinkerGraphStructure.class, config1)
>
> That is, the subgraph is written to one of the registered structures. You
> can then query it like any other registered structure. Remember, in TP4, we
> will support an arbitrary number of structures associated with a Bytecode
> source.
>

I just thought of something interesting - if we can subgraph() into a
TinkerGraph that way, then the opposite is true as well, right? like, you
could pull a subgraph(), do some mutations to it locally, then later write
some Gremlin to merge that subgraph back to its parent as a single
transaction. i suppose the nature of a "single transaction" would be
specific to each graph provider, but still neat to think about.

On Mon, Apr 15, 2019 at 2:19 PM Marko Rodriguez <ok...@gmail.com>
wrote:

Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

Posted by Marko Rodriguez <ok...@gmail.com>.

Hello Stephen,

> I'd also wonder about how we treat subgraph() and tree()? could those be a
> List<TPath> somehow??

Yes, Tree is List<TPath>. Subgraph….hmmmm….shooting from the hip: you don’t get back a graph, it’s stored in:

g.withProcessor(TinkerGraphStructure.class, config1)

That is, the subgraph is written to one of the registered structures. You can then query it like any other registered structure. Remember, in TP4, we will support an arbitrary number of structures associated with a Bytecode source.

> isn't a URI a complex type? that list is expected to grow? maybe all
> complex types have simple type representations?

The problem with every complex type having a simple type representation is that the serializer will have to know about complex types (as objects). This is just more code for Python, JavaScript, Java, etc. to maintain. If the serialization format is ONLY primitives, and primitives come from a static set of ~10 types, then writing, testing, and maintaining serializers in other languages will be trivial.

	Bytecode in [a nested list of primitives]
	Traversers out [a collection of coefficient wrapped primitives]

Everything communicated over the wire is primitive! Basic. (TTraverser will have to be primitive, where get() returns a coefficient [bulk] and primitive [object] pair).
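
As a rough picture of "coefficient wrapped primitives", the sketch below collapses equal primitives into [bulk, object] pairs before they go over the wire. TraverserWireSketch and the two-element list layout are assumptions made for the example, not a defined TP4 wire format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wire shape: each traverser is a [bulk, primitive] pair.
final class TraverserWireSketch {

    // Collapses equal primitives into one pair whose coefficient is the count.
    static List<List<Object>> bulk(List<Object> primitives) {
        Map<Object, Long> counts = new LinkedHashMap<>();
        for (Object primitive : primitives) {
            counts.merge(primitive, 1L, Long::sum);
        }
        List<List<Object>> wire = new ArrayList<>();
        for (Map.Entry<Object, Long> entry : counts.entrySet()) {
            List<Object> pair = new ArrayList<>();
            pair.add(entry.getValue()); // coefficient (bulk)
            pair.add(entry.getKey());   // primitive (object)
            wire.add(pair);
        }
        return wire;
    }
}
```

Everything in the output is still a list, a long, or the original primitive, so a client serializer needs no knowledge of a traverser class.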

> sorry, if some of these questions/ideas are a bit half-cocked, but i read
> this really fast and won't be at my laptop for the rest of the day and
> wanted to get some thoughts out. i'm really really interested in seeing
> this aspect of TP done "right"….

No worries. Thanks for replying.

Some random ideas I was having.

	- TXML: Assume an XML database. out() would be the children tags. value() would be the tag attribute value. label() would be the tag type. In other words, there is a clean mapping from the instructions to XML.
	- TMatrix: Assume a database of n×m matrices. The math() instruction would be augmented to support matrix multiplication. A matrix is a table with rows and columns. We would need some nice instructions for that.
	- TJPEG: Assume a database of graphics. Does our instruction set have instructions that are useful for manipulating images? Probably need row/column type instructions like TMatrix.
	- TObject: Assume an object database. value() are primitive fields. out() is object fields. id() is unique object identifier. label() is object class. has() is a primitive field filter.
	- TTimeSeries: ? I don’t know anything about time series databases, but the question remains…do our instructions make sense for this data structure?
	- https://en.wikipedia.org/wiki/List_of_data_structures

The point being: I’m trying to think of oddball data structures and then see whether the TP4 instruction set is sufficiently general to encompass the operations used by those structures.
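
For the TXML case, the mapping from instructions to XML can be tried out directly against the JDK's DOM API. XmlInstructionSketch and its method names are hypothetical stand-ins for whatever CFunctions an XML provider would register; only the org.w3c.dom and javax.xml.parsers calls are real.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical mapping of three instructions onto the JDK's DOM API.
final class XmlInstructionSketch {

    // Parses a string into its root element, wrapping any parser failure.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // out(): the child tags of an element.
    static List<Element> out(Element element) {
        List<Element> children = new ArrayList<>();
        NodeList nodes = element.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node instanceof Element) children.add((Element) node);
        }
        return children;
    }

    // value(key): the value of a tag attribute (empty string if absent).
    static String value(Element element, String key) {
        return element.getAttribute(key);
    }

    // label(): the tag type.
    static String label(Element element) {
        return element.getTagName();
    }
}
```

With a document such as <people><person name="marko"/></people>, out() on the root gives the person elements, label() gives "person", and value(..., "name") gives "marko".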

The beautiful thing is that providers can create as many complex types as they want. These types are always contained within the TP4-VM and thus require no changes to the serialization format and respective objects in the deserializing language. Imagine, some XML database out there is using the TP4-VM, with the XPath language compiling to TP4 bytecode, and is processing their XML documents in real-time (Pipes/Rx), near-time (Flink/Akka), or batch-time (Spark/Hadoop). The TP4-VM has a life beyond graph! What a wonderful asset to the entire space of data processing!

…now think of the RDF community using the TP4-VM. SPARQL will be W3C-compliant and can execute in real-time, near-time, batch-time, etc. What a useful technology to adopt for your RDF triple-store. I could see Stardog using TP4 for their batch processing. I could see Jena or OpenRDF importing TP4 to provide different SPARQL execution engines to their triple-store providers.

The TP4 virtual machine may just turn out to be a technological masterpiece.

Marko.

http://rredux.com
>> CFunctions) for those complex objects.
>> 
>> Finally, there are a few problems to work out:
>>        - There is no way to yield a “v[1]” or “e[3][v[1]-knows->v[2]]”
>> representation. Is that bad? Perhaps not.
>>        - What is the nature of a TPath? It’s complex, but we want to
>> return it.
>>        - g.V().id() on an RDF graph can return a URI. Is a URI “simple”?
>> No, the set of simple types should never grow!…. thus, URI => String. Is
>> that wack?
>>        - Do we add g.R() and g.D() to Gremlin to type-support TRow and
>> TDocument objects. g.V() would be weird :( … Hmmmm?
>>                - However, there are only so many data structures……. or
>> are there? TMatrix, TXML, …. whoa.
>> 
>> Thanks for reading,
>> Marko.
>> 
>> http://rredux.com

Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

Posted by Stephen Mallette <sp...@gmail.com>.

I think that TinkerPop specific types are good and the extension model
sounds reasonable. It sounds like it will blend right into the GraphBinary
serialization model that we have though I wonder what happens when you
start blending together different graphs and serialization extensions and
what types of conflicts will occur as a result of that (e.g. two providers
using the same serialization "identifiers"). anyway, that's more of a
serialization problem than a type problem i suppose, so perhaps a different
discussion.

>         - There is no way to yield a “v[1]” or “e[3][v[1]-knows->v[2]]”
> representation. Is that bad? Perhaps not.

without that, users are forced to write queries the way we've been
professing that they write queries, which is good and will perhaps force
better habits. a "reference" really doesn't do much to help an application.
that said, could g.V(1) be auto-converted to a Map primitive in TP4
automatically without an explicit valueMap()? maybe that will breed bad
habits or allow things to happen that we don't like....just tossing a
thought out there.
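
To make the auto-conversion idea concrete, here is a minimal Python sketch of a VM-client barrier. Everything in it is an assumption for illustration: TVertex as an opaque class, to_map() as a stand-in for a VM-side flattening instruction, and leave_vm() with its auto_convert flag are hypothetical names, not part of any TP4 proposal.

```python
# Python stand-ins for TBoolean, TLong, TDouble, TString, TList, TMap.
PRIMITIVES = (bool, int, float, str, list, dict)


class TVertex:
    """Hypothetical opaque complex object: no public accessors."""

    def __init__(self, vid, label, props):
        self._vid = vid
        self._label = label
        self._props = dict(props)


def to_map(vertex):
    """Stand-in for a VM-side instruction that flattens a vertex to a map."""
    return {"id": vertex._vid, "label": vertex._label, **vertex._props}


def leave_vm(result, auto_convert=False):
    """Apply the VM-client barrier to the final result of a traversal.

    Shallow check only: the contents of lists and maps are not inspected.
    """
    if isinstance(result, PRIMITIVES):
        return result
    if auto_convert and isinstance(result, TVertex):
        return to_map(result)
    raise TypeError("complex objects cannot cross the VM-client barrier")
```

With auto_convert left off, returning a vertex raises, which matches the strict "g.V() is illegal" rule; switching it on gives the implicit map conversion floated above.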

>        - What is the nature of a TPath? Its complex, but we want to
> return it.

perhaps there is again a default primitive representation in List form??
I'd also wonder about how we treat subgraph() and tree()? could those be a
List<TPath> somehow??

>        - g.V().id() on an RDF graph can return a URI. Is a URI “simple”?
> No, the set of simple types should never grow!…. thus, URI => String. Is
> that wack?

isn't a URI a complex type? that list is expected to grow? maybe all
complex types have simple type representations?

sorry, if some of these questions/ideas are a bit half-cocked, but i read
this really fast and won't be at my laptop for the rest of the day and
wanted to get some thoughts out. i'm really really interested in seeing
this aspect of TP done "right"....

Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

Posted by Marko Rodriguez <ok...@gmail.com>.

Hi,

I just saw Stephen reply to a guy on gremlin-users@ about String manipulation operations in Gremlin3.

If this email thread’s direction proves correct, then TP4 will have a static set of primitive types. We must ensure that each primitive type has a corresponding set of VM instructions that can “fully” manipulate the primitive.

	TString will force us to provide string manipulation bytecode instructions.
	TLong/TInteger/TDouble/TFloat will force us to provide convenient math instructions.
	TMap and TList will force us to have corresponding get(), put(), size(), containsKey(), etc.-type instructions.
	TBoolean will force us to provide boolean operators — perhaps part of the math instruction subset.

This is great. This requirement gives us a hard and fast rule for creating primitive instructions.

———

However, and here is the kicker, think about complex types. Given that this is an unbounded set that is uncontrolled by TinkerPop, we have to think about what instructions (English-semantically) express operations on all potential data structures! This is where we really need a general theory of form. As it currently stands in TP3, these are our “complex type” instructions.

	is: generally useful for object filtering based on object feature.
	has: generally useful for filtering maps based on a key/value pair feature.
	property: generally useful for adding a key/value pair to maps.
	value: generally useful for getting a key’s associated value from a map.

Already I think this is all wrong. Why is has() just for maps? What about looking for objects in a list? It would be nice not to need a different instruction, as has() is English-valid for List.contains(). Why is property() just for maps? What about inserting into a list? In this case, property() is a bad word for list.add(). Why is value() only for maps? What about getting an object from a list?

Here are some thoughts:

	1. Long, Integer, Float, Double, and Boolean do not have any internal structure.
	2. A List is an ordered set of key/value pairs where the keys are integers. list(‘a’,’b’,’c’) == map(1,’a’,2,’b’,3,’c’)
	3. A Map is an ordered set of key/value pairs where the keys are arbitrary objects.
	4. A String is a List of characters. “abc” == list(“a”,”b”,”c”) == map(1,”a”,2,”b”,3,”c”)
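
As a minimal Python sketch of these four thoughts (the entries() helper is a hypothetical name, and Python's str/list/dict stand in for TString/TList/TMap), a single function can expose a string, a list, and a map as the same sequence of key/value pairs, while the scalar types have nothing to expose:

```python
def entries(obj):
    """Return the (key, value) pairs of a str, list, or dict.

    Lists and strings are keyed by 1-based position, mirroring
    list('a','b','c') == map(1,'a',2,'b',3,'c') from the text above.
    """
    if isinstance(obj, dict):
        return list(obj.items())
    if isinstance(obj, (list, str)):
        return list(enumerate(obj, start=1))
    # Numbers and booleans have no internal structure (thought 1).
    raise TypeError(f"no internal structure: {type(obj).__name__}")


assert entries("abc") == entries(["a", "b", "c"]) == entries({1: "a", 2: "b", 3: "c"})
```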

With some twiddling, I came up with this:

	is(filter): generally used for filtering an object based on a feature of that object (as a whole).
	has(filter): generally used for filtering an object based on a feature of the values within it. (valueFilter)
	has(filter, filter): generally used for filtering an object based on the features of the keys and values within it. (keyFilter, valueFilter)
	get(filter): generally used for getting values within an object based on key features. (keyFilter)
	get(filter, filter): generally used for getting values within an object based on key/value features. (keyFilter, valueFilter)
	add(flatmap): generally used for adding objects to the tail of an object. (values)
	add(filter, flatmap): generally used for adding values to an object at a particular key. (keyFilter, values)
	delete(): generally used for deleting an object (as a whole).
	delete(filter): generally used for deleting values in an object based on a key feature. (keyFilter)
	delete(filter, filter): generally used for deleting values in an object based on key/value features. (keyFilter, valueFilter)
	
	** TP4 Pop is a key-filter function.
		- Pop.key(predicate)
		- Pop.key(object) == Pop.key(eq(object))
		- Pop.index(n) == Pop.key(eq(n)) // the keys of a list are integers
		- Pop.last() == Pop.key(unfold().tail(1))
		- Pop.first() == Pop.key(unfold().limit(1))
		- Pop.all()  == Pop.key(identity())
	** If the Pop result contains more than one object, then the result is a collection; otherwise it’s a singleton.
	** Pop.last() is the default if no Pop is provided.
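
A rough Python sketch of Pop as a key filter, under stated assumptions: the pop_* helpers and get() are hypothetical names, keys of lists and strings are 1-based positions, and an empty selection is returned as an empty list (the text above does not say what should happen in that case).

```python
def entries(obj):
    """(key, value) pairs; lists and strings are keyed by 1-based position."""
    if isinstance(obj, dict):
        return list(obj.items())
    return list(enumerate(obj, start=1))


# Each Pop is a function from the full key list to the selected keys.
def pop_key(pred):
    return lambda keys: [k for k in keys if pred(k)]


def pop_index(n):
    return pop_key(lambda k: k == n)


def pop_first():
    return lambda keys: keys[:1]


def pop_last():
    return lambda keys: keys[-1:]


def pop_all():
    return lambda keys: list(keys)


def get(obj, pop=None):
    """Select values by key; Pop.last() is the default."""
    pairs = entries(obj)
    chosen = (pop or pop_last())([k for k, _ in pairs])
    values = [v for k, v in pairs if k in chosen]
    # Exactly one selected value is returned as a singleton, otherwise a collection.
    return values[0] if len(values) == 1 else values
```

For example, get(["a", "b", "c"]) yields "c", and get("abc", pop_index(2)) yields "b", since the same key filter applies to lists, strings, and maps alike.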
			

	TList
	is(list(1,2,3)): List.equals(List.of(1,2,3)) // equivalent to is(eq(list(1,2,3)))
	is(within(list(1,2,3))): List is a sublist of List.of(1,2,3)
	is(not(within(list(1,2,3)))): List is not a sublist of List.of(1,2,3)
	is(type(list)): The incoming object is a list
	is(type(list).count(local).is(gt(3))): The incoming object is a list whose size is > 3
	has(‘name’): List.contains(‘name’) // equivalent to has(eq(‘name’))
	has(lt(3)): List.contains() an object less than 3
	has(regex(“n*”)): List.contains() a string that matches regex.
	has(has(regex(“n*”))): List.contains() a list that contains a string that matches regex.
	has(type(string)): List.contains() a string object.
	get(3): List.get(3) // equivalent to get(index(3), identity())
	get(is(3)): The object 3 if it’s in the list // equivalent to get(last,is(eq(3)))
	get(all,gt(3)): A list containing all list objects greater than 3 // equivalent to get(all,is(gt(3)))
	get(first,type(string)): The first string of the list
	get(all,type(string)): A list of all the strings in the list
	get(first,has(regex(“n*”))): The first list in the list that contains a string that matches the regex
	get(regex(“n*”)): The last string object in the list that matches the predicate // equivalent to get(last,is(regex))
	get(index(within(1,2,4))): A list containing the original list’s objects at indices 1, 2, and 4
	get(index(gt(2))): List.sublist(2,size()-1)
	get(first,either(‘a’,1,true)): The first object in the list that is equal to a, 1, or true.
	add(‘marko’): List.add(“marko”) // equivalent to add(last,’marko’)
	add(3,’marko’): List.add(3,“marko”) // equivalent to add(index(3),’marko’)
	add(3,select(‘a’).out().value(‘name’)): Add the names of the adjacent vertices of ‘a’ to the list starting at index 3
	add(3,select(‘a’).out().value(‘name’).limit(1)): Add the first name of the adjacent vertices of ‘a’ to the list at index 3
	add(3,select(‘a’).out().value(‘name’).fold()): Add the names of the adjacent vertices of ‘a’ as a list to the list starting at index 3
	add(index(either(1,5)),’marko’): List.add(1,“marko”); List.add(5,”marko”)
	delete(): List.clear() // equivalent to delete(all, identity())
	delete(all, ‘marko’): List.removeAll(“marko”) // equivalent to delete(all, is(eq(‘marko’)))
	delete(index(3)): List.remove(3)
	delete(index(gt(3))): Remove all objects after the third index
	delete(first, “marko”): Remove the first “marko” in the list
	delete(“marko”): Remove the last marko in the list // equivalent to delete(last,is(eq(marko)))
	delete(all,regex(“*n”)): Remove all strings in the list that match the regex.
	delete(all,type(string).count(current).gt(3)): Remove all the strings in the list whose String.size() is > 3.
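
The TList entries above can be approximated in a few lines of Python. This is only a sketch under assumptions: has(), get(), delete(), lt(), gt(), and as_pred() are hypothetical helpers, the first/last/all selector is passed as a plain string, and only the value-filter forms are covered (not the index forms).

```python
def as_pred(f):
    """A bare object is shorthand for equality: has('name') == has(eq('name'))."""
    return f if callable(f) else (lambda v: v == f)


def lt(n):
    return lambda v: isinstance(v, (int, float)) and v < n


def gt(n):
    return lambda v: isinstance(v, (int, float)) and v > n


def has(lst, value_filter):
    """True if any element passes the value filter."""
    p = as_pred(value_filter)
    return any(p(v) for v in lst)


def get(lst, which, value_filter):
    """Return the first, last, or all elements passing the value filter."""
    p = as_pred(value_filter)
    matches = [v for v in lst if p(v)]
    if which == "all":
        return matches
    if not matches:
        return None
    return matches[0] if which == "first" else matches[-1]


def delete(lst, which, value_filter):
    """Return a copy without the first, last, or all matching elements."""
    p = as_pred(value_filter)
    hits = [i for i, v in enumerate(lst) if p(v)]
    if which == "first":
        hits = hits[:1]
    elif which == "last":
        hits = hits[-1:]
    drop = set(hits)
    return [v for i, v in enumerate(lst) if i not in drop]
```

Returning a new list from delete() rather than mutating in place is a choice made for the sketch; the text does not say which the VM would do.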

	TString // A string is just a list of characters so TList method semantics map over nearly one-to-one
	is(“marko”): String.equals(“marko”) // equivalent to is(eq(“marko”))
	is(regex(“n*”)): String.matches(“n*”)
	has(“abc”): String.contains(“abc”)
	get(3): String.charAt(3)
	get(all,’a’): A string containing all the ‘a’ characters
	get(first,’b’): A string that is either empty or is equal to ‘b’
	get(all, “abc”): A string containing all the “abc” sequences
	add(“a”): String.concat(“a”)
	delete(): String = “”
	delete(all, “abc”): String.removeAll(“abc”)
	delete(first, “abc”): Remove first abc sequence
	delete(index(3)): Remove the third character

	TMap // a map is just a list whose indices are arbitrary objects, not integers.
	is(map(a,1,b,2)): Map.equals(Map.of(a,1,b,2))
	is(within(map(a,1,b,2))): Map is a submap of Map.of(a,1,b,2)
	is(not(within(map(a,1,b,2)))): Map is not a submap of Map.of(a,1,b,2)
	is(type(map)): The incoming object is a map
	is(type(map).count(local).gt(3)): The incoming object is a map whose size is > 3
	has(“marko”): Map.values().contains(“marko”)
	has(regex(“n*”)): Map.values() has a string that matches regex.
	has(has(regex(“n*”))): Map.values() contains a list which contains a string that matches regex.
	has(type(string)): Map has a string value
	has(type(string),”marko”): Map has a string key whose value is “marko”
	has(“name”,”marko”): Map.get(“name”).equals(“marko”)
	get(’name’): Map.get(’name’) // equivalent to get(key(is(eq(name))),identity())
	get(all, is(regex(“n*”))): Map.submap() for the values that match n*.
	get(is(regex(“n*”))): Map.submap() for the keys that match n*.
	get(within(‘a’,’b’,’c')): A map containing the key/value pairs for keys a, b, and c
	get(first,type(string)): The first string value of the Map.
	get(all,type(string)): A Map of all the key/value pairs with string values
	get(key(type(string))): A map of all key/value pairs with string keys
	get(first,is(regex(“n*”))): The first key/value pair in the map that contains a string key that matches the regex
	get(first,either(‘a’,1,true)): The first key/value pair in the map whose key is equal to a, 1, or true.
	add(’name’,’marko’): Map.put(“name”,“marko”)
	add(’name’,select(‘a’).out().value(‘name’).limit(1)): Add the first name of the adjacent vertices of ‘a’ to the name-value of the map.
	add(’name’,select(‘a’).out().value(‘name’).fold()): Add the names of the adjacent vertices of ‘a’ as a list to the name-value of the map.
	add(either(1,5),’marko’): Map.put(1,“marko”); Map.put(5,”marko”)
	delete(): Map.clear() // equivalent to delete(all)
	delete(all, ’marko’): Removes all the key/value pairs whose value is marko
	delete(“name”): Map.remove(“name”)
	delete(regex(“*n”)): Remove all key/value pairs where the key matches the regex.
	delete(type(string).count(current).gt(3)): Remove all key/value pairs where the keys are strings and whose size is > 3.	
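
The same idea carried over to TMap, again as a hedged Python sketch: has(), get(), delete(), and as_pred() are hypothetical helpers, only the key-filter forms of get() and delete() are shown, and has() accepts either a value filter or a key filter plus a value filter, as in the entries above.

```python
def as_pred(f):
    """A bare object is shorthand for equality."""
    return f if callable(f) else (lambda x: x == f)


def has(m, *filters):
    """has(valueFilter) or has(keyFilter, valueFilter)."""
    if len(filters) == 1:
        key_ok, value_ok = (lambda k: True), as_pred(filters[0])
    else:
        key_ok, value_ok = as_pred(filters[0]), as_pred(filters[1])
    return any(key_ok(k) and value_ok(v) for k, v in m.items())


def get(m, key_filter):
    """A single value if exactly one key matches, otherwise a submap."""
    key_ok = as_pred(key_filter)
    sub = {k: v for k, v in m.items() if key_ok(k)}
    return next(iter(sub.values())) if len(sub) == 1 else sub


def delete(m, key_filter):
    """Return a copy of the map without the matching keys."""
    key_ok = as_pred(key_filter)
    return {k: v for k, v in m.items() if not key_ok(k)}
```

Here get(m, "name") behaves like Map.get("name"), while a predicate such as lambda k: k.startswith("n") yields the submap of matching keys.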
	

Now that the instructions above are generally applicable to collections, we can see whether complex types can leverage them:

	Property graph vertices: 
		- g.V(1).has(’marko’) // vertex.values().contains(“marko”)
		- g.V(1).has(‘name’,’marko’) // vertex.get(“name”).equals(“marko”)
		- g.V(1).get(‘name’) 
		- g.V(1).add(‘name’,’josh’) // put(‘name’,’josh’)
		- g.V(1).using(‘y’).is(within(V().using(‘x’))) // checks if vertex 1 in graph ‘y' is contained in graph ‘x’.
		- g.V(1).delete() // deletes the vertex
		- g.V(1).delete(‘name’) // deletes the vertex’s name property
		- g.V(1).delete(all, ‘marko’) // deletes the vertex properties with a marko value
		- g.V(1).delete(all, type(int).is(lt(3))) // deletes the vertex properties with values that are integers less than 3
		- g.V(1).delete(“age”, type(int).is(lt(3))) // deletes the vertex age properties with values that are integers less than 3
		- g.V(1).out() // vertex.get(“outE”).unfold().get(“inV”) // crazy thought
	
	RDF graph vertices:
		g.V(uri:1).outE(‘foaf:knows’).has(‘ng’,uri:2) // would determine if the triple is in the named graph uri:2.
		g.V(uri:1).out(‘foaf:name’).id() // would return marko^^xsd:string
		g.V(uri:1).delete() // DELETE uri:1 ?x ?y && ?x ?y uri:1
	
	Relational table rows:
		g.R(‘people’).has(‘name’,’marko’) // should filter out those rows that don’t have a name/marko entry.
		g.R(‘people’).get(‘name’) // would emit the value of the name column of each row.
		g.R(‘people’).is(within(map)) // would check if the row’s key/value pairs are in the map argument.
		g.R(‘people’).count(local) // would return the number of columns in the row.
		g.R(‘people’).toMap() // would turn the complex row object into the primitive TMap. // toMap() replaces valueMap().
		g.R(‘people’).join(g.R(‘addresses’)).by(‘ssn’) // join will be added to TP4 instruction set
		g.R(‘people’).has(‘age’,lt(10)).delete() // this deletes all rows from the people table that are < 10 years old
		g.R(‘people’).has(‘age’,lt(10)).toMap().delete() // this clears the map, leaving the database row unchanged.
		
	Document database:
		g.D(‘uuid:1’).has(‘name’,’marko’) // should filter out those documents that don’t have a key/value of name/marko.
		g.D(‘uuid:1’).get(‘name’) // will emit the value of the name key.
		g.D(‘uuid:1’).delete() // deletes the document from the database.
		g.D(‘uuid:1’).delete(‘name’) // delete the name key/value from the document (and subsequently, from the database)

For the most part, property graph vertices, relational database rows, and documentdb documents are just generalized maps…maps are just generalized lists… lists are just generalized strings…and strings are just generalized singletons.

Bye,
Marko.

http://rredux.com




> On Apr 15, 2019, at 1:07 PM, Marko Rodriguez <ok...@gmail.com> wrote:
> 
> Hello,
> 
>> I think this does satisfy your requirements, though I don't think I
>> understand all aspects of the approach, especially the need for
>> TinkerPop-specific types *for basic scalar values* like booleans, strings,
>> and numbers. Since we are committed to the native data types supported by
>> the JVM.
> 
> TinkerPop4 will have VM implementations on various language-platforms. For sure, Apache’s distribution will have a JVM and .NET implementation. The purpose of TinkerPop-specific types (as opposed to JVM, Mono, Python, etc. types) is that we know it’s the same type across all VMs.
> 
>> To my mind, your approach is headed in the direction of a
>> TinkerPop-specific notion of a *type*, in general, which captures the
>> structure and constraints of a logical data type
>> <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/42>,
>> and which can be used for query planning and optimization. These include
>> both scalar types as well as vertex, edge, and property types, as well as
>> more generic constructs such as optionals, lists, records.
> 
> Yes — I’d like to be able to use some type of formal data type specification. You have those skills. I don’t. My rudimentary (non-categorical) representation is just “common useful data structures” — map, list, bool, string, etc. 
> 
>> Can a TList really only contain primitives? A list of vertices or edges
>> would definitely be unusual, and typical PG implementations may not choose
>> to support them, but language-agnostic VM possibly should. They would
>> nicely capture RDF lists, in which list nodes typically do not have any
>> properties (edges) other than rdf:first and rdf:rest.
> 
> A TList only supports primitives. However, a TRDFList could be a complex type for dealing with RDF lists and would be contained within the TP4-VM. Adding complex types is okay; it doesn’t break anything.
> 
> As a related concept — realize that TDocument has a TDocumentArray not a TList. This is because TDocuments can have “lists” that contain primitives, documents, and lists.
> 
> 
>> For hypergraphs, an inV and outV which may produce more than one vertex, is
>> one way to go, but a labeled hypergraph should really have other projections
>> <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49>
>> in addition to inV, outV. That suggests a more generic step than inV or
>> outV, which takes as an argument the name of the projection as well as the
>> in/out element. E.g. project("in", v1), project("out", v1),
>> project("subject", v1).
> 
> Hm. Yea, I’m not too strong with hypergraph thinking.
> 
> 	g.V(1) // vertex
> 	g.V(1).outE(‘family’)  // hyperedges
> 	g.V(1).outE(‘family’).inV(‘father’) // ? perhaps inV/outV/bothV can take a String… label?
> 
> We should talk to the GRAKN.AI guys and see what they think.
> 	https://grakn.ai/
> 	https://dev.grakn.ai/docs/general/quickstart
> 	
>> For undirected graphs, we might as well just allow both in() and out()
>> rather than throwing exceptions. You can think of an undirected edge as a
>> pair of directed edges.
> 
> Okay.
> 
>> Agreed that provider-specific structures (types) are OK, and should not be
>> discouraged. Not only do different providers have their own data models,
>> but specific applications have their own schemas. A structure like a
>> metaproperty may be allowed in certain contexts and not others, and the
>> same goes for instances of conventional structures like edges of a certain
>> label.
> 
> Yes. I want to make sure we naturally/natively support property graphs, RDF graphs, hypergraphs, tables, documents, etc. Property graphs (as specified by Neo4j) are not “special” in TP4. Like Gremlin for languages, property graphs sit side-by-side w/ other data structures. If we do this right, we will be heroes!
> 
> 
>> For multi-properties, there is a distinction to be made between multiple
>> properties with the same key and element, and single collection-valued
>> properties. This is something the PG Working Group has been grappling with.
>> I think both should be allowed.
> 
> Agreed. This all gets back to a way to specify what the data structure is:
> 
> 	JanusGraph: a single-labeled property graph with multi/meta-properties.
> 	Neo4j: a multi-labeled property graph with singleton properties (w/ list values supported).
> 	RDF: an unlabeled 1-property graph (named graph property?) with vertex-based literals.
> 	… ?.
> 
> Like Graph.Features in TP3.
> 
>> IMO it's OK if URIs, in an RDF context, become Strings in a TP context. You
>> can think of URI as a constraint on String, which should be enforced at the
>> appropriate time, but does not require a vendor-specific class. Can you
>> concatenate two URIs? Sure... just concatenate the Strings, but also be
>> aware that the result is not a URI.
> 
> Cool.
> 
> Thanks for reading and providing good ideas.
> 
> Marko.
> 
> http://rredux.com
> 
> 
> 
>> On Mon, Apr 15, 2019 at 5:06 AM Marko Rodriguez <okrammarko@gmail.com>
>> wrote:
>> 
>>> Hello,
>>> 
>>> I have a consolidated approach to handling data structures in TP4. I would
>>> appreciate any feedback you may have.
>>> 
>>>        1. Every object processed by TinkerPop has a TinkerPop-specific
>>> type.
>>>                - TLong, TInteger, TString, TMap, TVertex, TEdge, TPath,
>>> TList, …
>>>                - BENEFIT #1: A universal type system will protect us from
>>> language platform peculiarities (e.g. Python long vs Java long).
>>>                - BENEFIT #2: The serialization format is constrained and
>>> consistent across all languages platforms. (no more coming across a
>>> MySpecialClass).
>>>        2. All primitive T-type data can be directly accessed via get().
>>>                - TBoolean.get() -> java.lang.Boolean | System.Boolean |
>>> ...
>>>                - TLong.get() -> java.lang.Long | System.Int64 | ...
>>>                - TString.get() -> java.lang.String | System.String | …
>>>                - TList.get() -> java.lang.ArrayList | .. // can only
>>> contain primitives
>>>                - TMap.get() -> java.lang.LinkedHashMap | .. // can only
>>> contain primitives
>>>                - ...
>>>        3. All complex T-types have no methods! (except those afforded by
>>> Object)
>>>                - TVertex: no accessible methods.
>>>                - TEdge: no accessible methods.
>>>                - TRow: no accessible methods.
>>>                - TDocument: no accessible methods.
>>>                - TDocumentArray: no accessible methods. // a document
>>> list field that can contain complex objects
>>>                - ...
>>> 
>>> REQUIREMENT #1: We need to be able to support multiple graphdbs in the
>>> same query.
>>>                - e.g., read from JanusGraph and write to Neo4j.
>>> REQUIREMENT #2: We need to make sure complex objects can not be queried
>>> client-side for properties/edges/etc. data.
>>>                - e.g., vertices are universally assumed to be “detached."
>>> REQUIREMENT #3: We no longer want to maintain a structure test suite.
>>> Operational semantics should be verified via Bytecode ->
>>> Processor/Structure.
>>>                - i.e., the only way to read/write vertices is via
>>> Bytecode as complex T-types don’t have APIs.
>>> REQUIREMENT #4: We should support other database data structures besides
>>> graph.
>>>                - e.g., reading from MySQL and writing to JanusGraph.
>>> 
>>> ———
>>> 
>>> Assume the following TraversalSource:
>>> 
>>> g.withStructure(JanusGraphStructure.class, config1).
>>>  withStructure(Neo4jStructure.class, conflg2)
>>> 
>>> Now, assume the following traversal fragment:
>>> 
>>>        outE(’knows’).has(’stars’,5).inV()
>>> 
>>> This would initially be written to Bytecode as:
>>> 
>>>        [[outE,knows],[has,stars,5],[inV]]
>>> 
>>> A decoration strategy realizes that there are two structures registered in
>>> the Bytecode source instructions and would rewrite the above as:
>>> 
>>>        [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]]]
>>> 
>>> A JanusGraph strategy would rewrite this as:
>>> 
>>> 
>>> [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]]]
>>> 
>>> A Neo4j strategy would rewrite this as:
>>> 
>>> 
>>> [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]
>>> 
>>> A finalization strategy would rewrite this as:
>>> 
>>> 
>>> [choose,[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]
>>> 
>>> Now, when a TVertex gets to this CFunction, it will check its type, if its
>>> a JanusVertex, it goes down the JanusGraph-specific instruction branch. If
>>> the type is Neo4jVertex, it goes down the Neo4j-specific instruction branch.
>>> 
>>>        REQUIREMENT #1 SOLVED
>>> 
>>> The last instruction of the root bytecode can not return a complex object.
>>> If so, an exception is thrown. g.V() is illegal. g.V().id() is legal.
>>> Complex objects do not exist outside the TP4-VM. Only primitives can leave
>>> the VM-client barrier. If you want vertex property data (e.g.), you have to
>>> access it and return it within the traversal — e.g., g.V().valueMap().
>>>        BENEFIT #1: Language variant implementations are simple. Just
>>> primitives.
>>>        BENEFIT #2: The serialization specification is simple. Just
>>> primitives. (also, note that Bytecode is just a TList of primitives! —
>>> though TBytecode will exist.)
>>>        BENEFIT #3: The concept of a “DetachedVertex” is universally
>>> assumed.
>>> 
>>>        REQUIREMENT #2 SOLVED
>>> 
>>> It is completely up to the structure provider to use structure-specific
>>> instructions for dealing with their particular TVertex. They will have to
>>> provide CFunction implementations for out, in, both, has, outE, inE, bothE,
>>> drop, property, value, id, label … (seems like a lot, but out/in/both could
>>> be one parameterized CFunction).
>>>        BENEFIT #1: No more structure/ API and structure/ test suite.
>>>        BENEFIT #2: The structure provider has full control of where the
>>> vertex data is stored (cached in memory or fetch from the db or a cut
>>> vertex or …). No assumptions are made by the TP4-VM.
>>>        BENEFIT #3: The structure provider can safely assume their
>>> vertices will not be accessed outside the TP4-VM (outside the processor).
>>> 
>>>        REQUIREMENT #3 SOLVED
>>> 
>>> We can support TRow for relational databases. A TRow’s data is accessible
>>> via the instructions has, hasKey, value, property, id, ... The location of
>>> the data in TRow is completely up to the structure provider and its
>>> strategy analysis (if only ’name’ is accessed, then SELECT ’name’ FROM...).
>>> We can easily support TDocument for document databases. A TDocument’s data
>>> is accessible via the instructions has, hasKey, value, property, id, … A
>>> value() could return yet another TDocument (or a TDocumentArray containing
>>> TDocuments).
>>> 
>>> Supporting a new complex type is simply a function of asking:
>>> 
>>>        “Does the TP4 VM instruction set have the requisite
>>> instruction-types (semantically) to manipulate this structure?"
>>> 
>>> We are no longer playing the language-specific object API game. We are
>>> playing the language-agnostic VM instruction game. The TP4-VM instruction
>>> set is the sole determiner of what complex objects can be processed. (i.e.
>>> what data structures can be processed without impedance mismatch).
>>> 
>>>        REQUIREMENT #4 SOLVED
>>> 
>>> ———
>>> 
>>> The TP4-VM (and, in turn, Gremlin) can naturally support:
>>> 
>>>        1. Property graphs: as currently supported in TP3.
>>>        2. RDF graphs: id() is a URI | Literal. g.V(1).value(‘foaf:name’)
>>> returns multi/meta-properties *or* g.V(1).out(‘foaf:name’) returns vertices
>>> whose id()s are xsd:string literals.
>>>        3. Hypergraphs: inV() can return more than one vertex.
>>>        4. Undirected graphs: in() and out() throw exceptions. Only both()
>>> works.
>>>        5. Meta-properties: value(‘name’) can return a TVertexProperty  (a
>>> special complex object that is structure provider specific — and that is
>>> okay!).
>>>        6. Multi-properties: value(‘name’) can return a TPropertyArray of
>>> TVertexProperty objects.
>>> 
>>> This means that the same instruction can behave differently for different
>>> structures. This is okay as there can be property graph, RDF, hypergraph,
>>> etc. test suites.
>>> 
>>> Since complex objects don’t leave the TP4-VM barrier, providers can create
>>> any complex objects they want — they just have to have corresponding
>>> strategies to create provider-unique bytecode instructions (and thus,
>>> CFunctions) for those complex objects.
>>> 
>>> Finally. there are a few of problems to work out:
>>>        - There is no way to yield a “v[1]” or “e[3][v[1]-knows->v[2]]”
>>> representation. Is that bad? Perhaps not.
>>>        - What is the nature of a TPath? Its complex, but we want to
>>> return it.
>>>        - g.V().id() on an RDF graph can return a URI. Is a URI “simple”?
>>> No, the set of simple types should never grow!…. thus, URI => String. Is
>>> that wack?
>>>        - Do we add g.R() and g.D() to Gremlin to type-support TRow and
>>> TDocument objects. g.V() would be weird :( … Hmmmm?
>>>                - However, there are only so many data structures……. or
>>> are there? TMatrix, TXML, …. whoa.
>>> 
>>> Thanks for reading,
>>> Marko.
>>> 
>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>>
>

Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

Posted by Marko Rodriguez <ok...@gmail.com>.

Hi,


> just getting real specific around TLong/TInteger for a minute - should
> TinkerPop's primitive just be TNumber? We do a lot of "stuff" to try to
> make numbers just be numbers across TinkerPop and each language has to do
> extra stuff to match the JVM, which is something we keep trying to avoid.

I don’t know what the best practices are for modern databases. I basically just copied the SQL type system, assuming that is what standard databases are able to represent.

	https://www.journaldev.com/16774/sql-data-types#sql-data-types

?,
Marko.

Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

Posted by Stephen Mallette <sp...@gmail.com>.

>
> > [...]
> > TinkerPop4 will have VM implementations on various language-platforms. For
> > sure, Apache’s distribution will have a JVM and .NET implementation. The
> > purpose of TinkerPop-specific types (and not JVM, Mono, Python, etc.) types
> > is that we know its the same type across all VMs.
>
> I agree it is important to define a standard set of scalar types. They can
> probably be counted on one hand, or at most two -- at Uber, we use bytes
> and byte arrays, character strings, floats (varying precision and
> signedness), and integers (varying precision and signedness) as basic
> types. My point is that you may not need special, TinkerPop-specific
> wrapper classes for the scalar types; it is enough to define a mapping.
> E.g. Integer is a suitable implementation, on the JVM (dunno what the .NET
> equivalent is), for a standard 32-bit signed integer type, but a TInteger
> wouldn't hurt.


just getting real specific around TLong/TInteger for a minute - should
TinkerPop's primitive just be TNumber? We do a lot of "stuff" to try to
make numbers just be numbers across TinkerPop and each language has to do
extra stuff to match the JVM, which is something we keep trying to avoid.
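
A minimal sketch of what a single numeric primitive could look like, assuming a hypothetical TNumber class that compares by numeric value rather than by JVM width. The class name comes from the question above; the methods and the BigDecimal backing are assumptions made for illustration, not a proposed TP4 API.

```java
import java.math.BigDecimal;

// Hypothetical TNumber: a single numeric primitive in place of TLong, TInteger, etc.
// Not part of any TinkerPop release; the name comes from the question above.
final class TNumber implements Comparable<TNumber> {
    private final BigDecimal value;

    private TNumber(BigDecimal value) {
        this.value = value;
    }

    // Integral input (covers int and long on the JVM).
    static TNumber of(long v) {
        return new TNumber(BigDecimal.valueOf(v));
    }

    // Floating-point input; BigDecimal.valueOf rejects NaN and infinities.
    static TNumber of(double v) {
        return new TNumber(BigDecimal.valueOf(v));
    }

    BigDecimal get() {
        return value;
    }

    // Compare by numeric value, so 1 and 1.0 are the same number.
    @Override
    public int compareTo(TNumber other) {
        return value.compareTo(other.value);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof TNumber && compareTo((TNumber) o) == 0;
    }

    // Hash on the double value so numerically equal instances share a hash.
    @Override
    public int hashCode() {
        return Double.hashCode(value.doubleValue());
    }
}
```

With this shape, a client in any language only has to agree on "number", and has(’stars’,5) would match a stored 5 regardless of whether the provider kept it as a 32-bit or 64-bit value. The cost is that width and precision are no longer visible in the type.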



Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

Posted by Joshua Shinavier <jo...@fortytwo.net>.

On Mon, Apr 15, 2019 at 12:07 PM Marko Rodriguez <ok...@gmail.com>
wrote:

> [...]
> TinkerPop4 will have VM implementations on various language-platforms. For
> sure, Apache’s distribution will have a JVM and .NET implementation. The
> purpose of TinkerPop-specific types (and not JVM, Mono, Python, etc.) types
> is that we know its the same type across all VMs.
>

I agree it is important to define a standard set of scalar types. They can
probably be counted on one hand, or at most two -- at Uber, we use bytes
and byte arrays, character strings, floats (varying precision and
signedness), and integers (varying precision and signedness) as basic
types. My point is that you may not need special, TinkerPop-specific
wrapper classes for the scalar types; it is enough to define a mapping.
E.g. Integer is a suitable implementation, on the JVM (dunno what the .NET
equivalent is), for a standard 32-bit signed integer type, but a TInteger
wouldn't hurt.
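
A minimal sketch of the "define a mapping" alternative, assuming a hypothetical ScalarType enum that ties each logical scalar type to an existing JVM class instead of wrapping it. The enum name, its constants, and the chosen set of types are illustrative assumptions.

```java
// Hypothetical logical scalar types, each mapped to an existing JVM class.
// No TinkerPop-specific wrapper classes are needed in this variant.
enum ScalarType {
    BOOLEAN(Boolean.class),
    INT32(Integer.class),   // standard 32-bit signed integer
    INT64(Long.class),      // standard 64-bit signed integer
    FLOAT64(Double.class),
    STRING(String.class),
    BYTES(byte[].class);

    private final Class<?> jvmClass;

    ScalarType(Class<?> jvmClass) {
        this.jvmClass = jvmClass;
    }

    Class<?> jvmClass() {
        return jvmClass;
    }

    // Resolve the logical type of a runtime value, or fail if it is not a scalar.
    static ScalarType of(Object value) {
        for (ScalarType t : values()) {
            if (t.jvmClass.isInstance(value)) {
                return t;
            }
        }
        throw new IllegalArgumentException("not a supported scalar: " + value);
    }
}
```

A .NET or Python VM would carry its own table for the same logical names, which is the whole contract in this approach.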

> > To my mind, your approach is headed in the direction of a
> > TinkerPop-specific notion of a *type*, in general, which captures the
> > structure and constraints of a logical data type
> > <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/42>,
> > and which can be used for query planning and optimization. These include
> > both scalar types as well as vertex, edge, and property types, as well as
> > more generic constructs such as optionals, lists, records.
>
> Yes — I’d like to be able to use some type of formal data type
> specification. You have those skills. I don’t. My rudimentary
> (non-categorical) representation is just “common useful data structures” —
> map, list, bool, string, etc.
>

I think we can formalize an appropriate general-purpose data model along
the lines I have motivated previously, with property graphs as a special
case. You are on the thread with Ryan, where we are trying to connect the
intuitive model with CQL. This would provide some nice guarantees of
tractability, and I think the relationship of the model with runtime types
ought to be straightforward; they are basically just tuples with reference
-- pairs, lists, etc.

> A TList only supports primitives. However, a TRDFList could be a complex
> type for dealing with RDF lists and would be contained with the TP4-VM.
> Adding complex types is okay — it doesn’t break anything.
>

Agree, and don't care too much about the names of the runtime types.

> Hm. Yea, I’m not too strong with hypergraph thinking.
>
>         g.V(1) // vertex
>         g.V(1).outE('family')  // hyperedges
>         g.V(1).outE('family').inV('father') // ? perhaps inV/outV/bothV can take a String… label?
>
> We should talk to the GRAKN.AI guys and see what they think.
>         https://grakn.ai/
>         https://dev.grakn.ai/docs/general/quickstart
>

Yes, I am a fan of GRAKN.AI's data model, and I think TinkerPop's structure
APIs ought to be expressive enough to interface with it. The "projections"
I have talked about here and elsewhere are "roles" in GRAKN, which relaxes
the property graph constraint from two projections/roles per relationship
to any number. GRAKN's relationships are hyper-edges in that sense, and
also in the colloquial sense of "edges to/from edges", i.e. allowing
projection between relationship types.

> Yes. I want to make sure we naturally/natively support property graphs, RDF
> graphs, hypergraphs, tables, documents, etc. Property graphs (as specified
> by Neo4j) are not “special” in TP4. Like Gremlin for languages, property
> graphs sit side-by-side w/ other data structures. If we do this right, we
> will be heroes!
>

+1

Josh

Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

Posted by Marko Rodriguez <ok...@gmail.com>.

Hello,

> I think this does satisfy your requirements, though I don't think I
> understand all aspects the approach, especially the need for
> TinkerPop-specific types *for basic scalar values* like booleans, strings,
> and numbers. Since we are committed to the native data types supported by
> the JVM.

TinkerPop4 will have VM implementations on various language platforms. For sure, Apache’s distribution will have a JVM and .NET implementation. The purpose of TinkerPop-specific types (and not JVM, Mono, Python, etc. types) is that we know it’s the same type across all VMs.

> To my mind, your approach is headed in the direction of a
> TinkerPop-specific notion of a *type*, in general, which captures the
> structure and constraints of a logical data type
> <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/42>,
> and which can be used for query planning and optimization. These include
> both scalar types as well as vertex, edge, and property types, as well as
> more generic constructs such as optionals, lists, records.

Yes — I’d like to be able to use some type of formal data type specification. You have those skills. I don’t. My rudimentary (non-categorical) representation is just “common useful data structures” — map, list, bool, string, etc. 

> Can a TList really only contain primitives? A list of vertices or edges
> would definitely be unusual, and typical PG implementations may not choose
> to support them, but language-agnostic VM possibly should. They would
> nicely capture RDF lists, in which list nodes typically do not have any
> properties (edges) other than rdf:first and rdf:rest.

A TList only supports primitives. However, a TRDFList could be a complex type for dealing with RDF lists and would be contained within the TP4-VM. Adding complex types is okay; it doesn’t break anything.

As a related concept — realize that TDocument has a TDocumentArray not a TList. This is because TDocuments can have “lists” that contain primitives, documents, and lists.
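
A minimal sketch of the "primitives only" rule for TList, assuming for illustration that a primitive is a Boolean, a Number, or a String. The add() and isPrimitive() methods and the empty TVertex placeholder are hypothetical; only the TList name and get() come from the proposal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Placeholder complex type: deliberately has no accessible methods.
final class TVertex { }

final class TList {
    private final List<Object> values = new ArrayList<>();

    // Assumption for this sketch: "primitive" means Boolean, Number, or String.
    static boolean isPrimitive(Object value) {
        return value instanceof Boolean
                || value instanceof Number
                || value instanceof String;
    }

    // Reject anything that is not a primitive, e.g. a TVertex.
    TList add(Object value) {
        if (!isPrimitive(value)) {
            throw new IllegalArgumentException("TList can only contain primitives: " + value);
        }
        values.add(value);
        return this;
    }

    // Expose the native list, read-only, as the proposal's get() would.
    List<Object> get() {
        return Collections.unmodifiableList(values);
    }
}
```

A TDocumentArray or TRDFList would skip this check, which is why they are modeled as separate complex types that stay inside the VM.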


> For hypergraphs, an inV and outV which may produce more than one vertex, is
> one way to go, but a labeled hypergraph should really have other projections
> <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49>
> in addition to inV, outV. That suggests a more generic step than inV or
> outV, which takes as an argument the name of the projection as well as the
> in/out element. E.g. project("in", v1), project("out", v1),
> project("subject", v1).

Hm. Yea, I’m not too strong with hypergraph thinking.

	g.V(1) // vertex
	g.V(1).outE('family')  // hyperedges
	g.V(1).outE('family').inV('father') // ? perhaps inV/outV/bothV can take a String… label?

We should talk to the GRAKN.AI guys and see what they think.
	https://grakn.ai/
	https://dev.grakn.ai/docs/general/quickstart
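
A minimal sketch of a labeled hyperedge with named projections, assuming a hypothetical HyperEdge class in which each role (e.g. 'father') maps to any number of vertex ids. The class, the with() builder, and the use of String ids are illustrative assumptions; project() mirrors the generic step suggested in the quoted text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hyperedge: a label plus any number of named roles (projections).
final class HyperEdge {
    final String label;
    private final Map<String, List<String>> roles = new LinkedHashMap<>();

    HyperEdge(String label) {
        this.label = label;
    }

    // Attach a vertex id to a role; a role may hold more than one vertex.
    HyperEdge with(String role, String vertexId) {
        roles.computeIfAbsent(role, r -> new ArrayList<>()).add(vertexId);
        return this;
    }

    // Generic projection: "in" and "out" are just two role names among many.
    List<String> project(String role) {
        return roles.getOrDefault(role, Collections.emptyList());
    }
}
```

Under this reading, inV('father') would be shorthand for project("father"), and an ordinary property-graph edge is the special case with exactly the two roles "in" and "out".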
	
> For undirected graphs, we might as well just allow both in() and out()
> rather than throwing exceptions. You can think of an undirected edge as a
> pair of directed edges.

Okay.
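
A minimal sketch of treating an undirected edge as a pair of directed edges, assuming a hypothetical in-memory UndirectedGraph keyed by String vertex ids. Nothing here is TinkerPop API; it only shows why in(), out(), and both() can all be allowed and return the same neighbors.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical undirected structure: each edge is stored as two directed links.
final class UndirectedGraph {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    // One undirected edge a-b becomes a->b and b->a.
    void addEdge(String a, String b) {
        link(a, b);
        link(b, a);
    }

    private void link(String from, String to) {
        adjacency.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
    }

    Set<String> out(String v) {
        return adjacency.getOrDefault(v, Collections.emptySet());
    }

    // The links are symmetric, so incoming neighbors equal outgoing neighbors.
    Set<String> in(String v) {
        return out(v);
    }

    Set<String> both(String v) {
        return out(v);
    }
}
```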

> Agreed that provider-specific structures (types) are OK, and should not be
> discouraged. Not only do different providers have their own data models,
> but specific applications have their own schemas. A structure like a
> metaproperty may be allowed in certain contexts and not others, and the
> same goes for instances of conventional structures like edges of a certain
> label.

Yes. I want to make sure we naturally/natively support property graphs, RDF graphs, hypergraphs, tables, documents, etc. Property graphs (as specified by Neo4j) are not “special” in TP4. Like Gremlin for languages, property graphs sit side-by-side w/ other data structures. If we do this right, we will be heroes!


> For multi-properties, there is a distinction to be made between multiple
> properties with the same key and element, and single collection-valued
> properties. This is something the PG Working Group has been grappling with.
> I think both should be allowed.

Agreed. This all gets back to a way to specify what the data structure is:

	JanusGraph: a single-labeled property graph with multi/meta-properties.
	Neo4j: a multi-labeled property graph with singleton properties (w/ list values supported).
	RDF: an unlabeled 1-property graph (named graph property?) with vertex-based literals.
	… ?.

Like Graph.Features in TP3.
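
A minimal sketch of such a structure description, assuming a hypothetical Feature enum and StructureDescriptor class. The two descriptors encode only what the list above states; a feature missing from a set means "not stated here", not "unsupported by the product".

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical feature flags for describing a data structure.
enum Feature { MULTI_LABELS, MULTI_PROPERTIES, META_PROPERTIES, LIST_VALUES }

final class StructureDescriptor {
    final String name;
    final Set<Feature> features;

    StructureDescriptor(String name, Set<Feature> features) {
        this.name = name;
        this.features = features;
    }

    boolean supports(Feature feature) {
        return features.contains(feature);
    }

    // Single-labeled property graph with multi/meta-properties.
    static final StructureDescriptor JANUSGRAPH = new StructureDescriptor(
            "JanusGraph", EnumSet.of(Feature.MULTI_PROPERTIES, Feature.META_PROPERTIES));

    // Multi-labeled property graph with singleton properties and list values.
    static final StructureDescriptor NEO4J = new StructureDescriptor(
            "Neo4j", EnumSet.of(Feature.MULTI_LABELS, Feature.LIST_VALUES));
}
```

A strategy could consult a descriptor like this at compile time, for example to decide whether value('name') may yield more than one result for a given structure.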

> IMO it's OK if URIs, in an RDF context, become Strings in a TP context. You
> can think of URI as a constraint on String, which should be enforced at the
> appropriate time, but does not require a vendor-specific class. Can you
> concatenate two URIs? Sure... just concatenate the Strings, but also be
> aware that the result is not a URI.

Cool.
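
A minimal sketch of "URI as a constraint on String", assuming a hypothetical UriConstraint helper that an RDF provider would call at the point where the constraint matters. The value itself stays a plain String; java.net.URI is used only to check it.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: the id stays a String, the URI-ness is checked on demand.
final class UriConstraint {
    private UriConstraint() { }

    // True if the string parses as a URI and has a scheme (e.g. "http:").
    static boolean isAbsoluteUri(String s) {
        try {
            return new URI(s).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

Any String operation, concatenation included, yields another String that carries no guarantee, so the provider would re-run the check before treating the result as a URI again.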

Thanks for reading and providing good ideas.

Marko.

http://rredux.com



> On Mon, Apr 15, 2019 at 5:06 AM Marko Rodriguez <okrammarko@gmail.com <ma...@gmail.com>>
> wrote:
> 
>> Hello,
>> 
>> I have a consolidated approach to handling data structures in TP4. I would
>> appreciate any feedback you many have.
>> 
>>        1. Every object processed by TinkerPop has a TinkerPop-specific
>> type.
>>                - TLong, TInteger, TString, TMap, TVertex, TEdge, TPath,
>> TList, …
>>                - BENEFIT #1: A universal type system will protect us from
>> language platform peculiarities (e.g. Python long vs Java long).
>>                - BENEFIT #2: The serialization format is constrained and
>> consistent across all languages platforms. (no more coming across a
>> MySpecialClass).
>>        2. All primitive T-type data can be directly access via get().
>>                - TBoolean.get() -> java.lang.Boolean | System.Boolean |
>> ...
>>                - TLong.get() -> java.lang.Long | System.Int64 | ...
>>                - TString.get() -> java.lang.String | System.String | …
>>                - TList.get() -> java.lang.ArrayList | .. // can only
>> contain primitives
>>                - TMap.get() -> java.lang.LinkedHashMap | .. // can only
>> contain primitives
>>                - ...
>>        3. All complex T-types have no methods! (except those afforded by
>> Object)
>>                - TVertex: no accessible methods.
>>                - TEdge: no accessible methods.
>>                - TRow: no accessible methods.
>>                - TDocument: no accessible methods.
>>                - TDocumentArray: no accessible methods. // a document
>> list field that can contain complex objects
>>                - ...
>> 
>> REQUIREMENT #1: We need to be able to support multiple graphdbs in the
>> same query.
>>                - e.g., read from JanusGraph and write to Neo4j.
>> REQUIREMENT #2: We need to make sure complex objects can not be queried
>> client-side for properties/edges/etc. data.
>>                - e.g., vertices are universally assumed to be “detached."
>> REQUIREMENT #3: We no longer want to maintain a structure test suite.
>> Operational semantics should be verified via Bytecode ->
>> Processor/Structure.
>>                - i.e., the only way to read/write vertices is via
>> Bytecode as complex T-types don’t have APIs.
>> REQUIREMENT #4: We should support other database data structures besides
>> graph.
>>                - e.g., reading from MySQL and writing to JanusGraph.
>> 
>> ———
>> 
>> Assume the following TraversalSource:
>> 
>> g.withStructure(JanusGraphStructure.class, config1).
>>  withStructure(Neo4jStructure.class, conflg2)
>> 
>> Now, assume the following traversal fragment:
>> 
>>        outE(’knows’).has(’stars’,5).inV()
>> 
>> This would initially be written to Bytecode as:
>> 
>>        [[outE,knows],[has,stars,5],[inV]]
>> 
>> A decoration strategy realizes that there are two structures registered in
>> the Bytecode source instructions and would rewrite the above as:
>> 
>>        [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]]]
>> 
>> A JanusGraph strategy would rewrite this as:
>> 
>> 
>> [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]]]
>> 
>> A Neo4j strategy would rewrite this as:
>> 
>> 
>> [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]
>> 
>> A finalization strategy would rewrite this as:
>> 
>> 
>> [choose,[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]
>> 
>> Now, when a TVertex gets to this CFunction, it will check its type, if its
>> a JanusVertex, it goes down the JanusGraph-specific instruction branch. If
>> the type is Neo4jVertex, it goes down the Neo4j-specific instruction branch.
>> 
>>        REQUIREMENT #1 SOLVED
>> 
>> The last instruction of the root bytecode can not return a complex object.
>> If so, an exception is thrown. g.V() is illegal. g.V().id() is legal.
>> Complex objects do not exist outside the TP4-VM. Only primitives can leave
>> the VM-client barrier. If you want vertex property data (e.g.), you have to
>> access it and return it within the traversal — e.g., g.V().valueMap().
>>        BENEFIT #1: Language variant implementations are simple. Just
>> primitives.
>>        BENEFIT #2: The serialization specification is simple. Just
>> primitives. (also, note that Bytecode is just a TList of primitives! —
>> though TBytecode will exist.)
>>        BENEFIT #3: The concept of a “DetachedVertex” is universally
>> assumed.
>> 
>>        REQUIREMENT #2 SOLVED
>> 
>> It is completely up to the structure provider to use structure-specific
>> instructions for dealing with their particular TVertex. They will have to
>> provide CFunction implementations for out, in, both, has, outE, inE, bothE,
>> drop, property, value, id, label … (seems like a lot, but out/in/both could
>> be one parameterized CFunction).
>>        BENEFIT #1: No more structure/ API and structure/ test suite.
>>        BENEFIT #2: The structure provider has full control of where the
>> vertex data is stored (cached in memory or fetch from the db or a cut
>> vertex or …). No assumptions are made by the TP4-VM.
>>        BENEFIT #3: The structure provider can safely assume their
>> vertices will not be accessed outside the TP4-VM (outside the processor).
>> 
>>        REQUIREMENT #3 SOLVED
>> 
>> We can support TRow for relational databases. A TRow’s data is accessible
>> via the instructions has, hasKey, value, property, id, ... The location of
>> the data in TRow is completely up to the structure provider and its
>> strategy analysis (if only ’name’ is accessed, then SELECT ’name’ FROM...).
>> We can easily support TDocument for document databases. A TDocument’s data
>> is accessible via the instructions has, hasKey, value, property, id, … A
>> value() could return yet another TDocument (or a TDocumentArray containing
>> TDocuments).
>> 
>> Supporting a new complex type is simply a function of asking:
>> 
>>        “Does the TP4 VM instruction set have the requisite
>> instruction-types (semantically) to manipulate this structure?"
>> 
>> We are no longer playing the language-specific object API game. We are
>> playing the language-agnostic VM instruction game. The TP4-VM instruction
>> set is the sole determiner of what complex objects can be processed. (i.e.
>> what data structures can be processed without impedance mismatch).
>> 
>>        REQUIREMENT #4 SOLVED
>> 
>> ———
>> 
>> The TP4-VM (and, in turn, Gremlin) can naturally support:
>> 
>>        1. Property graphs: as currently supported in TP3.
>>        2. RDF graphs: id() is a URI | Literal. g.V(1).value(‘foaf:name’)
>> returns multi/meta-properties *or* g.V(1).out(‘foaf:name’) returns vertices
>> whose id()s are xsd:string literals.
>>        3. Hypergraphs: inV() can return more than one vertex.
>>        4. Undirected graphs: in() and out() throw exceptions. Only both()
>> works.
>>        5. Meta-properties: value(‘name’) can return a TVertexProperty  (a
>> special complex object that is structure provider specific — and that is
>> okay!).
>>        6. Multi-properties: value(‘name’) can return a TPropertyArray of
>> TVertexProperty objects.
>> 
>> This means that the same instruction can behave differently for different
>> structures. This is okay as there can be property graph, RDF, hypergraph,
>> etc. test suites.
>> 
>> Since complex objects don’t leave the TP4-VM barrier, providers can create
>> any complex objects they want — they just have to have corresponding
>> strategies to create provider-unique bytecode instructions (and thus,
>> CFunctions) for those complex objects.
>> 
>> Finally. there are a few of problems to work out:
>>        - There is no way to yield a “v[1]” or “e[3][v[1]-knows->v[2]]”
>> representation. Is that bad? Perhaps not.
>>        - What is the nature of a TPath? Its complex, but we want to
>> return it.
>>        - g.V().id() on an RDF graph can return a URI. Is a URI “simple”?
>> No, the set of simple types should never grow!…. thus, URI => String. Is
>> that wack?
>>        - Do we add g.R() and g.D() to Gremlin to type-support TRow and
>> TDocument objects. g.V() would be weird :( … Hmmmm?
>>                - However, there are only so many data structures……. or
>> are there? TMatrix, TXML, …. whoa.
>> 
>> Thanks for reading,
>> Marko.
>> 
>> http://rredux.com

Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

Posted by Joshua Shinavier <jo...@fortytwo.net>.

Hi Marko,

I think this does satisfy your requirements, though I don't think I
understand all aspects of the approach, especially the need for
TinkerPop-specific types *for basic scalar values* like booleans, strings,
and numbers. Since we are committed to the native data types supported by
the JVM, I think it is OK to use a subset of them as the basis for a
TinkerPop type system. E.g. while a formal type system might define "long"
as a signed 64-bit integer, the Long class is an appropriate
implementation; while it doesn't hurt to wrap Long in a TinkerPop-specific
TLong class, I am not sure it is necessary. Maybe there is more to your
get(), or other methods you would like to attach to these types, than I see.

To my mind, your approach is headed in the direction of a
TinkerPop-specific notion of a *type*, in general, which captures the
structure and constraints of a logical data type
<https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/42>,
and which can be used for query planning and optimization. These include
both scalar types as well as vertex, edge, and property types, as well as
more generic constructs such as optionals, lists, records.

Miscellaneous thoughts:

Can a TList really only contain primitives? A list of vertices or edges
would definitely be unusual, and typical PG implementations may not choose
to support them, but a language-agnostic VM possibly should. They would
nicely capture RDF lists, in which list nodes typically do not have any
properties (edges) other than rdf:first and rdf:rest.

For hypergraphs, an inV and outV that may produce more than one vertex is
one way to go, but a labeled hypergraph should really have other projections
<https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49>
in addition to inV, outV. That suggests a more generic step than inV or
outV, which takes as an argument the name of the projection as well as the
in/out element. E.g. project("in", v1), project("out", v1),
project("subject", v1).
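
To make the projection idea concrete, here is a minimal Java sketch. Everything in it is hypothetical and not TP4 API: the class name, string vertex ids, and the reading of project(name, element) as "the vertices that fill the named role of this hyperedge" are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not TP4 API): a hyperedge whose member vertices are
// grouped under named projections such as "in", "out", or "subject".
final class HyperEdgeSketch {
    private final Map<String, List<String>> members = new LinkedHashMap<>();

    // Register the vertex ids that fill a given projection (role).
    HyperEdgeSketch with(String role, String... vertexIds) {
        members.put(role, List.of(vertexIds));
        return this;
    }

    // Generic step standing in for inV()/outV(): returns every vertex that
    // fills the named projection, or an empty list if the role is unknown.
    static List<String> project(String role, HyperEdgeSketch edge) {
        return edge.members.getOrDefault(role, List.of());
    }

    public static void main(String[] args) {
        HyperEdgeSketch e = new HyperEdgeSketch()
                .with("out", "v1")
                .with("in", "v2", "v3")
                .with("subject", "v4");
        System.out.println(project("in", e)); // prints [v2, v3]
    }
}
```

Under this reading, inV and outV are just project("in", ...) and project("out", ...), and a provider can add further roles without new steps.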

For undirected graphs, we might as well just allow both in() and out()
rather than throwing exceptions. You can think of an undirected edge as a
pair of directed edges.
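
A small Java sketch of that idea, using a hypothetical in-memory adjacency structure (the class and method names are illustrative assumptions, not anything from TinkerPop):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an undirected edge stored as a pair of directed edges,
// so that in() and out() both work instead of throwing.
final class UndirectedGraphSketch {
    private final Map<String, List<String>> outAdj = new HashMap<>();
    private final Map<String, List<String>> inAdj = new HashMap<>();

    private void addDirected(String from, String to) {
        outAdj.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        inAdj.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    // One undirected edge becomes two directed edges, one per direction.
    void addUndirectedEdge(String a, String b) {
        addDirected(a, b);
        addDirected(b, a);
    }

    List<String> out(String v) { return outAdj.getOrDefault(v, List.of()); }

    List<String> in(String v) { return inAdj.getOrDefault(v, List.of()); }

    // in() and out() see the same neighbors here, so both() returns one of
    // them rather than their concatenation, which would list each neighbor twice.
    List<String> both(String v) { return out(v); }

    public static void main(String[] args) {
        UndirectedGraphSketch g = new UndirectedGraphSketch();
        g.addUndirectedEdge("a", "b");
        g.addUndirectedEdge("a", "c");
        System.out.println(g.out("a")); // prints [b, c]
        System.out.println(g.in("a"));  // prints [b, c]
    }
}
```

The one design question this raises is what both() should return; the sketch chooses to avoid reporting each neighbor twice.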

Agreed that provider-specific structures (types) are OK, and should not be
discouraged. Not only do different providers have their own data models,
but specific applications have their own schemas. A structure like a
metaproperty may be allowed in certain contexts and not others, and the
same goes for instances of conventional structures like edges of a certain
label.

For multi-properties, there is a distinction to be made between multiple
properties with the same key and element, and single collection-valued
properties. This is something the PG Working Group has been grappling with.
I think both should be allowed.
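
The distinction can be sketched in Java as follows. The element model below is a hypothetical stand-in (a flat list of key/value properties), not a proposal for how TP4 should store them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an element that allows several properties per key,
// contrasted with a single property whose value is a collection.
final class PropertyShapesSketch {
    static final class Property {
        final String key;
        final Object value;
        Property(String key, Object value) { this.key = key; this.value = value; }
    }

    private final List<Property> properties = new ArrayList<>();

    void addProperty(String key, Object value) {
        properties.add(new Property(key, value));
    }

    // Returns one entry per property with this key, in insertion order.
    List<Object> values(String key) {
        List<Object> result = new ArrayList<>();
        for (Property p : properties) {
            if (p.key.equals(key)) result.add(p.value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Multi-property: two separate "name" properties on the same element.
        PropertyShapesSketch multi = new PropertyShapesSketch();
        multi.addProperty("name", "marko");
        multi.addProperty("name", "okram");
        System.out.println(multi.values("name").size()); // prints 2

        // Collection-valued: one "names" property whose value is a list.
        PropertyShapesSketch single = new PropertyShapesSketch();
        single.addProperty("names", List.of("marko", "okram"));
        System.out.println(single.values("names").size()); // prints 1
    }
}
```

In the first case each property is a separate object that could carry its own meta-properties; in the second there is only one property, so any metadata applies to the whole collection.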

IMO it's OK if URIs, in an RDF context, become Strings in a TP context. You
can think of URI as a constraint on String, which should be enforced at the
appropriate time, but does not require a vendor-specific class. Can you
concatenate two URIs? Sure... just concatenate the Strings, but also be
aware that the result is not a URI.
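
A sketch of "URI as a constraint on String" in Java, using the standard java.net.URI parser as the check. Treating "parses and is absolute" as the constraint is an assumption made for this example; an RDF provider might well apply a stricter IRI rule.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: the value stays a plain String, and "is a URI" is a
// predicate applied when it matters rather than a separate class.
final class UriConstraintSketch {
    // True if the string parses as a URI and has a scheme (is absolute).
    static boolean isAbsoluteUri(String s) {
        try {
            return new URI(s).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String id = "http://xmlns.com/foaf/0.1/name";
        System.out.println(isAbsoluteUri(id));             // prints true
        System.out.println(isAbsoluteUri("not a uri"));    // prints false
        // After any String operation (such as concatenation) the constraint
        // has to be re-checked rather than assumed to still hold.
        String derived = id + " (label)";
        System.out.println(isAbsoluteUri(derived));        // prints false
    }
}
```

The check is run at the appropriate time (for example, when writing back to an RDF structure), and the traversal itself only ever carries Strings.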

Josh



On Mon, Apr 15, 2019 at 5:06 AM Marko Rodriguez <ok...@gmail.com>
wrote:

> Hello,
>
> I have a consolidated approach to handling data structures in TP4. I would
> appreciate any feedback you may have.
>
>         1. Every object processed by TinkerPop has a TinkerPop-specific
> type.
>                 - TLong, TInteger, TString, TMap, TVertex, TEdge, TPath,
> TList, …
>                 - BENEFIT #1: A universal type system will protect us from
> language platform peculiarities (e.g. Python long vs Java long).
>                 - BENEFIT #2: The serialization format is constrained and
> consistent across all language platforms. (no more coming across a
> MySpecialClass).
>         2. All primitive T-type data can be directly accessed via get().
>                 - TBoolean.get() -> java.lang.Boolean | System.Boolean |
> ...
>                 - TLong.get() -> java.lang.Long | System.Int64 | ...
>                 - TString.get() -> java.lang.String | System.String | …
>                 - TList.get() -> java.util.ArrayList | .. // can only
> contain primitives
>                 - TMap.get() -> java.util.LinkedHashMap | .. // can only
> contain primitives
>                 - ...
>         3. All complex T-types have no methods! (except those afforded by
> Object)
>                 - TVertex: no accessible methods.
>                 - TEdge: no accessible methods.
>                 - TRow: no accessible methods.
>                 - TDocument: no accessible methods.
>                 - TDocumentArray: no accessible methods. // a document
> list field that can contain complex objects
>                 - ...
>
> REQUIREMENT #1: We need to be able to support multiple graphdbs in the
> same query.
>                 - e.g., read from JanusGraph and write to Neo4j.
> REQUIREMENT #2: We need to make sure complex objects can not be queried
> client-side for properties/edges/etc. data.
>                 - e.g., vertices are universally assumed to be “detached.”
> REQUIREMENT #3: We no longer want to maintain a structure test suite.
> Operational semantics should be verified via Bytecode ->
> Processor/Structure.
>                 - i.e., the only way to read/write vertices is via
> Bytecode as complex T-types don’t have APIs.
> REQUIREMENT #4: We should support other database data structures besides
> graph.
>                 - e.g., reading from MySQL and writing to JanusGraph.
>
> ———
>
> Assume the following TraversalSource:
>
> g.withStructure(JanusGraphStructure.class, config1).
>   withStructure(Neo4jStructure.class, config2)
>
> Now, assume the following traversal fragment:
>
>         outE(’knows’).has(’stars’,5).inV()
>
>  This would initially be written to Bytecode as:
>
>         [[outE,knows],[has,stars,5],[inV]]
>
> A decoration strategy realizes that there are two structures registered in
> the Bytecode source instructions and would rewrite the above as:
>
>         [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]]]
>
> A JanusGraph strategy would rewrite this as:
>
>
> [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]]]
>
> A Neo4j strategy would rewrite this as:
>
>
> [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]
>
> A finalization strategy would rewrite this as:
>
>
> [choose,[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]
>
> Now, when a TVertex gets to this CFunction, it will check its type: if it’s
> a JanusVertex, it goes down the JanusGraph-specific instruction branch. If
> the type is Neo4jVertex, it goes down the Neo4j-specific instruction branch.
>
>         REQUIREMENT #1 SOLVED
>
> The last instruction of the root bytecode can not return a complex object.
> If so, an exception is thrown. g.V() is illegal. g.V().id() is legal.
> Complex objects do not exist outside the TP4-VM. Only primitives can leave
> the VM-client barrier. If you want vertex property data (e.g.), you have to
> access it and return it within the traversal — e.g., g.V().valueMap().
>         BENEFIT #1: Language variant implementations are simple. Just
> primitives.
>         BENEFIT #2: The serialization specification is simple. Just
> primitives. (also, note that Bytecode is just a TList of primitives! —
> though TBytecode will exist.)
>         BENEFIT #3: The concept of a “DetachedVertex” is universally
> assumed.
>
>         REQUIREMENT #2 SOLVED
>
> It is completely up to the structure provider to use structure-specific
> instructions for dealing with their particular TVertex. They will have to
> provide CFunction implementations for out, in, both, has, outE, inE, bothE,
> drop, property, value, id, label … (seems like a lot, but out/in/both could
> be one parameterized CFunction).
>         BENEFIT #1: No more structure/ API and structure/ test suite.
>         BENEFIT #2: The structure provider has full control of where the
> vertex data is stored (cached in memory or fetched from the db or a cut
> vertex or …). No assumptions are made by the TP4-VM.
>         BENEFIT #3: The structure provider can safely assume their
> vertices will not be accessed outside the TP4-VM (outside the processor).
>
>         REQUIREMENT #3 SOLVED
>
> We can support TRow for relational databases. A TRow’s data is accessible
> via the instructions has, hasKey, value, property, id, ... The location of
> the data in TRow is completely up to the structure provider and its
> strategy analysis (if only ’name’ is accessed, then SELECT ’name’ FROM...).
> We can easily support TDocument for document databases. A TDocument’s data
> is accessible via the instructions has, hasKey, value, property, id, … A
> value() could return yet another TDocument (or a TDocumentArray containing
> TDocuments).
>
> Supporting a new complex type is simply a function of asking:
>
>         “Does the TP4 VM instruction set have the requisite
> instruction-types (semantically) to manipulate this structure?"
>
> We are no longer playing the language-specific object API game. We are
> playing the language-agnostic VM instruction game. The TP4-VM instruction
> set is the sole determiner of what complex objects can be processed. (i.e.
> what data structures can be processed without impedance mismatch).
>
>         REQUIREMENT #4 SOLVED
>
> ———
>
> The TP4-VM (and, in turn, Gremlin) can naturally support:
>
>         1. Property graphs: as currently supported in TP3.
>         2. RDF graphs: id() is a URI | Literal. g.V(1).value(‘foaf:name’)
> returns multi/meta-properties *or* g.V(1).out(‘foaf:name’) returns vertices
> whose id()s are xsd:string literals.
>         3. Hypergraphs: inV() can return more than one vertex.
>         4. Undirected graphs: in() and out() throw exceptions. Only both()
> works.
>         5. Meta-properties: value(‘name’) can return a TVertexProperty  (a
> special complex object that is structure provider specific — and that is
> okay!).
>         6. Multi-properties: value(‘name’) can return a TPropertyArray of
> TVertexProperty objects.
>
> This means that the same instruction can behave differently for different
> structures. This is okay as there can be property graph, RDF, hypergraph,
> etc. test suites.
>
> Since complex objects don’t leave the TP4-VM barrier, providers can create
> any complex objects they want — they just have to have corresponding
> strategies to create provider-unique bytecode instructions (and thus,
> CFunctions) for those complex objects.
>
> Finally, there are a few problems to work out:
>         - There is no way to yield a “v[1]” or “e[3][v[1]-knows->v[2]]”
> representation. Is that bad? Perhaps not.
>         - What is the nature of a TPath? It’s complex, but we want to
> return it.
>         - g.V().id() on an RDF graph can return a URI. Is a URI “simple”?
> No, the set of simple types should never grow!…. thus, URI => String. Is
> that wack?
>         - Do we add g.R() and g.D() to Gremlin to type-support TRow and
> TDocument objects. g.V() would be weird :( … Hmmmm?
>                 - However, there are only so many data structures……. or
> are there? TMatrix, TXML, …. whoa.
>
> Thanks for reading,
> Marko.
>
> http://rredux.com