You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2019/04/18 20:15:09 UTC

What makes 'graph traversals' and 'relational joins' the same?

Hi,

*** This is mainly for Kuppitz, but if others care. 

Was thinking last night about relational data and Gremlin. The T() step returns all the tables in the withStructure() RDBMS database. Tables are ‘complex values’ so they can't leave the VM (only a simple ‘toString’).

Below is a fake Gremlin session. (and these are just ideas…)
	tables -> a ListLike of rows
	rows -> a MapLike of primitives

gremlin> g.T()
==>t[people]
==>t[addresses]
gremlin> g.T(‘people’)
==>t[people]
gremlin> g.T(‘people’).values()
==>r[people:1]
==>r[people:2]
==>r[people:3]
gremlin> g.T(‘people’).values().asMap()
==>{name:marko,age:29}
==>{name:kuppitz,age:10}
==>{name:josh,age:35}
gremlin> g.T(‘people’).values().has(‘age’,gt(20))
==>r[people:1]
==>r[people:3]
gremlin> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)
==>marko
==>josh

Makes sense. Nice that values() and has() generally apply to all ListLike and MapLike structures. Also, note how asMap() is the valueMap() of TP4, but generalizes to anything that is MapLike so it can be turned into a primitive form as a data-rich result from the VM.

gremlin> g.T()
==>t[people]
==>t[addresses]
gremlin> g.T(‘addresses’).values().asMap()
==>{name:marko,city:santafe}
==>{name:kuppitz,city:tucson}
==>{name:josh,city:desertisland}
gremlin> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
             by(select(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).
           values().asMap()
==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe}
==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}
==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}
gremlin> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
             by(’name’). // shorthand for equijoin on name column/key
           values().asMap()
==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe}
==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}
==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}
gremlin> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
             by(’name’)
==>t[people<-name->addresses]  // without asMap(), just the complex value ‘toString'
gremlin>

And of course, all of this is strategized into a SQL call so its joins aren’t necessarily computed using TP4-VM resources.

Anywho — what I hope to realize is the relationship between “links” (graph) and “joins” (tables). How can we make (bytecode-wise at least) RDBMS join operations and graph traversal operations ‘the same.’?

	Singleton: Integer, String, Float, Double, etc.
	Collection: List, Map (Vertex, Table, Document)
	Linkable: Vertex, Table

Vertices and Tables can be “linked.” Unlike Collections, they don’t maintain a “parent/child” relationship with the objects they reference. What does this mean……….?

Take care,
Marko.

http://rredux.com <http://rredux.com/>

Re: What makes 'graph traversals' and 'relational joins' the same?

Posted by Marko Rodriguez <ok...@gmail.com>.

Hey,

Thinking through things more and re-reading your emails.

Its like this:

	From an object you want to be able to go to the relations in which that object is a particular entry.
	From that relation you want to go to another object referenced in another entry.

For instance assume this set of 3-tuple relations:

talk_table
speaker  listener  statement
marko    josh      “sup bro"
marko    kuppitz   “dude man"

Lets say I’m at josh and I want to know what marko said to him:

	josh.adjacents(‘talk’,’listener’, …) // and this is why you have from().restrict().to()

Using your from()/restrict()/to() notation:

	josh.from(‘talk’,’listener’).restrict(‘speaker’,marko).to(‘statement’) => “sup bro”

I want to get some terminology down:

	Relation: a tuple with key/value entries. (basically a map)
	Key: A relation column name.
	Value: A relation column value.

So there are three operations:

	1. Get the relations in which the current object is a value for the specified key. [select] // like a back()
	2. Filter out those relations that don’t have a particular value for a particular key. [filter]
	3. Get those objects in the remaining relations associated with a particular key. [project] // like a forward()

What did Kuppitz hear from Marko?

	kuppitz.select(‘talk’,’listener’).filter(‘speaker’,marko).project(‘statement’) => “dude man”

So, how do we do this with just goto pointer chasing?

kuppitz.goto(‘listener’).filter(goto(‘speaker’).is(marko)).goto(‘statement’)

That is, I went from Kuppitz to all those relations in which he is a listener. I then filtered out those relations that don’t have marko as the speaker. I then went to the statements associated with those remaining relations. However, with this model, I’m assuming that “listener” is unique to the talk_table and this is not smart…

Anywho, is this more in line with what you are getting at?

Thanks for your patience,
Marko.

http://rredux.com <http://rredux.com/>




> On Apr 24, 2019, at 11:30 AM, Marko Rodriguez <ok...@gmail.com> wrote:
> 
> Hi,
> 
> I think I understand you now. The concept of local and non-local data is what made me go “ah!”
> 
> So let me reiterate what I think you are saying.
> 
> v[1] is guaranteed to have its id data local to it. All other information could be derived via id-based "equi-joins.” Thus, we can’t assume that a vertex will always have its properties and edges co-located with it. However, we can assume that it knows where to get its property and edge data when requested. Assume the following RDBMS-style data structure that is referenced by com.example.MyGraph.
> 
> vertex_table
> id label
> 1  person
> 2  person
> …
> 
> properties_table
> id  name   age
> 1   marko  29
> 2   josh   35
> …
> 
> edge_table
> id outV  label  inV
> 0  1    knows   2
> …
> 
> If we want to say that the above data structure is a graph, what is required of “ComplexType” such that we can satisfy both Neo4j-style and RDBMS-style graph encodings? Assume ComplexType is defined as:
> 
> interface ComplexType
>   Iterator<T> adjacents(String label, Object... identifiers)
> 
> Take this basic Gremlin traversal:
> 
> g.V(1).out(‘knows’).values(‘name’)
> 
> I now believe this should compile to the following:
> 
> [goto,V,1] [goto,outE,knows] [goto,inV] [goto,properties,name]
> 
> Given MyGraph/MyVertex/MyEdge all implement ComplexType and there is no local caching of data on these respective objects, then the bytecode isn’t rewritten and the following cascade of events occurs:
> 
> mygraph
> [goto,V,1] => 
>   mygraph.adjacents(“V”,1) => 
>     SELECT * FROM vertex_table WHERE id=1
> myvertex1
> [goto,outE,knows] => 
>   myvertex1.adjacents(“outE”,”knows”) => 
>     SELECT id FROM edge_table WHERE outV=1 AND label=knows
> myedge0
> [goto,inV,knows] => 
>   myedge1.adjacents(“inV”) => 
>     SELECT vertex_table.id FROM vertex_table, edge_table WHERE vertex_table.id=edge_table.inV AND edge_table.id=0
> myvertex2
> [goto,properties,name] => 
>   myvertex2.adjacents(“properties”,”name”) => 
>     SELECT name FROM properties_table WHERE id=2
> “josh"
> 
> Lets review the ComplexType adjacents()-method:
> 
> complexType.adjacents(label,identifiers...)
> 
> complexType must have sufficient information to represent the tail of the relation.
> label specifies the relation type (we will always assume that a single String is sufficient)
> identifiers... must contain sufficient information to identify the head of the relation.
> 
> The return of the the method adjacents() is then the object(s) on the other side of the relation(s).
> 
> Now, given the way I have my data structure organized, I could beef up the MyXXX implementation such that MyStrategy rewrites the base bytecode to:
> 
> [goto,V,1] [goto,out,knows][goto,properties,name]
> 
> The following cascade of events occurs:
> 
> mygraph
> [goto,V,1] => 
>   mygraph.adjacents(“V”,1) => 
>     SELECT * FROM vertex_table WHERE id=1
> myvertex1
> [goto,out,knows] => 
>   myvertex1.adjacents(“outE”,”knows”) => 
>     SELECT vertex_table.id FROM vertex_table,edge_table WHERE outV=1 AND label=knows AND inV=vertex_table.id
> myvertex2
> [goto,properties,name] => 
>   myvertex2.adjacents(“properties”,”name”) => 
>     SELECT name FROM properties_table WHERE id=2
> “josh"
> 
> Now, I could really beef up MyStrategy when I realize that no path information is used in the traversal. Thus, the base bytecode compiles to:
> 
> [my:sql,SELECT name FROM properties_table,vertex_table,edge_table WHERE … lots of join equalities]
> 
> This would then just emit “josh” given the mygraph object.
> 
> ——
> 
> To recap.
> 
> 	1. There are primitives.
> 	2. There are Maps and Lists.
> 	3. There are ComplexTypes.
> 	4. ComplexTypes are adjacent to other objects via relations.
> 		- These adjacent objects may be cached locally with the ComplexType instance.
> 		- These adjacent objects may require some database lookup.
> 		- Regardless, TP4 doesn’t care — its up to the provider’s ComplexType instance to decide how to resolve the adjacency.
> 	5. ComplexTypes don’t go over the wire — a ComplexTypeProxy with appropriately provided toString() is all that leaves the TP4 VM.
> 
> Finally, to solve the asMap()/asList() problem, we simply have:
> 
> asMap(’name’,’age’) => complexType.adjacents(‘asMap’,’name’,’age')
> asList() => complexType.adjacents(‘asList’)
> 
> It is up to the complexType to manifest a Map or List accordingly.
> 
> I see this as basically a big flatmap system. ComplexTypes just map from self to any number of logical neighbors as specified by the relation.
> 
> Am I getting it?,
> Marko.
> 
> http://rredux.com <http://rredux.com/>
> 
> 
> 
> 
>> On Apr 24, 2019, at 9:56 AM, Joshua Shinavier <josh@fortytwo.net <ma...@fortytwo.net>> wrote:
>> 
>> On Tue, Apr 23, 2019 at 10:28 AM Marko Rodriguez <okrammarko@gmail.com <ma...@gmail.com>>
>> wrote:
>> 
>>> Hi,
>>> 
>>> I think we are very close to something useable for TP4 structure/. Solving
>>> this problem elegantly will open the flood gates on tp4/ development.
>>> 
>> 
>> Yes, and formality often brings elegance. I don't think we can do much
>> better than relational algebra and relational calculus in terms of
>> formality, so to the extent we can reduce the fundamental TP4 traversal
>> steps to basic relational operations, the floodgates will also be open to
>> applications of query validation and query optimization from the last 40+
>> years of research.
>> 
>> 
>> 
>>> I still don’t grock your comeFrom().goto() stuff. I don’t get the benefit
>>> of having two instructions for “pointer chasing” instead of one.
>>> 
>> 
>> There are just a handful of basic operations in relational algebra.
>> Projection, selection, union, complement, Cartesian product. Joins, as well
>> as all other operations, can be derived from these. A lot of graph
>> traversal can be accomplished using only projection and selection, which is
>> why we were able to get away with only to/goto and from/comeFrom in the
>> examples above. However, I believe you do need both operations. You can
>> kind of get away without from() if you assume that each vertex has local
>> inE and outE references to incoming and outgoing edges, but I see that as a
>> kind of pre-materialized from()/select(). If you think of edges strictly as
>> relations, and represent them in a straightforward way with tables, you
>> don't need the local inE and outE; whether you have them depends on the
>> graph back-end.
>> 
>> 
>> 
>>> Lets put that aside for now and lets turn to modeling a Vertex. Go back to
>>> my original representation:
>>> 
>>> vertex.goto(‘label’)
>>> vertex.goto(‘id’)
>>> 
>> 
>> Local (in my view). All good.
>> 
>> 
>> 
>>> vertex.goto(‘outE’)
>>> vertex.goto(‘inE’)
>>> vertex.goto(‘properties’)
>>> 
>> 
>> Non-local (in my view). You can use goto(), but if the goal is to bring the
>> relational model into the fold, at a lower level you do have a select()
>> operation. Unless you make projections local to vertices instead of edges,
>> but then you just have the same problem in reverse. Am I making sense?
>> 
>> 
>> Any object can be converted into a Map. In TinkerPop3 we convert vertices
>>> into maps via:
>>> 
>>>        g.V().has(‘name’,’marko’).valueMap() => {name:marko,age:29}
>>>        g.V().has(‘name’,’marko’).valueMap(true) =>
>>> {id:1,label:person,name:marko,age:29}
>>> 
>> 
>> Maps are A-OK. In the case of properties, I think where we differ is that
>> you see a property like "name" as a key/value pair in a map local to the
>> vertex. I see the property as an element of type "name", with the vertex as
>> a value in its local map, logically if not physically. This allows maximum
>> flexibility in terms of meta-properties -- exotic beasts which seem to be
>> in a kind of limbo state in TP3, but if we're trying to be as general as
>> possible, some data models we might want to pull in, like GRAKN.AI, do
>> allow this kind of flexibility.
>> 
>> 
>> 
>>> In the spirit of instruction reuse, we should have an asMap() instruction
>>> that works for ANY object. (As a side: this gets back to ONLY sending
>>> primitives over the wire, no
>>> Vertex/Edge/Document/Table/Row/XML/ColumnFamily/etc.). Thus, the above is:
>>> 
>>>        g.V().has(‘name’,’marko’).properties().asMap() =>
>>> {name:marko,age:29}
>>>        g.V().has(‘name’,’marko’).asMap() =>
>>> {id:1,label:person,properties:{name:marko,age:29}}
>>> 
>> 
>> Again, no argument here, although I would think of a map as an
>> optimization. IMO, the fundamental projections from v[1] are id:1 and
>> label:Person. You could make a map out of these, or just use an offset,
>> since the keys are always the same. However, you can also build a map
>> including any key you can turn into a function. properties() is such a key.
>> 
>> 
>> You might ask, why didn’t it go to outE and inE and map-ify that data?
>>> Because those are "sibling” references, not “children” references.
>>> 
>>>        goto(‘outE’) is a “sibling” reference. (a vertex does not contain
>>> an edge)
>>>        goto(‘id’) is a “child” reference. (a vertex contains the id)
>>> 
>> 
>> I agree with both of those statements. A vertex does not contain the edges
>> incident on it. Again, I am thinking of properties a bit more like edges
>> for maximum generality.
>> 
>> 
>> 
>>> Where do we find sibling references?
>>>        Graphs: vertices don’t contain each other.
>>>        OO heaps: many objects don’t contain each other.
>>>        RDBMS: rows are linked by joins, but don’t contain each other.
>>> 
>> 
>> Yep.
>> 
>> 
>> So, the way in which we structure our references (pointers) determines the
>>> shape of the data and ultimately how different instructions will behave. We
>>> can’t assume that asMap() knows anything about
>>> vertices/edges/documents/rows/tables/etc. It will simply walk all
>>> child-references and create a map.
>>> 
>> 
>> Just to play devil's advocate, you *could* include "inE" and "outE" as keys
>> in the local map of a vertex; it's just a matter of what you choose to do.
>> inE and outE are perfectly good functions from a vertex to a set of edges.
>> 
>> 
>> We don’t want TP to get involved in “complex data types.”
>> 
>> 
>> Well, how do you feel about algebraic data types? They are simple, and
>> allow you to capture arbitrary relations as elements.
>> 
>> 
>> 
>>> We don’t care. You can propagate MyDatabaseObject through the TP4 VM
>>> pipeline and load your object up with methods for optimizations with your
>>> DB and all that, but for TP4, your object is just needs to implement:
>>> 
>>>        ComplexType
>>>                - Iterator<T> children(String label)
>>>                - Iterator<T> siblings(String label)
>>>                - default Iterator<T> references(String label) {
>>> IteratorUtils.concat(children(label), siblings(label)) }
>>>                - String toString()
>>> 
>> 
>> I don't think you need siblings(). I think you need a more generic
>> select(), but since this is graph traversal, select() only needs the
>> identifier of a type (e.g. "knows") and the name of a field (e.g. "out").
>> 
>> 
>> 
>>> When a ComplexType goes over the wire to the user, it just represented as
>>> a ComplexTypeProxy with a toString() like v[1],
>>> tinkergraph[vertices:10,edges:34], etc. All references are disconnected.
>>> Yes, even children references. We do not want language drivers having to
>>> know about random object types and have to deal with implementing
>>> serializers and all that non-sense. The TP4 serialization protocol is
>>> primitives, maps, lists, bytecode, and traversers. Thats it!
>>> 
>> 
>> No disagreement here. I think the only disconnect is about what keys are
>> local to what elements. Some keys are hard-local, like id and type for all
>> elements, and "in" and "out" for edges and properties. These *should* be
>> carried over the wire. Properties, incident edges, etc. possibly but not
>> necessarily.
>> 
>> 
>> 
>>> *** Only Maps and Lists (that don’t contain complex data types) maintain
>>> their child references “over the wire.”
>>> 
>> 
>> Sure.
>> 
>> 
>> 
>>> I don’t get your hypergraph example, so let me try another example:
>>> 
>>>        tp ==member==> marko, josh
>>> 
>>> TP is a vertex and there is a directed hyperedge with label “member”
>>> connecting to marko and josh vertices.
>>> 
>> 
>> That's kind of an unlabeled hyperedge; I am not sure we need to support
>> those. Look at the GRAKN data model, or at HypergraphDB or earlier
>> hypergraph data models. A hyperedge is essentially a tuple in which each
>> components has a label ("role", in GRAKN). In other words, it is a relation
>> in which some of the columns may be foreign keys. In your example, rather
>> than "member" connecting "tp" to a set of vertices, you might have
>> something like Collaborated{person1:marko, person2:josh, project=tp}. Then
>> a query like "who did marko collaborate with on tp?" becomes:
>> 
>>    tp.from("Collaborated", "project").restrict("person1",
>> "marko").to("person2")
>> 
>> Of course, if you want this relationship to be symmetrical, you can
>> introduce a constraint.
>> 
>> 
>> 
>>> tp.goto(“outE”).filter(goto(“label”).is(“member”)).goto(“inV”)
>>> 
>>> Looks exactly like a property graph query? However, its not because
>>> goto(“inV”) returns 2 vertices, not 1.
>> 
>> 
>> I think your example works well for the type of hypergraph you are
>> referring to. It's just different than the type of hypergraph I am
>> referring to. I think by now you know that I would rather see a from()
>> instead of that goto("outE"). I also agree you can make a function out of
>> outE, and expose it using a map, if you really want to. Under the hood,
>> however, I see this as traversing a projection head to tail rather than
>> tail to head.
>> 
>> 
>> 
>>> EdgeVertexFlatmapFunction works for property graphs and hypergraphs. It
>>> doesn’t care — it just follows goto() pointers! That is, it follows the
>>> ComplexType.references(“inV”). Multi-properties are the same as well.
>>> Likewise for meta-properties. These data model variations are not “special”
>>> to the TP4 VM. It just walks references whether there are 0,1,2, or N of
>>> them.
>>> 
>> 
>> At a high level, I agree with what you are saying. We should have a common
>> data model that unifies traditional property graphs, hypergraphs,
>> relational databases, and any other data model that can be modeled using
>> algebraic data types with references. We define a small set of basic
>> operations on this data model which can be combined into more complex
>> operations that are amenable to static analysis and optimization. We can
>> send graph data over the wire as collections of elements using the bare
>> minimum of local fields, and reconstruct the graph on the other end. We can
>> operate on streams of such elements under suitable conditions (elements
>> sent in an appropriate order). The basic operations are not tied to the
>> JVM, and should be straightforward to implement in other frameworks.
>> 
>> 
>> 
>>> 
>>> Thus, what is crucial to all this is the “shape of the data.” Using your
>>> pointers wisely so instructions produce useful results.
>>> 
>> 
>> +1
>> 
>> 
>> 
>>> Does any of what I wrote update your comeFrom().goto() stuff?
>> 
>> 
>> Sadly, no, though I appreciate that you are coming from a slightly
>> different place w.r.t. properties, hypergraphs, and most importantly, the
>> role of a type system.
>> 
>> 
>> 
>>> If not, can you please explain to me why comeFrom() is cool — sorry for
>>> being dense (aka “being Kuppitz" — thats right, I said it. boom!).
>>> 
>> 
>> Let's keep iterating until we reach a fixed point. Maybe Daniel's already
>> there.
>> 
>> Josh
>> 
>> 
>> 
>>> 
>>> Thanks,
>>> Marko.
>>> 
>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>>
>>> 
>>> 
>>> 
>>> 
>>>> On Apr 23, 2019, at 10:25 AM, Joshua Shinavier <josh@fortytwo.net <ma...@fortytwo.net>>
>>> wrote:
>>>> 
>>>> On Tue, Apr 23, 2019 at 5:14 AM Marko Rodriguez <okrammarko@gmail.com <ma...@gmail.com>>
>>>> wrote:
>>>> 
>>>>> Hey Josh,
>>>>> 
>>>>> This gets to the notion I presented in “The Fabled GMachine.”
>>>>>       http://rredux.com/the-fabled-gmachine.html <http://rredux.com/the-fabled-gmachine.html> <
>>>>> http://rredux.com/the-fabled-gmachine.html <http://rredux.com/the-fabled-gmachine.html>> (first paragraph of
>>>>> “Structures, Processes, and Languages” section)
>>>>> 
>>>>> All that exists are memory addresses that contain either:
>>>>> 
>>>>>       1. A primitive
>>>>>       2. A set of labeled references to other references or primitives.
>>>>> 
>>>>> Using your work and the above, here is a super low-level ‘bytecode' for
>>>>> property graphs.
>>>>> 
>>>>> v.goto("id") => 1
>>>>> 
>>>> 
>>>> LGTM. An id is special because it is uniquely identifying / is a primary
>>>> key for the element. However, it is also just a field of the element,
>>> like
>>>> "in"/"inV" and "out"/"outV" are fields of an edge. As an aside, an id
>>> would
>>>> only really need to be unique among other elements of the same type. To
>>> the
>>>> above, I would add:
>>>> 
>>>> v.type() => Person
>>>> 
>>>> ...a special operation which takes you from an element to its type. This
>>> is
>>>> important if unions are supported; e.g. "name" in my example can apply
>>>> either to a Person or a Project.
>>>> 
>>>> 
>>>> v.goto("label") => person
>>>>> 
>>>> 
>>>> Or that. Like "id", "type"/"label" is special. You can think of it as a
>>>> field; it's just a different sort of field which will have the same value
>>>> for all elements of any given type.
>>>> 
>>>> 
>>>> 
>>>>> v.goto("properties").goto("name") => "marko"
>>>>> 
>>>> 
>>>> OK, properties. Are properties built-in as a separate kind of thing from
>>>> edges, or can we treat them the same as vertices and edges here? I think
>>> we
>>>> can treat them the same. A property, in the algebraic model I described
>>>> above, is just an element with two fields, the second of which is a
>>>> primitive value. As I said, I think we need two distinct traversal
>>>> operations -- projection and selection -- and here is where we can use
>>> the
>>>> latter. Here, I will call it "comeFrom".
>>>> 
>>>> v.comeFrom("name", "out").goto("in") => {"marko"}
>>>> 
>>>> You can think of this comeFrom as a special case of a select() function
>>>> which takes a type -- "name" -- and a set of key/value pairs {("out",
>>> v)}.
>>>> It returns all matching elements of the given type. You then project to
>>> the
>>>> "in" value using your goto. I wrote {"marko"} as a set, because comeFrom
>>>> can give you multiple properties, depending on whether multi-properties
>>> are
>>>> supported.
>>>> 
>>>> Note how similar this is to an edge traversal:
>>>> 
>>>> v.comeFrom("knows", "out").goto("in") => {v[2], v[4]}
>>>> 
>>>> Of course, you could define "properties" in such a way that a
>>>> goto("properties") does exactly this under the hood, but in terms of low
>>>> level instructions, you need something like comeFrom.
>>>> 
>>>> 
>>>> v.goto("properties").goto("name").goto(0) => "m"
>>>>> 
>>>> 
>>>> This is where the notion of optionals becomes handy. You can make
>>>> array/list indices into fields like this, but IMO you should also make
>>> them
>>>> safe. E.g. borrowing Haskell syntax for a moment:
>>>> 
>>>> v.goto("properties").goto("name").goto(0) => Just 'm'
>>>> 
>>>> v.goto("properties").goto("name").goto(5) => Nothing
>>>> 
>>>> 
>>>> v.goto("outE").goto("inV") => v[2], v[4]
>>>>> 
>>>> 
>>>> I am not a big fan of untyped "outE", but you can think of this as a
>>> union
>>>> of all v.comeFrom(x, "out").goto("in"), where x is any edge type. Only
>>>> "knows" and "created" are edge types which are applicable to "Person", so
>>>> you will only get {v[2], v[4]}. If you want to get really crazy, you can
>>>> allow x to be any type. Then you get {v[2], v[4], 29, "marko"}.
>>>> 
>>>> 
>>>> 
>>>>> g.goto("V").goto(1) => v[1]
>>>>> 
>>>> 
>>>> That, or you give every element a virtual field called "graph". So:
>>>> 
>>>> v.goto("graph") => g
>>>> 
>>>> g.comeFrom("Person", "graph") => {v[1], v[2], v[4], v[6]}
>>>> 
>>>> g.comeFrom("Person", "graph").restrict("id", 1)
>>>> 
>>>> ...where restrict() is the relational "sigma" operation as above, not to
>>> be
>>>> confused with TinkerPop's select(), filter(), or has() steps. Again, I
>>>> prefer to specify a type in comeFrom (i.e. we're looking specifically
>>> for a
>>>> Person with id of 1), but you could also do a comprehension g.comeFrom(x,
>>>> "graph"), letting x range over all types.
>>>> 
>>>> 
>>>> 
>>>>> The goto() instruction moves the “memory reference” (traverser) from the
>>>>> current “memory address” to the “memory address” referenced by the
>>> goto()
>>>>> argument.
>>>>> 
>>>> 
>>>> Agreed, if we also think of primitive values as memory references.
>>>> 
>>>> 
>>>> 
>>>>> The Gremlin expression:
>>>>> 
>>>>>       g.V().has(‘name’,’marko’).out(‘knows’).drop()
>>>>> 
>>>>> ..would compile to:
>>>>> 
>>>>> 
>>>>> 
>>> g.goto(“V”).filter(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()
>>>>> 
>>>> 
>>>> 
>>>> In the alternate universe:
>>>> 
>>>> g.comeFrom("Person", "graph").comeFrom("name", "out").restrict("in",
>>>> "marko").goto("out").comeFrom("knows", "out").goto("in").free()
>>>> 
>>>> I have wimped out on free() and just left it as you had it, but I think
>>> it
>>>> would be worthwhile to explore a monadic syntax for traversals with
>>>> side-effects. Different topic.
>>>> 
>>>> Now, all of this "out", "in" business is getting pretty repetitive,
>>> right?
>>>> Well, the field names become more diverse if we allow hyper-edges and
>>>> generalized ADTs. E.g. in my Trip example, say I want to know all
>>> drop-off
>>>> locations for a given rider:
>>>> 
>>>> u.comeFrom("Trip", "rider").goto("dropoff").goto("place")
>>>> 
>>>> Done.
>>>> 
>>>> 
>>>> 
>>>>> If we can get things that “low-level” and still efficient to compile,
>>> then
>>>>> we can model every data structure. All you are doing is pointer chasing
>>>>> through a withStructure() data structure. .
>>>>> 
>>>> 
>>>> Agreed.
>>>> 
>>>> 
>>>> No one would ever want to write strategies for goto()-based Bytecode.
>>>> 
>>>> 
>>>> Also agreed.
>>>> 
>>>> 
>>>> 
>>>>> Thus, perhaps there could be a PropertyGraphDecorationStrategy that
>>> does:
>>>>> 
>>>>> [...]
>>>> 
>>>> 
>>>> No argument here, though the alternate-universe "bytecode" would look
>>>> slightly different. And the high-level syntax should also be able to deal
>>>> with generalized relations / data types gracefully. As a thought
>>>> experiment, suppose we were to define the steps to() as your goto(), and
>>>> from() as my comeFrom(). Then traversals like:
>>>> 
>>>> u.from("Trip", "rider").to("dropoff").to("time")
>>>> 
>>>> ...look pretty good as-is, and are not too low-level. However, ordinary
>>>> edge traversals like:
>>>> 
>>>> v.from("knows", "out").to("in")
>>>> 
>>>> ...do look a little Assembly-like. So in/out/both etc. remain as they
>>> are,
>>>> but are shorthand for from() and to() steps using "out" or "in":
>>>> 
>>>> v.out("knows") === v.outE("knows").inV() === v.from("knows",
>>> "out").to("in")
>>>> 
>>>> 
>>>> [I AM NOW GOING OFF THE RAILS]
>>>>> [sniiiiip]
>>>>> 
>>>> 
>>>> Sure. Again, I like the idea of wrapping side-effects in monads. What
>>> would
>>>> that look like in a Gremlinesque fluent syntax? I don't quite know, but
>>> if
>>>> we think of the dot as a monadic bind operation like Haskell's >>=, then
>>>> perhaps the monadic expressions look pretty similar to what you have just
>>>> sketched out. Might have to be careful about what it means to nest
>>>> operations as in your addEdge examples.
>>>> 
>>>> 
>>>> 
>>>> [I AM NOW BACK ON THE RAILS]
>>>>> 
>>>>> Its as if “properties”, “outE”, “label”, “inV”, etc. references mean
>>>>> something to property graph providers and they can do more intelligent
>>>>> stuff than what MongoDB would do with such information. However,
>>> someone,
>>>>> of course, can create a MongoDBPropertyGraphStrategy that would make
>>>>> documents look like vertices and edges and then use O(log(n)) lookups on
>>>>> ids to walk the graph. However, if that didn’t exist, it would still do
>>>>> something that works even if its horribly inefficient as every database
>>> can
>>>>> make primitives with references between them!
>>>>> 
>>>> 
>>>> I'm on the same same pair of rails.
>>>> 
>>>> 
>>>> 
>>>>> Anywho @Josh, I believe goto() is what you are doing with
>>> multi-references
>>>>> off an object. How do we make it all clean, easy, and universal?
>>>>> 
>>>> 
>>>> Let me know what you think of the above.
>>>> 
>>>> Josh
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> Marko.
>>>>> 
>>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>>
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Apr 22, 2019, at 6:42 PM, Joshua Shinavier <josh@fortytwo.net <ma...@fortytwo.net>>
>>> wrote:
>>>>>> 
>>>>>> Ah, glad you asked. It's all in the pictures. I have nowhere to put
>>> them
>>>>> online at the moment... maybe this attachment will go through to the
>>> list?
>>>>>> 
>>>>>> Btw. David Spivak gave his talk today at Uber; it was great. Juan
>>>>> Sequeda (relational <--> RDF mapping guy) was also here, and Ryan joined
>>>>> remotely. Really interesting discussion about databases vs. graphs, and
>>>>> what category theory brings to the table.
>>>>>> 
>>>>>> 
>>>>>> On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez <okrammarko@gmail.com <ma...@gmail.com>
>>>>> <mailto:okrammarko@gmail.com <ma...@gmail.com>>> wrote:
>>>>>> Hey Josh,
>>>>>> 
>>>>>> I’m digging what you are saying, but the pictures didn’t come through
>>>>> for me ? … Can you provide them again (or if dev@ is filtering them,
>>> can
>>>>> you give me URLs to them)?
>>>>>> 
>>>>>> Thanks,
>>>>>> Marko.
>>>>>> 
>>>>>> 
>>>>>>> On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <josh@fortytwo.net <ma...@fortytwo.net>
>>>>> <mailto:josh@fortytwo.net <ma...@fortytwo.net>>> wrote:
>>>>>>> 
>>>>>>> On the subject of "reified joins", maybe be a picture will be worth a
>>>>> few words. As I said in the thread <
>>>>> https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ <https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ>
>>> <
>>>>> https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ <https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ>
>>>>> 
>>>>> on property graph standardization, if you think of vertex labels, edge
>>>>> labels, and property keys as types, each with projections to two other
>>>>> types, there is a nice analogy with relations of two columns, and this
>>>>> analogy can be easily extended to hyper-edges. Here is what the schema
>>> of
>>>>> the TinkerPop classic graph looks like if you make each type (e.g.
>>> Person,
>>>>> Project, knows, name) into a relation:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I have made the vertex types salmon-colored, the edge types yellow,
>>>>> the property types green, and the data types blue. The "o" and "I"
>>> columns
>>>>> represent the out-type (e.g. out-vertex type of Person) and in-type
>>> (e.g.
>>>>> property value type of String) of each relation. More than two arrows
>>> from
>>>>> a column represent a coproduct, e.g. the out-type of "name" is Person OR
>>>>> Project. Now you can think of out() and in() as joins of two tables on a
>>>>> primary and foreign key.
>>>>>>> 
>>>>>>> We are not limited to "out" and "in", however. Here is the ternary
>>>>> relationship (hyper-edge) from hyper-edge slide <
>>>>> 
>>> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49 <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49>
>>>>> <
>>>>> 
>>> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49
>>>>> 
>>>>> of my Graph Day preso, which has three columns/roles/projections:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I have drawn Says in light blue to indicate that it is a generalized
>>>>> element; it has projections other than "out" and "in". Now the line
>>> between
>>>>> relations and edges begins to blur. E.g. in the following, is
>>> PlaceEvent a
>>>>> vertex or a property?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> With the right type system, we can just speak of graph elements, and
>>>>> use "vertex", "edge", "property" when it is convenient. In the
>>> relational
>>>>> model, they are relations. If you materialize them in a relational
>>>>> database, they are rows. In any case, you need two basic graph traversal
>>>>> operations:
>>>>>>> project() -- forward traversal of the arrows in the above diagrams.
>>>>> Takes you from an element to a component like in-vertex.
>>>>>>> select() -- reverse traversal of the arrows. Allows you to answer
>>>>> questions like "in which Trips is John Doe the rider?"
>>>>>>> 
>>>>>>> Josh
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <
>>> okrammarko@gmail.com
>>>>> <ma...@gmail.com> <mailto:okrammarko@gmail.com <mailto:
>>>>> okrammarko@gmail.com>>> wrote:
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I agree with everything you say. Here is my question:
>>>>>>> 
>>>>>>>       Relational database — join: Table x Table x equality function
>>>>> -> Table
>>>>>>>       Graph database — traverser: Vertex x edge label -> Vertex
>>>>>>> 
>>>>>>> I want a single function that does both. The only think was to
>>>>> represent traverser() in terms of join():
>>>>>>> 
>>>>>>>       Graph database — traverser: Vertices x Vertex x equality
>>>>> function -> Vertices
>>>>>>> 
>>>>>>> For example,
>>>>>>> 
>>>>>>> V().out(‘address’)
>>>>>>> 
>>>>>>>       ==>
>>>>>>> 
>>>>>>> g.join(V().hasLabel(‘person’).as(‘a’)
>>>>>>>      V().hasLabel(‘addresses’).as(‘b’)).
>>>>>>>        by(‘name’).select(?address vertex?)
>>>>>>> 
>>>>>>> That is, join the vertices with themselves based on some predicate to
>>>>> go from vertices to vertices.
>>>>>>> 
>>>>>>> However, I would like instead to transform the relational database
>>>>> join() concept into a traverser() concept. Kuppitz and I were talking
>>> the
>>>>> other day about a link() type operator that says: “try and link to this
>>>>> thing in some specified way.” .. ?? The problem we ran into is again,
>>> “link
>>>>> it to what?”
>>>>>>> 
>>>>>>>       - in graph, the ‘to what’ is hardcoded so you don’t need to
>>>>> specify anything.
>>>>>>>       - in rdbms, the ’to what’ is some other specified table.
>>>>>>> 
>>>>>>> So what does the link() operator look like?
>>>>>>> 
>>>>>>> ——
>>>>>>> 
>>>>>>> Some other random thoughts….
>>>>>>> 
>>>>>>> Relational databases join on the table (the whole collection)
>>>>>>> Graph databases traverser on the vertex (an element of the whole
>>>>> collection)
>>>>>>> 
>>>>>>> We can make a relational database join on single row (by providing a
>>>>> filter to a particular primary key). This is the same as a table with
>>> one
>>>>> row. Likewise, for graph in the join() context above:
>>>>>>> 
>>>>>>> V(1).out(‘address’)
>>>>>>> 
>>>>>>>       ==>
>>>>>>> 
>>>>>>> g.join(V(1).as(‘a’)
>>>>>>>      V().hasLabel(‘addresses’).as(‘b’)).
>>>>>>>        by(‘name’).select(?address vertex?)
>>>>>>> 
>>>>>>> More thoughts please….
>>>>>>> 
>>>>>>> Marko.
>>>>>>> 
>>>>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
>>>>> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
>>>>> http://rredux.com/ <http://rredux.com/>>>
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Apr 19, 2019, at 4:20 AM, pieter martin <pieter.martin@gmail.com
>>>>> <ma...@gmail.com> <mailto:pieter.martin@gmail.com
>>> <mailto:
>>>>> pieter.martin@gmail.com>>> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> The way I saw it is that the big difference is that graph's have
>>>>>>>> reified joins. This is both a blessing and a curse.
>>>>>>>> A blessing because its much easier (less text to type, less mistakes,
>>>>>>>> clearer semantics...) to traverse an edge than to construct a manual
>>>>>>>> join.A curse because there are almost always far more ways to
>>>>> traverse
>>>>>>>> a data set than just by the edges some architect might have
>>>>> considered
>>>>>>>> when creating the data set. Often the architect is not the domain
>>>>>>>> expert and the edges are a hardcoded layout of the dataset, which
>>>>>>>> almost certainly won't survive the real world's demands. In graphs,
>>>>> if
>>>>>>>> their are no edges then the data is not reachable, except via indexed
>>>>>>>> lookups. This is the standard engineering problem of database design,
>>>>>>>> but it is important and useful that data can be traversed, joined,
>>>>>>>> without having reified edges.
>>>>>>>> In Sqlg at least, but I suspect it generalizes, I want to create the
>>>>>>>> notion of a "virtual edge". Which in meta data describes the join and
>>>>>>>> then the standard to(direction, "virtualEdgeName") will work.
>>>>>>>> In a way this is precisely to keep the graphy nature of gremlin, i.e.
>>>>>>>> traversing edges, and avoid using the manual join syntax you
>>>>> described.
>>>>>>>> CheersPieter
>>>>>>>> 
>>>>>>>> On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
>>>>>>>>> Hi,
>>>>>>>>> *** This is mainly for Kuppitz, but if others care.
>>>>>>>>> Was thinking last night about relational data and Gremlin. The T()
>>>>>>>>> step returns all the tables in the withStructure() RDBMS database.
>>>>>>>>> Tables are ‘complex values’ so they can't leave the VM (only a
>>>>> simple
>>>>>>>>> ‘toString’).
>>>>>>>>> Below is a fake Gremlin session. (and these are just ideas…) tables
>>>>>>>>> -> a ListLike of rows        rows -> a MapLike of primitives
>>>>>>>>> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
>>>>>>>>> g.T(‘people’)==>t[people]gremlin>
>>>>>>>>> 
>>>>> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml
>>>>>>>>> in>
>>>>>>>>> 
>>>>> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz,
>>>>>>>>> age:10}==>{name:josh,age:35}gremlin>
>>>>>>>>> 
>>>>> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g
>>>>>>>>> remlin>
>>>>>>>>> 
>>>>> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos
>>>>>>>>> h
>>>>>>>>> Makes sense. Nice that values() and has() generally apply to all
>>>>>>>>> ListLike and MapLike structures. Also, note how asMap() is the
>>>>>>>>> valueMap() of TP4, but generalizes to anything that is MapLike so it
>>>>>>>>> can be turned into a primitive form as a data-rich result from the
>>>>>>>>> VM.
>>>>>>>>> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
>>>>>>>>> 
>>>>> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name
>>>>>>>>> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin>
>>>>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>>>>> by(se
>>>>>>>>> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).
>>>>> 
>>>>>>>>> values().asMap()==>{a.name:marko,a.age:29,b.name:
>>>>> marko,b.city:santafe
>>>>>>>>> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{
>>>>> a.name <http://a.name/> <http://a.name/ <http://a.name/>>:
>>>>>>>>> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin>
>>>>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>>>>> by(’n
>>>>>>>>> ame’). // shorthand for equijoin on name
>>>>>>>>> column/key           values().asMap()==>{a.name:marko,a.age:29,
>>>>> b.name <http://b.name/> <http://b.name/ <http://b.name/>>
>>>>>>>>> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:kuppitz,
>>>>> b.ci <http://b.ci/> <http://b.ci/ <http://b.ci/>>
>>>>>>>>> ty:tucson}==>{a.name:josh,a.age:35,b.name:
>>>>> josh,b.city:desertisland}gr
>>>>>>>>> emlin>
>>>>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>>>>> by(’n
>>>>>>>>> ame’)==>t[people<-name->addresses]  // without asMap(), just the
>>>>>>>>> complex value ‘toString'gremlin>
>>>>>>>>> And of course, all of this is strategized into a SQL call so its
>>>>>>>>> joins aren’t necessarily computed using TP4-VM resources.
>>>>>>>>> Anywho — what I hope to realize is the relationship between “links”
>>>>>>>>> (graph) and “joins” (tables). How can we make (bytecode-wise at
>>>>>>>>> least) RDBMS join operations and graph traversal operations ‘the
>>>>>>>>> same.’?
>>>>>>>>>    Singleton: Integer, String, Float, Double, etc. Collection:
>>>>>>>>> List, Map (Vertex, Table, Document)  Linkable: Vertex, Table
>>>>>>>>> Vertices and Tables can be “linked.” Unlike Collections, they don’t
>>>>>>>>> maintain a “parent/child” relationship with the objects they
>>>>>>>>> reference. What does this mean……….?
>>>>>>>>> Take care,Marko.
>>>>>>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
>>>>> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
>>>>> http://rredux.com/ <http://rredux.com/>>> <http://rredux.com/ <
>>>>> http://rredux.com/> <http://rredux.com/ <http://rredux.com/>> <
>>>>> http://rredux.com/ <http://rredux.com/> <http://rredux.com/ <
>>>>> http://rredux.com/>>>>
>>>>>> 
>>>>>> <diagrams.zip>
>>>>> 
>>>>> 
>>> 
>>> 
>

Re: What makes 'graph traversals' and 'relational joins' the same?

Posted by Stephen Mallette <sp...@gmail.com>.

Trying to catch up on threads a bit...enjoying the discussion and I hope
I'm following along fully because it's sounding really nice. Letting the
type system be so open in previous versions of TinkerPop has created so
many inconsistencies and inelegant solutions which have only be exaggerated
by Gremlin Language Variants. Anyway, regarding:

>>         5. ComplexTypes don’t go over the wire — a ComplexTypeProxy with
>> appropriately provided toString() is all that leaves the TP4 VM.
>>

> As a tuple, ComplexTypes / ADTs go over the wire. The values of their
> primitive fields should probably go with them. However, the values of
their
> element / entity fields are just references; the attached element doesn't
> go with them.

I think I'd agree with Josh that we'd send these back over the wire,
especially if there is agreement that they are just a tuple form which
means that providers won't need to get into low-level serializer
development for custom types. TinkerPop would just know how to deal with
them for network transport. I guess providers would just have to provide
libraries with the ComplexType/ADT implementations in the programming
languages they wanted to support. In cases where they didn't, a user could
be left to work with a raw TinkerPop ComplexType/ADT instance which could
arguably be a better state than where they are left now which would be
serialization errors.



On Thu, Apr 25, 2019 at 2:07 PM Joshua Shinavier <jo...@fortytwo.net> wrote:

> Hi Marko. Responses inline.
>
> On Wed, Apr 24, 2019 at 10:30 AM Marko Rodriguez <ok...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I think I understand you now. The concept of local and non-local data is
> > what made me go “ah!”
> >
>
> Nice. I also brought this up yesterday in the Property Graph Schema Working
> Group, where there is a discussion going on about whether/how graph
> databases can contain multiple graphs. Can an element belong to multiple
> graphs, can it have different properties in different graphs, etc. If each
> graph element is atomic, referencing other graph elements but not
> containing them, then it is very straightforward to think of a property
> graph as a simple set of elements. Graph relations are just set relations,
> making it easy to pull graphs apart and put graphs together (e.g. when
> building a stream, merging streams, etc.). If you are willing to make the
> open world assumption (e.g. "I know e[7] is a 'knows' edge, but I don't
> know what its out- and in-vertices are"), then you can't even partition a
> graph in such a way that the partitions are not valid graphs.
>
>
> So let me reiterate what I think you are saying.
> >
> > v[1] is guaranteed to have its id data local to it. All other information
> > could be derived via id-based "equi-joins.” Thus, we can’t assume that a
> > vertex will always have its properties and edges co-located with it.
>
>
> Yes indeed. A particular graph vendor may choose to co-locate properties
> with a vertex and edges with out- or in-vertex (or both, e.g. as JanusGraph
> does), but this is an optimization. At a logical level, you can think of an
> element and its dependents as belonging to separate relations.
>
>
>
> > However, we can assume that it knows where to get its property and edge
> > data when requested.
>
>
> Yes; you need to be able to select().
>
>
>
> > Assume the following RDBMS-style data structure that is referenced by
> > com.example.MyGraph.
> >
> > vertex_table
> > id label
> > 1  person
> > 2  person
> > …
>
>
> That is one way to go. I believe this scheme is what Ryan and David would
> call the Grothendieck construction; all relations of a given arity are
> marked with their type and concatenated into a single relation. I am still
> a little sketchy on the Grothendieck construction, so I hope that is a
> correct statement.
>
> However, you can also think of distinct element types (edge labels, vertex
> labels, property keys, hyperedge signatures) as distinct relations. So you
> instead of vertex_table, you would have
>
> person_table
> id
> 1
> 2
>
> Vertices are such trivial relations that they don't need to be stored as a
> tables. Edges are more interesting:
>
> knows_table
> out in
> 1 2
> 1 4
>
> Properties are similar:
>
> name_table
> out out_label out in
> 1 person marko
> 2 person vadas
> 3 project lop
> 4 person josh
> 5 project ripple
> 6 person peter
>
> The property table has a bit of a twist, because its out-label is a
> disjoint union of "person" and "project"; both persons and projects can
> have names, so you tag the out-element with its label/type. This is not
> necessary for "knows" because the out-label is always "person".
>
>
>
> > properties_table
> > id  name   age
> > 1   marko  29
> > 2   josh   35
> > …
> >
> > edge_table
> > id outV  label  inV
> > 0  1    knows   2
> > …
> >
>
> Yes, this also works, and is equivalent to what I wrote above, with one
> tweak: if tagged unions are supported (which IMO they should be, so we have
> both a "times" and a "plus" in our type algebra), then property_table
> should also include a "label" column, and edge_table should include
> "outLabel" and "inLabel", i.e.:
>
> properties_table
> id  label name   age
> 1   person marko  29
> 2   person vadas   27
> …
>
> and
>
> edge_table
> id label outV  outLabel  inV inLabel
> 0  knows 1 person  2 person
> …
>
> In a tagged union, you mark the type of a field along with the value or
> reference, for the sake of type checking and pattern matching.
>
> Hard to say whether the table-per-relation or the table-per-arity approach
> is better. FWIW, at Uber, we use a table per relation for the sake of
> better data isolation. If you want to take advantage of the physical types
> of the database, you may want multiple properties_tables, one per datatype
> (so you're not storing every property value as a string).
>
>
> If we want to say that the above data structure is a graph, what is
> > required of “ComplexType” such that we can satisfy both Neo4j-style and
> > RDBMS-style graph encodings? Assume ComplexType is defined as:
> >
> > interface ComplexType
> >   Iterator<T> adjacents(String label, Object... identifiers)
> >
>
> You can think of a ComplexType as a row in a database. It just has the
> local fields specific to the type. In order to access attached elements,
> you need a select(), and your adjacents() looks pretty close to that. I
> would write:
>
> Iterator<T> adjacents(String label, String field)
>
> So for example, adjacents("knows", "out") from v[1] gives you an iterator
> of "knows" edges for which v[1] is the out-vertex. Btw. adjacents() here is
> the same as from() / comeFrom() in previous emails.
>
>
>
> > Take this basic Gremlin traversal:
> >
> > g.V(1).out(‘knows’).values(‘name’)
> >
> > I now believe this should compile to the following:
> >
> > [goto,V,1] [goto,outE,knows] [goto,inV] [goto,properties,name]
> >
>
> Polymorphism is cool. Your two-argument goto() appears to be my to(),
> whereas your three-argument goto() appears to be my from(). The minimal
> tweaks I would make to your syntax are:
>
> [goto,V,1] [goto,out,knows] [goto,in] [goto,out,name][goto,in]
>
> I might go a step further and say:
>
> [const,1] [goto,id,person] [goto,out,knows] [goto,in]
> [goto,out,name][goto,in]
>
>
>
> > Given MyGraph/MyVertex/MyEdge all implement ComplexType and there is no
> > local caching of data on these respective objects, then the bytecode
> isn’t
> > rewritten and the following cascade of events occurs:
> >
> > [...]
> >
>
> Looks pretty good.
>
>
>
> > Lets review the ComplexType adjacents()-method:
> >
> > complexType.adjacents(label,identifiers...)
> >
> > complexType must have sufficient information to represent the tail of the
> > relation.
> >
>
> Yes; it need to know what relation type you are matching, and on what field
> (e.g. "out"/"in") in that relation. Note that the table-per-relation
> approach is most appropriate when traversals are always strongly typed.
> E.g. when your step is v.out("knows") as opposed to v.out(). For v.out() to
> be supported efficiently, the monolithic table, or an element-to-type
> table, makes sense.
>
>
>
> > label specifies the relation type (we will always assume that a single
> > String is sufficient)
> >
>
> Exactly. And yeah, I think it is safe to assume that types can be
> identified by strings. Want namespaces? Use a qualified name syntax
> appropriate for your application.
>
>
>
> > identifiers... must contain sufficient information to identify the head
> of
> > the relation.
> >
>
> Yes.
>
>
> The return of the the method adjacents() is then the object(s) on the other
> > side of the relation(s).
> >
>
> Yeah. We're taking an element and then iterating through all of the
> incoming projections, of a given label, to that element. The label is the
> name of the relation together with the name of the field/column.
>
>
>
> > Now, given the way I have my data structure organized, I could beef up
> the
> > MyXXX implementation such that MyStrategy rewrites the base bytecode to:
> >
> > [...]
> > Now, I could really beef up MyStrategy when I realize that no path
> > information is used in the traversal. Thus, the base bytecode compiles
> to:
> >
> > [my:sql,SELECT name FROM properties_table,vertex_table,edge_table WHERE …
> > lots of join equalities]
> >
>
> Something of the kind.
>
>
>
> > [...]
> > To recap.
> >
> >         1. There are primitives.
> >
>
> +1
>
>
> >         2. There are Maps and Lists.
> >
>
> Sure. Lists of primitives, and maps of primitives to primitives.
>
>
>
> >         3. There are ComplexTypes.
> >
>
> I like the fancy term "algebraic data types". They are just tuples in which
> each field is either:
> 1) a primitive value (possibly tagged with a type), or
> 2) an element reference (possibly tagged with a type)
>
> You also need a special "unit" type for optionals.
>
>
>
> >         4. ComplexTypes are adjacent to other objects via relations.
> >                 - These adjacent objects may be cached locally with the
> > ComplexType instance.
> >                 - These adjacent objects may require some database
> lookup.
> >                 - Regardless, TP4 doesn’t care — its up to the provider’s
> > ComplexType instance to decide how to resolve the adjacency.
> >
>
> +1
>
>
> >         5. ComplexTypes don’t go over the wire — a ComplexTypeProxy with
> > appropriately provided toString() is all that leaves the TP4 VM.
> >
>
> As a tuple, ComplexTypes / ADTs go over the wire. The values of their
> primitive fields should probably go with them. However, the values of their
> element / entity fields are just references; the attached element doesn't
> go with them.
>
>
>
> > Finally, to solve the asMap()/asList() problem, we simply have:
> >
> > asMap(’name’,’age’) => complexType.adjacents(‘asMap’,’name’,’age')
> > asList() => complexType.adjacents(‘asList’)
> >
>
> I think I need an example of asList(), but I agree that we can make
> properties into key/value maps. If we want to access metaproperties, then
> we don't use asMap().
>
>
>
> It is up to the complexType to manifest a Map or List accordingly.
> >
> > I see this as basically a big flatmap system. ComplexTypes just map from
> > self to any number of logical neighbors as specified by the relation.
> >
> > Am I getting it?,
> >
>
> Yeah, and I think I am getting how you break down traversals into basic
> instructions. Go go GMachine.
>
> Josh
>
>
>
>
> > Marko.
> >
> > http://rredux.com <http://rredux.com/>
> >
> >
> >
> >
> > > On Apr 24, 2019, at 9:56 AM, Joshua Shinavier <jo...@fortytwo.net>
> wrote:
> > >
> > > On Tue, Apr 23, 2019 at 10:28 AM Marko Rodriguez <okrammarko@gmail.com
> >
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> I think we are very close to something useable for TP4 structure/.
> > Solving
> > >> this problem elegantly will open the flood gates on tp4/ development.
> > >>
> > >
> > > Yes, and formality often brings elegance. I don't think we can do much
> > > better than relational algebra and relational calculus in terms of
> > > formality, so to the extent we can reduce the fundamental TP4 traversal
> > > steps to basic relational operations, the floodgates will also be open
> to
> > > applications of query validation and query optimization from the last
> 40+
> > > years of research.
> > >
> > >
> > >
> > >> I still don’t grock your comeFrom().goto() stuff. I don’t get the
> > benefit
> > >> of having two instructions for “pointer chasing” instead of one.
> > >>
> > >
> > > There are just a handful of basic operations in relational algebra.
> > > Projection, selection, union, complement, Cartesian product. Joins, as
> > well
> > > as all other operations, can be derived from these. A lot of graph
> > > traversal can be accomplished using only projection and selection,
> which
> > is
> > > why we were able to get away with only to/goto and from/comeFrom in the
> > > examples above. However, I believe you do need both operations. You can
> > > kind of get away without from() if you assume that each vertex has
> local
> > > inE and outE references to incoming and outgoing edges, but I see that
> > as a
> > > kind of pre-materialized from()/select(). If you think of edges
> strictly
> > as
> > > relations, and represent them in a straightforward way with tables, you
> > > don't need the local inE and outE; whether you have them depends on the
> > > graph back-end.
> > >
> > >
> > >
> > >> Lets put that aside for now and lets turn to modeling a Vertex. Go
> back
> > to
> > >> my original representation:
> > >>
> > >> vertex.goto(‘label’)
> > >> vertex.goto(‘id’)
> > >>
> > >
> > > Local (in my view). All good.
> > >
> > >
> > >
> > >> vertex.goto(‘outE’)
> > >> vertex.goto(‘inE’)
> > >> vertex.goto(‘properties’)
> > >>
> > >
> > > Non-local (in my view). You can use goto(), but if the goal is to bring
> > the
> > > relational model into the fold, at a lower level you do have a select()
> > > operation. Unless you make projections local to vertices instead of
> > edges,
> > > but then you just have the same problem in reverse. Am I making sense?
> > >
> > >
> > > Any object can be converted into a Map. In TinkerPop3 we convert
> vertices
> > >> into maps via:
> > >>
> > >>        g.V().has(‘name’,’marko’).valueMap() => {name:marko,age:29}
> > >>        g.V().has(‘name’,’marko’).valueMap(true) =>
> > >> {id:1,label:person,name:marko,age:29}
> > >>
> > >
> > > Maps are A-OK. In the case of properties, I think where we differ is
> that
> > > you see a property like "name" as a key/value pair in a map local to
> the
> > > vertex. I see the property as an element of type "name", with the
> vertex
> > as
> > > a value in its local map, logically if not physically. This allows
> > maximum
> > > flexibility in terms of meta-properties -- exotic beasts which seem to
> be
> > > in a kind of limbo state in TP3, but if we're trying to be as general
> as
> > > possible, some data models we might want to pull in, like GRAKN.AI, do
> > > allow this kind of flexibility.
> > >
> > >
> > >
> > >> In the spirit of instruction reuse, we should have an asMap()
> > instruction
> > >> that works for ANY object. (As a side: this gets back to ONLY sending
> > >> primitives over the wire, no
> > >> Vertex/Edge/Document/Table/Row/XML/ColumnFamily/etc.). Thus, the above
> > is:
> > >>
> > >>        g.V().has(‘name’,’marko’).properties().asMap() =>
> > >> {name:marko,age:29}
> > >>        g.V().has(‘name’,’marko’).asMap() =>
> > >> {id:1,label:person,properties:{name:marko,age:29}}
> > >>
> > >
> > > Again, no argument here, although I would think of a map as an
> > > optimization. IMO, the fundamental projections from v[1] are id:1 and
> > > label:Person. You could make a map out of these, or just use an offset,
> > > since the keys are always the same. However, you can also build a map
> > > including any key you can turn into a function. properties() is such a
> > key.
> > >
> > >
> > > You might ask, why didn’t it go to outE and inE and map-ify that data?
> > >> Because those are "sibling” references, not “children” references.
> > >>
> > >>        goto(‘outE’) is a “sibling” reference. (a vertex does not
> contain
> > >> an edge)
> > >>        goto(‘id’) is a “child” reference. (a vertex contains the id)
> > >>
> > >
> > > I agree with both of those statements. A vertex does not contain the
> > edges
> > > incident on it. Again, I am thinking of properties a bit more like
> edges
> > > for maximum generality.
> > >
> > >
> > >
> > >> Where do we find sibling references?
> > >>        Graphs: vertices don’t contain each other.
> > >>        OO heaps: many objects don’t contain each other.
> > >>        RDBMS: rows are linked by joins, but don’t contain each other.
> > >>
> > >
> > > Yep.
> > >
> > >
> > > So, the way in which we structure our references (pointers) determines
> > the
> > >> shape of the data and ultimately how different instructions will
> > behave. We
> > >> can’t assume that asMap() knows anything about
> > >> vertices/edges/documents/rows/tables/etc. It will simply walk all
> > >> child-references and create a map.
> > >>
> > >
> > > Just to play devil's advocate, you *could* include "inE" and "outE" as
> > keys
> > > in the local map of a vertex; it's just a matter of what you choose to
> > do.
> > > inE and outE are perfectly good functions from a vertex to a set of
> > edges.
> > >
> > >
> > > We don’t want TP to get involved in “complex data types.”
> > >
> > >
> > > Well, how do you feel about algebraic data types? They are simple, and
> > > allow you to capture arbitrary relations as elements.
> > >
> > >
> > >
> > >> We don’t care. You can propagate MyDatabaseObject through the TP4 VM
> > >> pipeline and load your object up with methods for optimizations with
> > your
> > >> DB and all that, but for TP4, your object is just needs to implement:
> > >>
> > >>        ComplexType
> > >>                - Iterator<T> children(String label)
> > >>                - Iterator<T> siblings(String label)
> > >>                - default Iterator<T> references(String label) {
> > >> IteratorUtils.concat(children(label), siblings(label)) }
> > >>                - String toString()
> > >>
> > >
> > > I don't think you need siblings(). I think you need a more generic
> > > select(), but since this is graph traversal, select() only needs the
> > > identifier of a type (e.g. "knows") and the name of a field (e.g.
> "out").
> > >
> > >
> > >
> > >> When a ComplexType goes over the wire to the user, it just represented
> > as
> > >> a ComplexTypeProxy with a toString() like v[1],
> > >> tinkergraph[vertices:10,edges:34], etc. All references are
> disconnected.
> > >> Yes, even children references. We do not want language drivers having
> to
> > >> know about random object types and have to deal with implementing
> > >> serializers and all that non-sense. The TP4 serialization protocol is
> > >> primitives, maps, lists, bytecode, and traversers. Thats it!
> > >>
> > >
> > > No disagreement here. I think the only disconnect is about what keys
> are
> > > local to what elements. Some keys are hard-local, like id and type for
> > all
> > > elements, and "in" and "out" for edges and properties. These *should*
> be
> > > carried over the wire. Properties, incident edges, etc. possibly but
> not
> > > necessarily.
> > >
> > >
> > >
> > >> *** Only Maps and Lists (that don’t contain complex data types)
> maintain
> > >> their child references “over the wire.”
> > >>
> > >
> > > Sure.
> > >
> > >
> > >
> > >> I don’t get your hypergraph example, so let me try another example:
> > >>
> > >>        tp ==member==> marko, josh
> > >>
> > >> TP is a vertex and there is a directed hyperedge with label “member”
> > >> connecting to marko and josh vertices.
> > >>
> > >
> > > That's kind of an unlabeled hyperedge; I am not sure we need to support
> > > those. Look at the GRAKN data model, or at HypergraphDB or earlier
> > > hypergraph data models. A hyperedge is essentially a tuple in which
> each
> > > components has a label ("role", in GRAKN). In other words, it is a
> > relation
> > > in which some of the columns may be foreign keys. In your example,
> rather
> > > than "member" connecting "tp" to a set of vertices, you might have
> > > something like Collaborated{person1:marko, person2:josh, project=tp}.
> > Then
> > > a query like "who did marko collaborate with on tp?" becomes:
> > >
> > >    tp.from("Collaborated", "project").restrict("person1",
> > > "marko").to("person2")
> > >
> > > Of course, if you want this relationship to be symmetrical, you can
> > > introduce a constraint.
> > >
> > >
> > >
> > >> tp.goto(“outE”).filter(goto(“label”).is(“member”)).goto(“inV”)
> > >>
> > >> Looks exactly like a property graph query? However, its not because
> > >> goto(“inV”) returns 2 vertices, not 1.
> > >
> > >
> > > I think your example works well for the type of hypergraph you are
> > > referring to. It's just different than the type of hypergraph I am
> > > referring to. I think by now you know that I would rather see a from()
> > > instead of that goto("outE"). I also agree you can make a function out
> of
> > > outE, and expose it using a map, if you really want to. Under the hood,
> > > however, I see this as traversing a projection head to tail rather than
> > > tail to head.
> > >
> > >
> > >
> > >> EdgeVertexFlatmapFunction works for property graphs and hypergraphs.
> It
> > >> doesn’t care — it just follows goto() pointers! That is, it follows
> the
> > >> ComplexType.references(“inV”). Multi-properties are the same as well.
> > >> Likewise for meta-properties. These data model variations are not
> > “special”
> > >> to the TP4 VM. It just walks references whether there are 0,1,2, or N
> of
> > >> them.
> > >>
> > >
> > > At a high level, I agree with what you are saying. We should have a
> > common
> > > data model that unifies traditional property graphs, hypergraphs,
> > > relational databases, and any other data model that can be modeled
> using
> > > algebraic data types with references. We define a small set of basic
> > > operations on this data model which can be combined into more complex
> > > operations that are amenable to static analysis and optimization. We
> can
> > > send graph data over the wire as collections of elements using the bare
> > > minimum of local fields, and reconstruct the graph on the other end. We
> > can
> > > operate on streams of such elements under suitable conditions (elements
> > > sent in an appropriate order). The basic operations are not tied to the
> > > JVM, and should be straightforward to implement in other frameworks.
> > >
> > >
> > >
> > >>
> > >> Thus, what is crucial to all this is the “shape of the data.” Using
> your
> > >> pointers wisely so instructions produce useful results.
> > >>
> > >
> > > +1
> > >
> > >
> > >
> > >> Does any of what I wrote update your comeFrom().goto() stuff?
> > >
> > >
> > > Sadly, no, though I appreciate that you are coming from a slightly
> > > different place w.r.t. properties, hypergraphs, and most importantly,
> the
> > > role of a type system.
> > >
> > >
> > >
> > >> If not, can you please explain to me why comeFrom() is cool — sorry
> for
> > >> being dense (aka “being Kuppitz" — thats right, I said it. boom!).
> > >>
> > >
> > > Let's keep iterating until we reach a fixed point. Maybe Daniel's
> already
> > > there.
> > >
> > > Josh
> > >
> > >
> > >
> > >>
> > >> Thanks,
> > >> Marko.
> > >>
> > >> http://rredux.com <http://rredux.com/>
> > >>
> > >>
> > >>
> > >>
> > >>> On Apr 23, 2019, at 10:25 AM, Joshua Shinavier <jo...@fortytwo.net>
> > >> wrote:
> > >>>
> > >>> On Tue, Apr 23, 2019 at 5:14 AM Marko Rodriguez <
> okrammarko@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Hey Josh,
> > >>>>
> > >>>> This gets to the notion I presented in “The Fabled GMachine.”
> > >>>>       http://rredux.com/the-fabled-gmachine.html <
> > >>>> http://rredux.com/the-fabled-gmachine.html> (first paragraph of
> > >>>> “Structures, Processes, and Languages” section)
> > >>>>
> > >>>> All that exists are memory addresses that contain either:
> > >>>>
> > >>>>       1. A primitive
> > >>>>       2. A set of labeled references to other references or
> > primitives.
> > >>>>
> > >>>> Using your work and the above, here is a super low-level ‘bytecode'
> > for
> > >>>> property graphs.
> > >>>>
> > >>>> v.goto("id") => 1
> > >>>>
> > >>>
> > >>> LGTM. An id is special because it is uniquely identifying / is a
> > primary
> > >>> key for the element. However, it is also just a field of the element,
> > >> like
> > >>> "in"/"inV" and "out"/"outV" are fields of an edge. As an aside, an id
> > >> would
> > >>> only really need to be unique among other elements of the same type.
> To
> > >> the
> > >>> above, I would add:
> > >>>
> > >>> v.type() => Person
> > >>>
> > >>> ...a special operation which takes you from an element to its type.
> > This
> > >> is
> > >>> important if unions are supported; e.g. "name" in my example can
> apply
> > >>> either to a Person or a Project.
> > >>>
> > >>>
> > >>> v.goto("label") => person
> > >>>>
> > >>>
> > >>> Or that. Like "id", "type"/"label" is special. You can think of it
> as a
> > >>> field; it's just a different sort of field which will have the same
> > value
> > >>> for all elements of any given type.
> > >>>
> > >>>
> > >>>
> > >>>> v.goto("properties").goto("name") => "marko"
> > >>>>
> > >>>
> > >>> OK, properties. Are properties built-in as a separate kind of thing
> > from
> > >>> edges, or can we treat them the same as vertices and edges here? I
> > think
> > >> we
> > >>> can treat them the same. A property, in the algebraic model I
> described
> > >>> above, is just an element with two fields, the second of which is a
> > >>> primitive value. As I said, I think we need two distinct traversal
> > >>> operations -- projection and selection -- and here is where we can
> use
> > >> the
> > >>> latter. Here, I will call it "comeFrom".
> > >>>
> > >>> v.comeFrom("name", "out").goto("in") => {"marko"}
> > >>>
> > >>> You can think of this comeFrom as a special case of a select()
> function
> > >>> which takes a type -- "name" -- and a set of key/value pairs {("out",
> > >> v)}.
> > >>> It returns all matching elements of the given type. You then project
> to
> > >> the
> > >>> "in" value using your goto. I wrote {"marko"} as a set, because
> > comeFrom
> > >>> can give you multiple properties, depending on whether
> multi-properties
> > >> are
> > >>> supported.
> > >>>
> > >>> Note how similar this is to an edge traversal:
> > >>>
> > >>> v.comeFrom("knows", "out").goto("in") => {v[2], v[4]}
> > >>>
> > >>> Of course, you could define "properties" in such a way that a
> > >>> goto("properties") does exactly this under the hood, but in terms of
> > low
> > >>> level instructions, you need something like comeFrom.
> > >>>
> > >>>
> > >>> v.goto("properties").goto("name").goto(0) => "m"
> > >>>>
> > >>>
> > >>> This is where the notion of optionals becomes handy. You can make
> > >>> array/list indices into fields like this, but IMO you should also
> make
> > >> them
> > >>> safe. E.g. borrowing Haskell syntax for a moment:
> > >>>
> > >>> v.goto("properties").goto("name").goto(0) => Just 'm'
> > >>>
> > >>> v.goto("properties").goto("name").goto(5) => Nothing
> > >>>
> > >>>
> > >>> v.goto("outE").goto("inV") => v[2], v[4]
> > >>>>
> > >>>
> > >>> I am not a big fan of untyped "outE", but you can think of this as a
> > >> union
> > >>> of all v.comeFrom(x, "out").goto("in"), where x is any edge type.
> Only
> > >>> "knows" and "created" are edge types which are applicable to
> "Person",
> > so
> > >>> you will only get {v[2], v[4]}. If you want to get really crazy, you
> > can
> > >>> allow x to be any type. Then you get {v[2], v[4], 29, "marko"}.
> > >>>
> > >>>
> > >>>
> > >>>> g.goto("V").goto(1) => v[1]
> > >>>>
> > >>>
> > >>> That, or you give every element a virtual field called "graph". So:
> > >>>
> > >>> v.goto("graph") => g
> > >>>
> > >>> g.comeFrom("Person", "graph") => {v[1], v[2], v[4], v[6]}
> > >>>
> > >>> g.comeFrom("Person", "graph").restrict("id", 1)
> > >>>
> > >>> ...where restrict() is the relational "sigma" operation as above, not
> > to
> > >> be
> > >>> confused with TinkerPop's select(), filter(), or has() steps. Again,
> I
> > >>> prefer to specify a type in comeFrom (i.e. we're looking specifically
> > >> for a
> > >>> Person with id of 1), but you could also do a comprehension
> > g.comeFrom(x,
> > >>> "graph"), letting x range over all types.
> > >>>
> > >>>
> > >>>
> > >>>> The goto() instruction moves the “memory reference” (traverser) from
> > the
> > >>>> current “memory address” to the “memory address” referenced by the
> > >> goto()
> > >>>> argument.
> > >>>>
> > >>>
> > >>> Agreed, if we also think of primitive values as memory references.
> > >>>
> > >>>
> > >>>
> > >>>> The Gremlin expression:
> > >>>>
> > >>>>       g.V().has(‘name’,’marko’).out(‘knows’).drop()
> > >>>>
> > >>>> ..would compile to:
> > >>>>
> > >>>>
> > >>>>
> > >>
> >
> g.goto(“V”).filter(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()
> > >>>>
> > >>>
> > >>>
> > >>> In the alternate universe:
> > >>>
> > >>> g.comeFrom("Person", "graph").comeFrom("name", "out").restrict("in",
> > >>> "marko").goto("out").comeFrom("knows", "out").goto("in").free()
> > >>>
> > >>> I have wimped out on free() and just left it as you had it, but I
> think
> > >> it
> > >>> would be worthwhile to explore a monadic syntax for traversals with
> > >>> side-effects. Different topic.
> > >>>
> > >>> Now, all of this "out", "in" business is getting pretty repetitive,
> > >> right?
> > >>> Well, the field names become more diverse if we allow hyper-edges and
> > >>> generalized ADTs. E.g. in my Trip example, say I want to know all
> > >> drop-off
> > >>> locations for a given rider:
> > >>>
> > >>> u.comeFrom("Trip", "rider").goto("dropoff").goto("place")
> > >>>
> > >>> Done.
> > >>>
> > >>>
> > >>>
> > >>>> If we can get things that “low-level” and still efficient to
> compile,
> > >> then
> > >>>> we can model every data structure. All you are doing is pointer
> > chasing
> > >>>> through a withStructure() data structure. .
> > >>>>
> > >>>
> > >>> Agreed.
> > >>>
> > >>>
> > >>> No one would ever want to write strategies for goto()-based Bytecode.
> > >>>
> > >>>
> > >>> Also agreed.
> > >>>
> > >>>
> > >>>
> > >>>> Thus, perhaps there could be a PropertyGraphDecorationStrategy that
> > >> does:
> > >>>>
> > >>>> [...]
> > >>>
> > >>>
> > >>> No argument here, though the alternate-universe "bytecode" would look
> > >>> slightly different. And the high-level syntax should also be able to
> > deal
> > >>> with generalized relations / data types gracefully. As a thought
> > >>> experiment, suppose we were to define the steps to() as your goto(),
> > and
> > >>> from() as my comeFrom(). Then traversals like:
> > >>>
> > >>> u.from("Trip", "rider").to("dropoff").to("time")
> > >>>
> > >>> ...look pretty good as-is, and are not too low-level. However,
> ordinary
> > >>> edge traversals like:
> > >>>
> > >>> v.from("knows", "out").to("in")
> > >>>
> > >>> ...do look a little Assembly-like. So in/out/both etc. remain as they
> > >> are,
> > >>> but are shorthand for from() and to() steps using "out" or "in":
> > >>>
> > >>> v.out("knows") === v.outE("knows").inV() === v.from("knows",
> > >> "out").to("in")
> > >>>
> > >>>
> > >>> [I AM NOW GOING OFF THE RAILS]
> > >>>> [sniiiiip]
> > >>>>
> > >>>
> > >>> Sure. Again, I like the idea of wrapping side-effects in monads. What
> > >> would
> > >>> that look like in a Gremlinesque fluent syntax? I don't quite know,
> but
> > >> if
> > >>> we think of the dot as a monadic bind operation like Haskell's >>=,
> > then
> > >>> perhaps the monadic expressions look pretty similar to what you have
> > just
> > >>> sketched out. Might have to be careful about what it means to nest
> > >>> operations as in your addEdge examples.
> > >>>
> > >>>
> > >>>
> > >>> [I AM NOW BACK ON THE RAILS]
> > >>>>
> > >>>> Its as if “properties”, “outE”, “label”, “inV”, etc. references mean
> > >>>> something to property graph providers and they can do more
> intelligent
> > >>>> stuff than what MongoDB would do with such information. However,
> > >> someone,
> > >>>> of course, can create a MongoDBPropertyGraphStrategy that would make
> > >>>> documents look like vertices and edges and then use O(log(n))
> lookups
> > on
> > >>>> ids to walk the graph. However, if that didn’t exist, it would still
> > do
> > >>>> something that works even if its horribly inefficient as every
> > database
> > >> can
> > >>>> make primitives with references between them!
> > >>>>
> > >>>
> > >>> I'm on the same same pair of rails.
> > >>>
> > >>>
> > >>>
> > >>>> Anywho @Josh, I believe goto() is what you are doing with
> > >> multi-references
> > >>>> off an object. How do we make it all clean, easy, and universal?
> > >>>>
> > >>>
> > >>> Let me know what you think of the above.
> > >>>
> > >>> Josh
> > >>>
> > >>>
> > >>>
> > >>>>
> > >>>> Marko.
> > >>>>
> > >>>> http://rredux.com <http://rredux.com/>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>> On Apr 22, 2019, at 6:42 PM, Joshua Shinavier <jo...@fortytwo.net>
> > >> wrote:
> > >>>>>
> > >>>>> Ah, glad you asked. It's all in the pictures. I have nowhere to put
> > >> them
> > >>>> online at the moment... maybe this attachment will go through to the
> > >> list?
> > >>>>>
> > >>>>> Btw. David Spivak gave his talk today at Uber; it was great. Juan
> > >>>> Sequeda (relational <--> RDF mapping guy) was also here, and Ryan
> > joined
> > >>>> remotely. Really interesting discussion about databases vs. graphs,
> > and
> > >>>> what category theory brings to the table.
> > >>>>>
> > >>>>>
> > >>>>> On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez <
> > okrammarko@gmail.com
> > >>>> <ma...@gmail.com>> wrote:
> > >>>>> Hey Josh,
> > >>>>>
> > >>>>> I’m digging what you are saying, but the pictures didn’t come
> through
> > >>>> for me ? … Can you provide them again (or if dev@ is filtering
> them,
> > >> can
> > >>>> you give me URLs to them)?
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Marko.
> > >>>>>
> > >>>>>
> > >>>>>> On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <josh@fortytwo.net
> > >>>> <ma...@fortytwo.net>> wrote:
> > >>>>>>
> > >>>>>> On the subject of "reified joins", maybe be a picture will be
> worth
> > a
> > >>>> few words. As I said in the thread <
> > >>>>
> > https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ
> > >> <
> > >>>>
> > https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ
> > >>>>
> > >>>> on property graph standardization, if you think of vertex labels,
> edge
> > >>>> labels, and property keys as types, each with projections to two
> other
> > >>>> types, there is a nice analogy with relations of two columns, and
> this
> > >>>> analogy can be easily extended to hyper-edges. Here is what the
> schema
> > >> of
> > >>>> the TinkerPop classic graph looks like if you make each type (e.g.
> > >> Person,
> > >>>> Project, knows, name) into a relation:
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> I have made the vertex types salmon-colored, the edge types
> yellow,
> > >>>> the property types green, and the data types blue. The "o" and "I"
> > >> columns
> > >>>> represent the out-type (e.g. out-vertex type of Person) and in-type
> > >> (e.g.
> > >>>> property value type of String) of each relation. More than two
> arrows
> > >> from
> > >>>> a column represent a coproduct, e.g. the out-type of "name" is
> Person
> > OR
> > >>>> Project. Now you can think of out() and in() as joins of two tables
> > on a
> > >>>> primary and foreign key.
> > >>>>>>
> > >>>>>> We are not limited to "out" and "in", however. Here is the ternary
> > >>>> relationship (hyper-edge) from hyper-edge slide <
> > >>>>
> > >>
> >
> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49
> > >>>> <
> > >>>>
> > >>
> >
> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49
> > >>>>
> > >>>> of my Graph Day preso, which has three columns/roles/projections:
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> I have drawn Says in light blue to indicate that it is a
> generalized
> > >>>> element; it has projections other than "out" and "in". Now the line
> > >> between
> > >>>> relations and edges begins to blur. E.g. in the following, is
> > >> PlaceEvent a
> > >>>> vertex or a property?
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> With the right type system, we can just speak of graph elements,
> and
> > >>>> use "vertex", "edge", "property" when it is convenient. In the
> > >> relational
> > >>>> model, they are relations. If you materialize them in a relational
> > >>>> database, they are rows. In any case, you need two basic graph
> > traversal
> > >>>> operations:
> > >>>>>> project() -- forward traversal of the arrows in the above
> diagrams.
> > >>>> Takes you from an element to a component like in-vertex.
> > >>>>>> select() -- reverse traversal of the arrows. Allows you to answer
> > >>>> questions like "in which Trips is John Doe the rider?"
> > >>>>>>
> > >>>>>> Josh
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <
> > >> okrammarko@gmail.com
> > >>>> <ma...@gmail.com> <mailto:okrammarko@gmail.com <mailto:
> > >>>> okrammarko@gmail.com>>> wrote:
> > >>>>>> Hello,
> > >>>>>>
> > >>>>>> I agree with everything you say. Here is my question:
> > >>>>>>
> > >>>>>>       Relational database — join: Table x Table x equality
> function
> > >>>> -> Table
> > >>>>>>       Graph database — traverser: Vertex x edge label -> Vertex
> > >>>>>>
> > >>>>>> I want a single function that does both. The only think was to
> > >>>> represent traverser() in terms of join():
> > >>>>>>
> > >>>>>>       Graph database — traverser: Vertices x Vertex x equality
> > >>>> function -> Vertices
> > >>>>>>
> > >>>>>> For example,
> > >>>>>>
> > >>>>>> V().out(‘address’)
> > >>>>>>
> > >>>>>>       ==>
> > >>>>>>
> > >>>>>> g.join(V().hasLabel(‘person’).as(‘a’)
> > >>>>>>      V().hasLabel(‘addresses’).as(‘b’)).
> > >>>>>>        by(‘name’).select(?address vertex?)
> > >>>>>>
> > >>>>>> That is, join the vertices with themselves based on some predicate
> > to
> > >>>> go from vertices to vertices.
> > >>>>>>
> > >>>>>> However, I would like instead to transform the relational database
> > >>>> join() concept into a traverser() concept. Kuppitz and I were
> talking
> > >> the
> > >>>> other day about a link() type operator that says: “try and link to
> > this
> > >>>> thing in some specified way.” .. ?? The problem we ran into is
> again,
> > >> “link
> > >>>> it to what?”
> > >>>>>>
> > >>>>>>       - in graph, the ‘to what’ is hardcoded so you don’t need to
> > >>>> specify anything.
> > >>>>>>       - in rdbms, the ’to what’ is some other specified table.
> > >>>>>>
> > >>>>>> So what does the link() operator look like?
> > >>>>>>
> > >>>>>> ——
> > >>>>>>
> > >>>>>> Some other random thoughts….
> > >>>>>>
> > >>>>>> Relational databases join on the table (the whole collection)
> > >>>>>> Graph databases traverser on the vertex (an element of the whole
> > >>>> collection)
> > >>>>>>
> > >>>>>> We can make a relational database join on single row (by
> providing a
> > >>>> filter to a particular primary key). This is the same as a table
> with
> > >> one
> > >>>> row. Likewise, for graph in the join() context above:
> > >>>>>>
> > >>>>>> V(1).out(‘address’)
> > >>>>>>
> > >>>>>>       ==>
> > >>>>>>
> > >>>>>> g.join(V(1).as(‘a’)
> > >>>>>>      V().hasLabel(‘addresses’).as(‘b’)).
> > >>>>>>        by(‘name’).select(?address vertex?)
> > >>>>>>
> > >>>>>> More thoughts please….
> > >>>>>>
> > >>>>>> Marko.
> > >>>>>>
> > >>>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
> > >>>> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
> > >>>> http://rredux.com/ <http://rredux.com/>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Apr 19, 2019, at 4:20 AM, pieter martin <
> > pieter.martin@gmail.com
> > >>>> <ma...@gmail.com> <mailto:pieter.martin@gmail.com
> > >> <mailto:
> > >>>> pieter.martin@gmail.com>>> wrote:
> > >>>>>>>
> > >>>>>>> Hi,
> > >>>>>>> The way I saw it is that the big difference is that graph's have
> > >>>>>>> reified joins. This is both a blessing and a curse.
> > >>>>>>> A blessing because its much easier (less text to type, less
> > mistakes,
> > >>>>>>> clearer semantics...) to traverse an edge than to construct a
> > manual
> > >>>>>>> join.A curse because there are almost always far more ways to
> > >>>> traverse
> > >>>>>>> a data set than just by the edges some architect might have
> > >>>> considered
> > >>>>>>> when creating the data set. Often the architect is not the domain
> > >>>>>>> expert and the edges are a hardcoded layout of the dataset, which
> > >>>>>>> almost certainly won't survive the real world's demands. In
> graphs,
> > >>>> if
> > >>>>>>> their are no edges then the data is not reachable, except via
> > indexed
> > >>>>>>> lookups. This is the standard engineering problem of database
> > design,
> > >>>>>>> but it is important and useful that data can be traversed,
> joined,
> > >>>>>>> without having reified edges.
> > >>>>>>> In Sqlg at least, but I suspect it generalizes, I want to create
> > the
> > >>>>>>> notion of a "virtual edge". Which in meta data describes the join
> > and
> > >>>>>>> then the standard to(direction, "virtualEdgeName") will work.
> > >>>>>>> In a way this is precisely to keep the graphy nature of gremlin,
> > i.e.
> > >>>>>>> traversing edges, and avoid using the manual join syntax you
> > >>>> described.
> > >>>>>>> CheersPieter
> > >>>>>>>
> > >>>>>>> On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
> > >>>>>>>> Hi,
> > >>>>>>>> *** This is mainly for Kuppitz, but if others care.
> > >>>>>>>> Was thinking last night about relational data and Gremlin. The
> T()
> > >>>>>>>> step returns all the tables in the withStructure() RDBMS
> database.
> > >>>>>>>> Tables are ‘complex values’ so they can't leave the VM (only a
> > >>>> simple
> > >>>>>>>> ‘toString’).
> > >>>>>>>> Below is a fake Gremlin session. (and these are just ideas…)
> > tables
> > >>>>>>>> -> a ListLike of rows        rows -> a MapLike of primitives
> > >>>>>>>> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> > >>>>>>>> g.T(‘people’)==>t[people]gremlin>
> > >>>>>>>>
> > >>>>
> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml
> > >>>>>>>> in>
> > >>>>>>>>
> > >>>>
> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz,
> > >>>>>>>> age:10}==>{name:josh,age:35}gremlin>
> > >>>>>>>>
> > >>>>
> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g
> > >>>>>>>> remlin>
> > >>>>>>>>
> > >>>>
> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos
> > >>>>>>>> h
> > >>>>>>>> Makes sense. Nice that values() and has() generally apply to all
> > >>>>>>>> ListLike and MapLike structures. Also, note how asMap() is the
> > >>>>>>>> valueMap() of TP4, but generalizes to anything that is MapLike
> so
> > it
> > >>>>>>>> can be turned into a primitive form as a data-rich result from
> the
> > >>>>>>>> VM.
> > >>>>>>>> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> > >>>>>>>>
> > >>>>
> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name
> > >>>>>>>> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin>
> > >>>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
> > >>>> by(se
> > >>>>>>>> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).
> > >>>>
> > >>>>>>>> values().asMap()==>{a.name:marko,a.age:29,b.name:
> > >>>> marko,b.city:santafe
> > >>>>>>>> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{
> > >>>> a.name <http://a.name/> <http://a.name/ <http://a.name/>>:
> > >>>>>>>> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin>
> > >>>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
> > >>>> by(’n
> > >>>>>>>> ame’). // shorthand for equijoin on name
> > >>>>>>>> column/key           values().asMap()==>{a.name:marko,a.age:29,
> > >>>> b.name <http://b.name/> <http://b.name/ <http://b.name/>>
> > >>>>>>>> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:
> kuppitz,
> > >>>> b.ci <http://b.ci/> <http://b.ci/ <http://b.ci/>>
> > >>>>>>>> ty:tucson}==>{a.name:josh,a.age:35,b.name:
> > >>>> josh,b.city:desertisland}gr
> > >>>>>>>> emlin>
> > >>>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
> > >>>> by(’n
> > >>>>>>>> ame’)==>t[people<-name->addresses]  // without asMap(), just the
> > >>>>>>>> complex value ‘toString'gremlin>
> > >>>>>>>> And of course, all of this is strategized into a SQL call so its
> > >>>>>>>> joins aren’t necessarily computed using TP4-VM resources.
> > >>>>>>>> Anywho — what I hope to realize is the relationship between
> > “links”
> > >>>>>>>> (graph) and “joins” (tables). How can we make (bytecode-wise at
> > >>>>>>>> least) RDBMS join operations and graph traversal operations ‘the
> > >>>>>>>> same.’?
> > >>>>>>>>    Singleton: Integer, String, Float, Double, etc. Collection:
> > >>>>>>>> List, Map (Vertex, Table, Document)  Linkable: Vertex, Table
> > >>>>>>>> Vertices and Tables can be “linked.” Unlike Collections, they
> > don’t
> > >>>>>>>> maintain a “parent/child” relationship with the objects they
> > >>>>>>>> reference. What does this mean……….?
> > >>>>>>>> Take care,Marko.
> > >>>>>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
> > >>>> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
> > >>>> http://rredux.com/ <http://rredux.com/>>> <http://rredux.com/ <
> > >>>> http://rredux.com/> <http://rredux.com/ <http://rredux.com/>> <
> > >>>> http://rredux.com/ <http://rredux.com/> <http://rredux.com/ <
> > >>>> http://rredux.com/>>>>
> > >>>>>
> > >>>>> <diagrams.zip>
> > >>>>
> > >>>>
> > >>
> > >>
> >
> >
>

Re: What makes 'graph traversals' and 'relational joins' the same?

Posted by Joshua Shinavier <jo...@fortytwo.net>.

Hi Marko. Responses inline.

On Wed, Apr 24, 2019 at 10:30 AM Marko Rodriguez <ok...@gmail.com>
wrote:

> Hi,
>
> I think I understand you now. The concept of local and non-local data is
> what made me go “ah!”
>

Nice. I also brought this up yesterday in the Property Graph Schema Working
Group, where there is a discussion going on about whether/how graph
databases can contain multiple graphs. Can an element belong to multiple
graphs, can it have different properties in different graphs, etc. If each
graph element is atomic, referencing other graph elements but not
containing them, then it is very straightforward to think of a property
graph as a simple set of elements. Graph relations are just set relations,
making it easy to pull graphs apart and put graphs together (e.g. when
building a stream, merging streams, etc.). If you are willing to make the
open world assumption (e.g. "I know e[7] is a 'knows' edge, but I don't
know what its out- and in-vertices are"), then you can't even partition a
graph in such a way that the partitions are not valid graphs.


So let me reiterate what I think you are saying.
>
> v[1] is guaranteed to have its id data local to it. All other information
> could be derived via id-based "equi-joins.” Thus, we can’t assume that a
> vertex will always have its properties and edges co-located with it.


Yes indeed. A particular graph vendor may choose to co-locate properties
with a vertex and edges with out- or in-vertex (or both, e.g. as JanusGraph
does), but this is an optimization. At a logical level, you can think of an
element and its dependents as belonging to separate relations.



> However, we can assume that it knows where to get its property and edge
> data when requested.


Yes; you need to be able to select().



> Assume the following RDBMS-style data structure that is referenced by
> com.example.MyGraph.
>
> vertex_table
> id label
> 1  person
> 2  person
> …


That is one way to go. I believe this scheme is what Ryan and David would
call the Grothendieck construction; all relations of a given arity are
marked with their type and concatenated into a single relation. I am still
a little sketchy on the Grothendieck construction, so I hope that is a
correct statement.

However, you can also think of distinct element types (edge labels, vertex
labels, property keys, hyperedge signatures) as distinct relations. So you
instead of vertex_table, you would have

person_table
id
1
2

Vertices are such trivial relations that they don't need to be stored as a
tables. Edges are more interesting:

knows_table
out in
1 2
1 4

Properties are similar:

name_table
out out_label out in
1 person marko
2 person vadas
3 project lop
4 person josh
5 project ripple
6 person peter

The property table has a bit of a twist, because its out-label is a
disjoint union of "person" and "project"; both persons and projects can
have names, so you tag the out-element with its label/type. This is not
necessary for "knows" because the out-label is always "person".



> properties_table
> id  name   age
> 1   marko  29
> 2   josh   35
> …
>
> edge_table
> id outV  label  inV
> 0  1    knows   2
> …
>

Yes, this also works, and is equivalent to what I wrote above, with one
tweak: if tagged unions are supported (which IMO they should be, so we have
both a "times" and a "plus" in our type algebra), then property_table
should also include a "label" column, and edge_table should include
"outLabel" and "inLabel", i.e.:

properties_table
id  label name   age
1   person marko  29
2   person vadas   27
…

and

edge_table
id label outV  outLabel  inV inLabel
0  knows 1 person  2 person
…

In a tagged union, you mark the type of a field along with the value or
reference, for the sake of type checking and pattern matching.

Hard to say whether the table-per-relation or the table-per-arity approach
is better. FWIW, at Uber, we use a table per relation for the sake of
better data isolation. If you want to take advantage of the physical types
of the database, you may want multiple properties_tables, one per datatype
(so you're not storing every property value as a string).


If we want to say that the above data structure is a graph, what is
> required of “ComplexType” such that we can satisfy both Neo4j-style and
> RDBMS-style graph encodings? Assume ComplexType is defined as:
>
> interface ComplexType
>   Iterator<T> adjacents(String label, Object... identifiers)
>

You can think of a ComplexType as a row in a database. It just has the
local fields specific to the type. In order to access attached elements,
you need a select(), and your adjacents() looks pretty close to that. I
would write:

Iterator<T> adjacents(String label, String field)

So for example, adjacents("knows", "out") from v[1] gives you an iterator
of "knows" edges for which v[1] is the out-vertex. Btw. adjacents() here is
the same as from() / comeFrom() in previous emails.



> Take this basic Gremlin traversal:
>
> g.V(1).out(‘knows’).values(‘name’)
>
> I now believe this should compile to the following:
>
> [goto,V,1] [goto,outE,knows] [goto,inV] [goto,properties,name]
>

Polymorphism is cool. Your two-argument goto() appears to be my to(),
whereas your three-argument goto() appears to be my from(). The minimal
tweaks I would make to your syntax are:

[goto,V,1] [goto,out,knows] [goto,in] [goto,out,name][goto,in]

I might go a step further and say:

[const,1] [goto,id,person] [goto,out,knows] [goto,in]
[goto,out,name][goto,in]



> Given MyGraph/MyVertex/MyEdge all implement ComplexType and there is no
> local caching of data on these respective objects, then the bytecode isn’t
> rewritten and the following cascade of events occurs:
>
> [...]
>

Looks pretty good.



> Lets review the ComplexType adjacents()-method:
>
> complexType.adjacents(label,identifiers...)
>
> complexType must have sufficient information to represent the tail of the
> relation.
>

Yes; it need to know what relation type you are matching, and on what field
(e.g. "out"/"in") in that relation. Note that the table-per-relation
approach is most appropriate when traversals are always strongly typed.
E.g. when your step is v.out("knows") as opposed to v.out(). For v.out() to
be supported efficiently, the monolithic table, or an element-to-type
table, makes sense.



> label specifies the relation type (we will always assume that a single
> String is sufficient)
>

Exactly. And yeah, I think it is safe to assume that types can be
identified by strings. Want namespaces? Use a qualified name syntax
appropriate for your application.



> identifiers... must contain sufficient information to identify the head of
> the relation.
>

Yes.


The return of the the method adjacents() is then the object(s) on the other
> side of the relation(s).
>

Yeah. We're taking an element and then iterating through all of the
incoming projections, of a given label, to that element. The label is the
name of the relation together with the name of the field/column.



> Now, given the way I have my data structure organized, I could beef up the
> MyXXX implementation such that MyStrategy rewrites the base bytecode to:
>
> [...]
> Now, I could really beef up MyStrategy when I realize that no path
> information is used in the traversal. Thus, the base bytecode compiles to:
>
> [my:sql,SELECT name FROM properties_table,vertex_table,edge_table WHERE …
> lots of join equalities]
>

Something of the kind.



> [...]
> To recap.
>
>         1. There are primitives.
>

+1


>         2. There are Maps and Lists.
>

Sure. Lists of primitives, and maps of primitives to primitives.



>         3. There are ComplexTypes.
>

I like the fancy term "algebraic data types". They are just tuples in which
each field is either:
1) a primitive value (possibly tagged with a type), or
2) an element reference (possibly tagged with a type)

You also need a special "unit" type for optionals.



>         4. ComplexTypes are adjacent to other objects via relations.
>                 - These adjacent objects may be cached locally with the
> ComplexType instance.
>                 - These adjacent objects may require some database lookup.
>                 - Regardless, TP4 doesn’t care — its up to the provider’s
> ComplexType instance to decide how to resolve the adjacency.
>

+1


>         5. ComplexTypes don’t go over the wire — a ComplexTypeProxy with
> appropriately provided toString() is all that leaves the TP4 VM.
>

As a tuple, ComplexTypes / ADTs go over the wire. The values of their
primitive fields should probably go with them. However, the values of their
element / entity fields are just references; the attached element doesn't
go with them.



> Finally, to solve the asMap()/asList() problem, we simply have:
>
> asMap(’name’,’age’) => complexType.adjacents(‘asMap’,’name’,’age')
> asList() => complexType.adjacents(‘asList’)
>

I think I need an example of asList(), but I agree that we can make
properties into key/value maps. If we want to access metaproperties, then
we don't use asMap().



It is up to the complexType to manifest a Map or List accordingly.
>
> I see this as basically a big flatmap system. ComplexTypes just map from
> self to any number of logical neighbors as specified by the relation.
>
> Am I getting it?,
>

Yeah, and I think I am getting how you break down traversals into basic
instructions. Go go GMachine.

Josh




> Marko.
>
> http://rredux.com <http://rredux.com/>
>
>
>
>
> > On Apr 24, 2019, at 9:56 AM, Joshua Shinavier <jo...@fortytwo.net> wrote:
> >
> > On Tue, Apr 23, 2019 at 10:28 AM Marko Rodriguez <ok...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I think we are very close to something useable for TP4 structure/.
> Solving
> >> this problem elegantly will open the flood gates on tp4/ development.
> >>
> >
> > Yes, and formality often brings elegance. I don't think we can do much
> > better than relational algebra and relational calculus in terms of
> > formality, so to the extent we can reduce the fundamental TP4 traversal
> > steps to basic relational operations, the floodgates will also be open to
> > applications of query validation and query optimization from the last 40+
> > years of research.
> >
> >
> >
> >> I still don’t grock your comeFrom().goto() stuff. I don’t get the
> benefit
> >> of having two instructions for “pointer chasing” instead of one.
> >>
> >
> > There are just a handful of basic operations in relational algebra.
> > Projection, selection, union, complement, Cartesian product. Joins, as
> well
> > as all other operations, can be derived from these. A lot of graph
> > traversal can be accomplished using only projection and selection, which
> is
> > why we were able to get away with only to/goto and from/comeFrom in the
> > examples above. However, I believe you do need both operations. You can
> > kind of get away without from() if you assume that each vertex has local
> > inE and outE references to incoming and outgoing edges, but I see that
> as a
> > kind of pre-materialized from()/select(). If you think of edges strictly
> as
> > relations, and represent them in a straightforward way with tables, you
> > don't need the local inE and outE; whether you have them depends on the
> > graph back-end.
> >
> >
> >
> >> Lets put that aside for now and lets turn to modeling a Vertex. Go back
> to
> >> my original representation:
> >>
> >> vertex.goto(‘label’)
> >> vertex.goto(‘id’)
> >>
> >
> > Local (in my view). All good.
> >
> >
> >
> >> vertex.goto(‘outE’)
> >> vertex.goto(‘inE’)
> >> vertex.goto(‘properties’)
> >>
> >
> > Non-local (in my view). You can use goto(), but if the goal is to bring
> the
> > relational model into the fold, at a lower level you do have a select()
> > operation. Unless you make projections local to vertices instead of
> edges,
> > but then you just have the same problem in reverse. Am I making sense?
> >
> >
> > Any object can be converted into a Map. In TinkerPop3 we convert vertices
> >> into maps via:
> >>
> >>        g.V().has(‘name’,’marko’).valueMap() => {name:marko,age:29}
> >>        g.V().has(‘name’,’marko’).valueMap(true) =>
> >> {id:1,label:person,name:marko,age:29}
> >>
> >
> > Maps are A-OK. In the case of properties, I think where we differ is that
> > you see a property like "name" as a key/value pair in a map local to the
> > vertex. I see the property as an element of type "name", with the vertex
> as
> > a value in its local map, logically if not physically. This allows
> maximum
> > flexibility in terms of meta-properties -- exotic beasts which seem to be
> > in a kind of limbo state in TP3, but if we're trying to be as general as
> > possible, some data models we might want to pull in, like GRAKN.AI, do
> > allow this kind of flexibility.
> >
> >
> >
> >> In the spirit of instruction reuse, we should have an asMap()
> instruction
> >> that works for ANY object. (As a side: this gets back to ONLY sending
> >> primitives over the wire, no
> >> Vertex/Edge/Document/Table/Row/XML/ColumnFamily/etc.). Thus, the above
> is:
> >>
> >>        g.V().has(‘name’,’marko’).properties().asMap() =>
> >> {name:marko,age:29}
> >>        g.V().has(‘name’,’marko’).asMap() =>
> >> {id:1,label:person,properties:{name:marko,age:29}}
> >>
> >
> > Again, no argument here, although I would think of a map as an
> > optimization. IMO, the fundamental projections from v[1] are id:1 and
> > label:Person. You could make a map out of these, or just use an offset,
> > since the keys are always the same. However, you can also build a map
> > including any key you can turn into a function. properties() is such a
> key.
> >
> >
> > You might ask, why didn’t it go to outE and inE and map-ify that data?
> >> Because those are "sibling” references, not “children” references.
> >>
> >>        goto(‘outE’) is a “sibling” reference. (a vertex does not contain
> >> an edge)
> >>        goto(‘id’) is a “child” reference. (a vertex contains the id)
> >>
> >
> > I agree with both of those statements. A vertex does not contain the
> edges
> > incident on it. Again, I am thinking of properties a bit more like edges
> > for maximum generality.
> >
> >
> >
> >> Where do we find sibling references?
> >>        Graphs: vertices don’t contain each other.
> >>        OO heaps: many objects don’t contain each other.
> >>        RDBMS: rows are linked by joins, but don’t contain each other.
> >>
> >
> > Yep.
> >
> >
> > So, the way in which we structure our references (pointers) determines
> the
> >> shape of the data and ultimately how different instructions will
> behave. We
> >> can’t assume that asMap() knows anything about
> >> vertices/edges/documents/rows/tables/etc. It will simply walk all
> >> child-references and create a map.
> >>
> >
> > Just to play devil's advocate, you *could* include "inE" and "outE" as
> keys
> > in the local map of a vertex; it's just a matter of what you choose to
> do.
> > inE and outE are perfectly good functions from a vertex to a set of
> edges.
> >
> >
> > We don’t want TP to get involved in “complex data types.”
> >
> >
> > Well, how do you feel about algebraic data types? They are simple, and
> > allow you to capture arbitrary relations as elements.
> >
> >
> >
> >> We don’t care. You can propagate MyDatabaseObject through the TP4 VM
> >> pipeline and load your object up with methods for optimizations with
> your
> >> DB and all that, but for TP4, your object is just needs to implement:
> >>
> >>        ComplexType
> >>                - Iterator<T> children(String label)
> >>                - Iterator<T> siblings(String label)
> >>                - default Iterator<T> references(String label) {
> >> IteratorUtils.concat(children(label), siblings(label)) }
> >>                - String toString()
> >>
> >
> > I don't think you need siblings(). I think you need a more generic
> > select(), but since this is graph traversal, select() only needs the
> > identifier of a type (e.g. "knows") and the name of a field (e.g. "out").
> >
> >
> >
> >> When a ComplexType goes over the wire to the user, it just represented
> as
> >> a ComplexTypeProxy with a toString() like v[1],
> >> tinkergraph[vertices:10,edges:34], etc. All references are disconnected.
> >> Yes, even children references. We do not want language drivers having to
> >> know about random object types and have to deal with implementing
> >> serializers and all that non-sense. The TP4 serialization protocol is
> >> primitives, maps, lists, bytecode, and traversers. Thats it!
> >>
> >
> > No disagreement here. I think the only disconnect is about what keys are
> > local to what elements. Some keys are hard-local, like id and type for
> all
> > elements, and "in" and "out" for edges and properties. These *should* be
> > carried over the wire. Properties, incident edges, etc. possibly but not
> > necessarily.
> >
> >
> >
> >> *** Only Maps and Lists (that don’t contain complex data types) maintain
> >> their child references “over the wire.”
> >>
> >
> > Sure.
> >
> >
> >
> >> I don’t get your hypergraph example, so let me try another example:
> >>
> >>        tp ==member==> marko, josh
> >>
> >> TP is a vertex and there is a directed hyperedge with label “member”
> >> connecting to marko and josh vertices.
> >>
> >
> > That's kind of an unlabeled hyperedge; I am not sure we need to support
> > those. Look at the GRAKN data model, or at HypergraphDB or earlier
> > hypergraph data models. A hyperedge is essentially a tuple in which each
> > components has a label ("role", in GRAKN). In other words, it is a
> relation
> > in which some of the columns may be foreign keys. In your example, rather
> > than "member" connecting "tp" to a set of vertices, you might have
> > something like Collaborated{person1:marko, person2:josh, project=tp}.
> Then
> > a query like "who did marko collaborate with on tp?" becomes:
> >
> >    tp.from("Collaborated", "project").restrict("person1",
> > "marko").to("person2")
> >
> > Of course, if you want this relationship to be symmetrical, you can
> > introduce a constraint.
> >
> >
> >
> >> tp.goto(“outE”).filter(goto(“label”).is(“member”)).goto(“inV”)
> >>
> >> Looks exactly like a property graph query? However, its not because
> >> goto(“inV”) returns 2 vertices, not 1.
> >
> >
> > I think your example works well for the type of hypergraph you are
> > referring to. It's just different than the type of hypergraph I am
> > referring to. I think by now you know that I would rather see a from()
> > instead of that goto("outE"). I also agree you can make a function out of
> > outE, and expose it using a map, if you really want to. Under the hood,
> > however, I see this as traversing a projection head to tail rather than
> > tail to head.
> >
> >
> >
> >> EdgeVertexFlatmapFunction works for property graphs and hypergraphs. It
> >> doesn’t care — it just follows goto() pointers! That is, it follows the
> >> ComplexType.references(“inV”). Multi-properties are the same as well.
> >> Likewise for meta-properties. These data model variations are not
> “special”
> >> to the TP4 VM. It just walks references whether there are 0,1,2, or N of
> >> them.
> >>
> >
> > At a high level, I agree with what you are saying. We should have a
> common
> > data model that unifies traditional property graphs, hypergraphs,
> > relational databases, and any other data model that can be modeled using
> > algebraic data types with references. We define a small set of basic
> > operations on this data model which can be combined into more complex
> > operations that are amenable to static analysis and optimization. We can
> > send graph data over the wire as collections of elements using the bare
> > minimum of local fields, and reconstruct the graph on the other end. We
> can
> > operate on streams of such elements under suitable conditions (elements
> > sent in an appropriate order). The basic operations are not tied to the
> > JVM, and should be straightforward to implement in other frameworks.
> >
> >
> >
> >>
> >> Thus, what is crucial to all this is the “shape of the data.” Using your
> >> pointers wisely so instructions produce useful results.
> >>
> >
> > +1
> >
> >
> >
> >> Does any of what I wrote update your comeFrom().goto() stuff?
> >
> >
> > Sadly, no, though I appreciate that you are coming from a slightly
> > different place w.r.t. properties, hypergraphs, and most importantly, the
> > role of a type system.
> >
> >
> >
> >> If not, can you please explain to me why comeFrom() is cool — sorry for
> >> being dense (aka “being Kuppitz" — thats right, I said it. boom!).
> >>
> >
> > Let's keep iterating until we reach a fixed point. Maybe Daniel's already
> > there.
> >
> > Josh
> >
> >
> >
> >>
> >> Thanks,
> >> Marko.
> >>
> >> http://rredux.com <http://rredux.com/>
> >>
> >>
> >>
> >>
> >>> On Apr 23, 2019, at 10:25 AM, Joshua Shinavier <jo...@fortytwo.net>
> >> wrote:
> >>>
> >>> On Tue, Apr 23, 2019 at 5:14 AM Marko Rodriguez <ok...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hey Josh,
> >>>>
> >>>> This gets to the notion I presented in “The Fabled GMachine.”
> >>>>       http://rredux.com/the-fabled-gmachine.html <
> >>>> http://rredux.com/the-fabled-gmachine.html> (first paragraph of
> >>>> “Structures, Processes, and Languages” section)
> >>>>
> >>>> All that exists are memory addresses that contain either:
> >>>>
> >>>>       1. A primitive
> >>>>       2. A set of labeled references to other references or
> primitives.
> >>>>
> >>>> Using your work and the above, here is a super low-level ‘bytecode'
> for
> >>>> property graphs.
> >>>>
> >>>> v.goto("id") => 1
> >>>>
> >>>
> >>> LGTM. An id is special because it is uniquely identifying / is a
> primary
> >>> key for the element. However, it is also just a field of the element,
> >> like
> >>> "in"/"inV" and "out"/"outV" are fields of an edge. As an aside, an id
> >> would
> >>> only really need to be unique among other elements of the same type. To
> >> the
> >>> above, I would add:
> >>>
> >>> v.type() => Person
> >>>
> >>> ...a special operation which takes you from an element to its type.
> This
> >> is
> >>> important if unions are supported; e.g. "name" in my example can apply
> >>> either to a Person or a Project.
> >>>
> >>>
> >>> v.goto("label") => person
> >>>>
> >>>
> >>> Or that. Like "id", "type"/"label" is special. You can think of it as a
> >>> field; it's just a different sort of field which will have the same
> value
> >>> for all elements of any given type.
> >>>
> >>>
> >>>
> >>>> v.goto("properties").goto("name") => "marko"
> >>>>
> >>>
> >>> OK, properties. Are properties built-in as a separate kind of thing
> from
> >>> edges, or can we treat them the same as vertices and edges here? I
> think
> >> we
> >>> can treat them the same. A property, in the algebraic model I described
> >>> above, is just an element with two fields, the second of which is a
> >>> primitive value. As I said, I think we need two distinct traversal
> >>> operations -- projection and selection -- and here is where we can use
> >> the
> >>> latter. Here, I will call it "comeFrom".
> >>>
> >>> v.comeFrom("name", "out").goto("in") => {"marko"}
> >>>
> >>> You can think of this comeFrom as a special case of a select() function
> >>> which takes a type -- "name" -- and a set of key/value pairs {("out",
> >> v)}.
> >>> It returns all matching elements of the given type. You then project to
> >> the
> >>> "in" value using your goto. I wrote {"marko"} as a set, because
> comeFrom
> >>> can give you multiple properties, depending on whether multi-properties
> >> are
> >>> supported.
> >>>
> >>> Note how similar this is to an edge traversal:
> >>>
> >>> v.comeFrom("knows", "out").goto("in") => {v[2], v[4]}
> >>>
> >>> Of course, you could define "properties" in such a way that a
> >>> goto("properties") does exactly this under the hood, but in terms of
> low
> >>> level instructions, you need something like comeFrom.
> >>>
> >>>
> >>> v.goto("properties").goto("name").goto(0) => "m"
> >>>>
> >>>
> >>> This is where the notion of optionals becomes handy. You can make
> >>> array/list indices into fields like this, but IMO you should also make
> >> them
> >>> safe. E.g. borrowing Haskell syntax for a moment:
> >>>
> >>> v.goto("properties").goto("name").goto(0) => Just 'm'
> >>>
> >>> v.goto("properties").goto("name").goto(5) => Nothing
> >>>
> >>>
> >>> v.goto("outE").goto("inV") => v[2], v[4]
> >>>>
> >>>
> >>> I am not a big fan of untyped "outE", but you can think of this as a
> >> union
> >>> of all v.comeFrom(x, "out").goto("in"), where x is any edge type. Only
> >>> "knows" and "created" are edge types which are applicable to "Person",
> so
> >>> you will only get {v[2], v[4]}. If you want to get really crazy, you
> can
> >>> allow x to be any type. Then you get {v[2], v[4], 29, "marko"}.
> >>>
> >>>
> >>>
> >>>> g.goto("V").goto(1) => v[1]
> >>>>
> >>>
> >>> That, or you give every element a virtual field called "graph". So:
> >>>
> >>> v.goto("graph") => g
> >>>
> >>> g.comeFrom("Person", "graph") => {v[1], v[2], v[4], v[6]}
> >>>
> >>> g.comeFrom("Person", "graph").restrict("id", 1)
> >>>
> >>> ...where restrict() is the relational "sigma" operation as above, not
> to
> >> be
> >>> confused with TinkerPop's select(), filter(), or has() steps. Again, I
> >>> prefer to specify a type in comeFrom (i.e. we're looking specifically
> >> for a
> >>> Person with id of 1), but you could also do a comprehension
> g.comeFrom(x,
> >>> "graph"), letting x range over all types.
> >>>
> >>>
> >>>
> >>>> The goto() instruction moves the “memory reference” (traverser) from
> the
> >>>> current “memory address” to the “memory address” referenced by the
> >> goto()
> >>>> argument.
> >>>>
> >>>
> >>> Agreed, if we also think of primitive values as memory references.
> >>>
> >>>
> >>>
> >>>> The Gremlin expression:
> >>>>
> >>>>       g.V().has(‘name’,’marko’).out(‘knows’).drop()
> >>>>
> >>>> ..would compile to:
> >>>>
> >>>>
> >>>>
> >>
> g.goto(“V”).filter(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()
> >>>>
> >>>
> >>>
> >>> In the alternate universe:
> >>>
> >>> g.comeFrom("Person", "graph").comeFrom("name", "out").restrict("in",
> >>> "marko").goto("out").comeFrom("knows", "out").goto("in").free()
> >>>
> >>> I have wimped out on free() and just left it as you had it, but I think
> >> it
> >>> would be worthwhile to explore a monadic syntax for traversals with
> >>> side-effects. Different topic.
> >>>
> >>> Now, all of this "out", "in" business is getting pretty repetitive,
> >> right?
> >>> Well, the field names become more diverse if we allow hyper-edges and
> >>> generalized ADTs. E.g. in my Trip example, say I want to know all
> >> drop-off
> >>> locations for a given rider:
> >>>
> >>> u.comeFrom("Trip", "rider").goto("dropoff").goto("place")
> >>>
> >>> Done.
> >>>
> >>>
> >>>
> >>>> If we can get things that “low-level” and still efficient to compile,
> >> then
> >>>> we can model every data structure. All you are doing is pointer
> chasing
> >>>> through a withStructure() data structure. .
> >>>>
> >>>
> >>> Agreed.
> >>>
> >>>
> >>> No one would ever want to write strategies for goto()-based Bytecode.
> >>>
> >>>
> >>> Also agreed.
> >>>
> >>>
> >>>
> >>>> Thus, perhaps there could be a PropertyGraphDecorationStrategy that
> >> does:
> >>>>
> >>>> [...]
> >>>
> >>>
> >>> No argument here, though the alternate-universe "bytecode" would look
> >>> slightly different. And the high-level syntax should also be able to
> deal
> >>> with generalized relations / data types gracefully. As a thought
> >>> experiment, suppose we were to define the steps to() as your goto(),
> and
> >>> from() as my comeFrom(). Then traversals like:
> >>>
> >>> u.from("Trip", "rider").to("dropoff").to("time")
> >>>
> >>> ...look pretty good as-is, and are not too low-level. However, ordinary
> >>> edge traversals like:
> >>>
> >>> v.from("knows", "out").to("in")
> >>>
> >>> ...do look a little Assembly-like. So in/out/both etc. remain as they
> >> are,
> >>> but are shorthand for from() and to() steps using "out" or "in":
> >>>
> >>> v.out("knows") === v.outE("knows").inV() === v.from("knows",
> >> "out").to("in")
> >>>
> >>>
> >>> [I AM NOW GOING OFF THE RAILS]
> >>>> [sniiiiip]
> >>>>
> >>>
> >>> Sure. Again, I like the idea of wrapping side-effects in monads. What
> >> would
> >>> that look like in a Gremlinesque fluent syntax? I don't quite know, but
> >> if
> >>> we think of the dot as a monadic bind operation like Haskell's >>=,
> then
> >>> perhaps the monadic expressions look pretty similar to what you have
> just
> >>> sketched out. Might have to be careful about what it means to nest
> >>> operations as in your addEdge examples.
> >>>
> >>>
> >>>
> >>> [I AM NOW BACK ON THE RAILS]
> >>>>
> >>>> Its as if “properties”, “outE”, “label”, “inV”, etc. references mean
> >>>> something to property graph providers and they can do more intelligent
> >>>> stuff than what MongoDB would do with such information. However,
> >> someone,
> >>>> of course, can create a MongoDBPropertyGraphStrategy that would make
> >>>> documents look like vertices and edges and then use O(log(n)) lookups
> on
> >>>> ids to walk the graph. However, if that didn’t exist, it would still
> do
> >>>> something that works even if its horribly inefficient as every
> database
> >> can
> >>>> make primitives with references between them!
> >>>>
> >>>
> >>> I'm on the same same pair of rails.
> >>>
> >>>
> >>>
> >>>> Anywho @Josh, I believe goto() is what you are doing with
> >> multi-references
> >>>> off an object. How do we make it all clean, easy, and universal?
> >>>>
> >>>
> >>> Let me know what you think of the above.
> >>>
> >>> Josh
> >>>
> >>>
> >>>
> >>>>
> >>>> Marko.
> >>>>
> >>>> http://rredux.com <http://rredux.com/>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On Apr 22, 2019, at 6:42 PM, Joshua Shinavier <jo...@fortytwo.net>
> >> wrote:
> >>>>>
> >>>>> Ah, glad you asked. It's all in the pictures. I have nowhere to put
> >> them
> >>>> online at the moment... maybe this attachment will go through to the
> >> list?
> >>>>>
> >>>>> Btw. David Spivak gave his talk today at Uber; it was great. Juan
> >>>> Sequeda (relational <--> RDF mapping guy) was also here, and Ryan
> joined
> >>>> remotely. Really interesting discussion about databases vs. graphs,
> and
> >>>> what category theory brings to the table.
> >>>>>
> >>>>>
> >>>>> On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez <
> okrammarko@gmail.com
> >>>> <ma...@gmail.com>> wrote:
> >>>>> Hey Josh,
> >>>>>
> >>>>> I’m digging what you are saying, but the pictures didn’t come through
> >>>> for me ? … Can you provide them again (or if dev@ is filtering them,
> >> can
> >>>> you give me URLs to them)?
> >>>>>
> >>>>> Thanks,
> >>>>> Marko.
> >>>>>
> >>>>>
> >>>>>> On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <josh@fortytwo.net
> >>>> <ma...@fortytwo.net>> wrote:
> >>>>>>
> >>>>>> On the subject of "reified joins", maybe be a picture will be worth
> a
> >>>> few words. As I said in the thread <
> >>>>
> https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ
> >> <
> >>>>
> https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ
> >>>>
> >>>> on property graph standardization, if you think of vertex labels, edge
> >>>> labels, and property keys as types, each with projections to two other
> >>>> types, there is a nice analogy with relations of two columns, and this
> >>>> analogy can be easily extended to hyper-edges. Here is what the schema
> >> of
> >>>> the TinkerPop classic graph looks like if you make each type (e.g.
> >> Person,
> >>>> Project, knows, name) into a relation:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> I have made the vertex types salmon-colored, the edge types yellow,
> >>>> the property types green, and the data types blue. The "o" and "I"
> >> columns
> >>>> represent the out-type (e.g. out-vertex type of Person) and in-type
> >> (e.g.
> >>>> property value type of String) of each relation. More than two arrows
> >> from
> >>>> a column represent a coproduct, e.g. the out-type of "name" is Person
> OR
> >>>> Project. Now you can think of out() and in() as joins of two tables
> on a
> >>>> primary and foreign key.
> >>>>>>
> >>>>>> We are not limited to "out" and "in", however. Here is the ternary
> >>>> relationship (hyper-edge) from hyper-edge slide <
> >>>>
> >>
> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49
> >>>> <
> >>>>
> >>
> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49
> >>>>
> >>>> of my Graph Day preso, which has three columns/roles/projections:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> I have drawn Says in light blue to indicate that it is a generalized
> >>>> element; it has projections other than "out" and "in". Now the line
> >> between
> >>>> relations and edges begins to blur. E.g. in the following, is
> >> PlaceEvent a
> >>>> vertex or a property?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> With the right type system, we can just speak of graph elements, and
> >>>> use "vertex", "edge", "property" when it is convenient. In the
> >> relational
> >>>> model, they are relations. If you materialize them in a relational
> >>>> database, they are rows. In any case, you need two basic graph
> traversal
> >>>> operations:
> >>>>>> project() -- forward traversal of the arrows in the above diagrams.
> >>>> Takes you from an element to a component like in-vertex.
> >>>>>> select() -- reverse traversal of the arrows. Allows you to answer
> >>>> questions like "in which Trips is John Doe the rider?"
> >>>>>>
> >>>>>> Josh
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <
> >> okrammarko@gmail.com
> >>>> <ma...@gmail.com> <mailto:okrammarko@gmail.com <mailto:
> >>>> okrammarko@gmail.com>>> wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> I agree with everything you say. Here is my question:
> >>>>>>
> >>>>>>       Relational database — join: Table x Table x equality function
> >>>> -> Table
> >>>>>>       Graph database — traverser: Vertex x edge label -> Vertex
> >>>>>>
> >>>>>> I want a single function that does both. The only think was to
> >>>> represent traverser() in terms of join():
> >>>>>>
> >>>>>>       Graph database — traverser: Vertices x Vertex x equality
> >>>> function -> Vertices
> >>>>>>
> >>>>>> For example,
> >>>>>>
> >>>>>> V().out(‘address’)
> >>>>>>
> >>>>>>       ==>
> >>>>>>
> >>>>>> g.join(V().hasLabel(‘person’).as(‘a’)
> >>>>>>      V().hasLabel(‘addresses’).as(‘b’)).
> >>>>>>        by(‘name’).select(?address vertex?)
> >>>>>>
> >>>>>> That is, join the vertices with themselves based on some predicate
> to
> >>>> go from vertices to vertices.
> >>>>>>
> >>>>>> However, I would like instead to transform the relational database
> >>>> join() concept into a traverser() concept. Kuppitz and I were talking
> >> the
> >>>> other day about a link() type operator that says: “try and link to
> this
> >>>> thing in some specified way.” .. ?? The problem we ran into is again,
> >> “link
> >>>> it to what?”
> >>>>>>
> >>>>>>       - in graph, the ‘to what’ is hardcoded so you don’t need to
> >>>> specify anything.
> >>>>>>       - in rdbms, the ’to what’ is some other specified table.
> >>>>>>
> >>>>>> So what does the link() operator look like?
> >>>>>>
> >>>>>> ——
> >>>>>>
> >>>>>> Some other random thoughts….
> >>>>>>
> >>>>>> Relational databases join on the table (the whole collection)
> >>>>>> Graph databases traverser on the vertex (an element of the whole
> >>>> collection)
> >>>>>>
> >>>>>> We can make a relational database join on single row (by providing a
> >>>> filter to a particular primary key). This is the same as a table with
> >> one
> >>>> row. Likewise, for graph in the join() context above:
> >>>>>>
> >>>>>> V(1).out(‘address’)
> >>>>>>
> >>>>>>       ==>
> >>>>>>
> >>>>>> g.join(V(1).as(‘a’)
> >>>>>>      V().hasLabel(‘addresses’).as(‘b’)).
> >>>>>>        by(‘name’).select(?address vertex?)
> >>>>>>
> >>>>>> More thoughts please….
> >>>>>>
> >>>>>> Marko.
> >>>>>>
> >>>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
> >>>> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
> >>>> http://rredux.com/ <http://rredux.com/>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Apr 19, 2019, at 4:20 AM, pieter martin <
> pieter.martin@gmail.com
> >>>> <ma...@gmail.com> <mailto:pieter.martin@gmail.com
> >> <mailto:
> >>>> pieter.martin@gmail.com>>> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>> The way I saw it is that the big difference is that graph's have
> >>>>>>> reified joins. This is both a blessing and a curse.
> >>>>>>> A blessing because its much easier (less text to type, less
> mistakes,
> >>>>>>> clearer semantics...) to traverse an edge than to construct a
> manual
> >>>>>>> join.A curse because there are almost always far more ways to
> >>>> traverse
> >>>>>>> a data set than just by the edges some architect might have
> >>>> considered
> >>>>>>> when creating the data set. Often the architect is not the domain
> >>>>>>> expert and the edges are a hardcoded layout of the dataset, which
> >>>>>>> almost certainly won't survive the real world's demands. In graphs,
> >>>> if
> >>>>>>> their are no edges then the data is not reachable, except via
> indexed
> >>>>>>> lookups. This is the standard engineering problem of database
> design,
> >>>>>>> but it is important and useful that data can be traversed, joined,
> >>>>>>> without having reified edges.
> >>>>>>> In Sqlg at least, but I suspect it generalizes, I want to create
> the
> >>>>>>> notion of a "virtual edge". Which in meta data describes the join
> and
> >>>>>>> then the standard to(direction, "virtualEdgeName") will work.
> >>>>>>> In a way this is precisely to keep the graphy nature of gremlin,
> i.e.
> >>>>>>> traversing edges, and avoid using the manual join syntax you
> >>>> described.
> >>>>>>> CheersPieter
> >>>>>>>
> >>>>>>> On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
> >>>>>>>> Hi,
> >>>>>>>> *** This is mainly for Kuppitz, but if others care.
> >>>>>>>> Was thinking last night about relational data and Gremlin. The T()
> >>>>>>>> step returns all the tables in the withStructure() RDBMS database.
> >>>>>>>> Tables are ‘complex values’ so they can't leave the VM (only a
> >>>> simple
> >>>>>>>> ‘toString’).
> >>>>>>>> Below is a fake Gremlin session. (and these are just ideas…)
> tables
> >>>>>>>> -> a ListLike of rows        rows -> a MapLike of primitives
> >>>>>>>> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> >>>>>>>> g.T(‘people’)==>t[people]gremlin>
> >>>>>>>>
> >>>> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml
> >>>>>>>> in>
> >>>>>>>>
> >>>> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz,
> >>>>>>>> age:10}==>{name:josh,age:35}gremlin>
> >>>>>>>>
> >>>> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g
> >>>>>>>> remlin>
> >>>>>>>>
> >>>> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos
> >>>>>>>> h
> >>>>>>>> Makes sense. Nice that values() and has() generally apply to all
> >>>>>>>> ListLike and MapLike structures. Also, note how asMap() is the
> >>>>>>>> valueMap() of TP4, but generalizes to anything that is MapLike so
> it
> >>>>>>>> can be turned into a primitive form as a data-rich result from the
> >>>>>>>> VM.
> >>>>>>>> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> >>>>>>>>
> >>>> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name
> >>>>>>>> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin>
> >>>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
> >>>> by(se
> >>>>>>>> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).
> >>>>
> >>>>>>>> values().asMap()==>{a.name:marko,a.age:29,b.name:
> >>>> marko,b.city:santafe
> >>>>>>>> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{
> >>>> a.name <http://a.name/> <http://a.name/ <http://a.name/>>:
> >>>>>>>> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin>
> >>>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
> >>>> by(’n
> >>>>>>>> ame’). // shorthand for equijoin on name
> >>>>>>>> column/key           values().asMap()==>{a.name:marko,a.age:29,
> >>>> b.name <http://b.name/> <http://b.name/ <http://b.name/>>
> >>>>>>>> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:kuppitz,
> >>>> b.ci <http://b.ci/> <http://b.ci/ <http://b.ci/>>
> >>>>>>>> ty:tucson}==>{a.name:josh,a.age:35,b.name:
> >>>> josh,b.city:desertisland}gr
> >>>>>>>> emlin>
> >>>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
> >>>> by(’n
> >>>>>>>> ame’)==>t[people<-name->addresses]  // without asMap(), just the
> >>>>>>>> complex value ‘toString'gremlin>
> >>>>>>>> And of course, all of this is strategized into a SQL call so its
> >>>>>>>> joins aren’t necessarily computed using TP4-VM resources.
> >>>>>>>> Anywho — what I hope to realize is the relationship between
> “links”
> >>>>>>>> (graph) and “joins” (tables). How can we make (bytecode-wise at
> >>>>>>>> least) RDBMS join operations and graph traversal operations ‘the
> >>>>>>>> same.’?
> >>>>>>>>    Singleton: Integer, String, Float, Double, etc. Collection:
> >>>>>>>> List, Map (Vertex, Table, Document)  Linkable: Vertex, Table
> >>>>>>>> Vertices and Tables can be “linked.” Unlike Collections, they
> don’t
> >>>>>>>> maintain a “parent/child” relationship with the objects they
> >>>>>>>> reference. What does this mean……….?
> >>>>>>>> Take care,Marko.
> >>>>>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
> >>>> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
> >>>> http://rredux.com/ <http://rredux.com/>>> <http://rredux.com/ <
> >>>> http://rredux.com/> <http://rredux.com/ <http://rredux.com/>> <
> >>>> http://rredux.com/ <http://rredux.com/> <http://rredux.com/ <
> >>>> http://rredux.com/>>>>
> >>>>>
> >>>>> <diagrams.zip>
> >>>>
> >>>>
> >>
> >>
>
>

Re: What makes 'graph traversals' and 'relational joins' the same?

Posted by Marko Rodriguez <ok...@gmail.com>.

Hi,

I think I understand you now. The concept of local and non-local data is what made me go “ah!”

So let me reiterate what I think you are saying.

v[1] is guaranteed to have its id data local to it. All other information could be derived via id-based "equi-joins.” Thus, we can’t assume that a vertex will always have its properties and edges co-located with it. However, we can assume that it knows where to get its property and edge data when requested. Assume the following RDBMS-style data structure that is referenced by com.example.MyGraph.

vertex_table
id label
1  person
2  person
…

properties_table
id  name   age
1   marko  29
2   josh   35
…

edge_table
id outV  label  inV
0  1    knows   2
…

If we want to say that the above data structure is a graph, what is required of “ComplexType” such that we can satisfy both Neo4j-style and RDBMS-style graph encodings? Assume ComplexType is defined as:

interface ComplexType
  Iterator<T> adjacents(String label, Object... identifiers)

Take this basic Gremlin traversal:

g.V(1).out(‘knows’).values(‘name’)

I now believe this should compile to the following:

[goto,V,1] [goto,outE,knows] [goto,inV] [goto,properties,name]

Given MyGraph/MyVertex/MyEdge all implement ComplexType and there is no local caching of data on these respective objects, then the bytecode isn’t rewritten and the following cascade of events occurs:

mygraph
[goto,V,1] => 
  mygraph.adjacents(“V”,1) => 
    SELECT * FROM vertex_table WHERE id=1
myvertex1
[goto,outE,knows] => 
  myvertex1.adjacents(“outE”,”knows”) => 
    SELECT id FROM edge_table WHERE outV=1 AND label=knows
myedge0
[goto,inV,knows] => 
  myedge1.adjacents(“inV”) => 
    SELECT vertex_table.id FROM vertex_table, edge_table WHERE vertex_table.id=edge_table.inV AND edge_table.id=0
myvertex2
[goto,properties,name] => 
  myvertex2.adjacents(“properties”,”name”) => 
    SELECT name FROM properties_table WHERE id=2
“josh"

Lets review the ComplexType adjacents()-method:

complexType.adjacents(label,identifiers...)

complexType must have sufficient information to represent the tail of the relation.
label specifies the relation type (we will always assume that a single String is sufficient)
identifiers... must contain sufficient information to identify the head of the relation.

The return of the the method adjacents() is then the object(s) on the other side of the relation(s).

Now, given the way I have my data structure organized, I could beef up the MyXXX implementation such that MyStrategy rewrites the base bytecode to:

[goto,V,1] [goto,out,knows][goto,properties,name]

The following cascade of events occurs:

mygraph
[goto,V,1] => 
  mygraph.adjacents(“V”,1) => 
    SELECT * FROM vertex_table WHERE id=1
myvertex1
[goto,out,knows] => 
  myvertex1.adjacents(“outE”,”knows”) => 
    SELECT vertex_table.id FROM vertex_table,edge_table WHERE outV=1 AND label=knows AND inV=vertex_table.id
myvertex2
[goto,properties,name] => 
  myvertex2.adjacents(“properties”,”name”) => 
    SELECT name FROM properties_table WHERE id=2
“josh"

Now, I could really beef up MyStrategy when I realize that no path information is used in the traversal. Thus, the base bytecode compiles to:

[my:sql,SELECT name FROM properties_table,vertex_table,edge_table WHERE … lots of join equalities]

This would then just emit “josh” given the mygraph object.

——

To recap.

	1. There are primitives.
	2. There are Maps and Lists.
	3. There are ComplexTypes.
	4. ComplexTypes are adjacent to other objects via relations.
		- These adjacent objects may be cached locally with the ComplexType instance.
		- These adjacent objects may require some database lookup.
		- Regardless, TP4 doesn’t care — its up to the provider’s ComplexType instance to decide how to resolve the adjacency.
	5. ComplexTypes don’t go over the wire — a ComplexTypeProxy with appropriately provided toString() is all that leaves the TP4 VM.

Finally, to solve the asMap()/asList() problem, we simply have:

asMap(’name’,’age’) => complexType.adjacents(‘asMap’,’name’,’age')
asList() => complexType.adjacents(‘asList’)

It is up to the complexType to manifest a Map or List accordingly.

I see this as basically a big flatmap system. ComplexTypes just map from self to any number of logical neighbors as specified by the relation.

Am I getting it?,
Marko.

http://rredux.com <http://rredux.com/>




> On Apr 24, 2019, at 9:56 AM, Joshua Shinavier <jo...@fortytwo.net> wrote:
> 
> On Tue, Apr 23, 2019 at 10:28 AM Marko Rodriguez <ok...@gmail.com>
> wrote:
> 
>> Hi,
>> 
>> I think we are very close to something useable for TP4 structure/. Solving
>> this problem elegantly will open the flood gates on tp4/ development.
>> 
> 
> Yes, and formality often brings elegance. I don't think we can do much
> better than relational algebra and relational calculus in terms of
> formality, so to the extent we can reduce the fundamental TP4 traversal
> steps to basic relational operations, the floodgates will also be open to
> applications of query validation and query optimization from the last 40+
> years of research.
> 
> 
> 
>> I still don’t grock your comeFrom().goto() stuff. I don’t get the benefit
>> of having two instructions for “pointer chasing” instead of one.
>> 
> 
> There are just a handful of basic operations in relational algebra.
> Projection, selection, union, complement, Cartesian product. Joins, as well
> as all other operations, can be derived from these. A lot of graph
> traversal can be accomplished using only projection and selection, which is
> why we were able to get away with only to/goto and from/comeFrom in the
> examples above. However, I believe you do need both operations. You can
> kind of get away without from() if you assume that each vertex has local
> inE and outE references to incoming and outgoing edges, but I see that as a
> kind of pre-materialized from()/select(). If you think of edges strictly as
> relations, and represent them in a straightforward way with tables, you
> don't need the local inE and outE; whether you have them depends on the
> graph back-end.
> 
> 
> 
>> Lets put that aside for now and lets turn to modeling a Vertex. Go back to
>> my original representation:
>> 
>> vertex.goto(‘label’)
>> vertex.goto(‘id’)
>> 
> 
> Local (in my view). All good.
> 
> 
> 
>> vertex.goto(‘outE’)
>> vertex.goto(‘inE’)
>> vertex.goto(‘properties’)
>> 
> 
> Non-local (in my view). You can use goto(), but if the goal is to bring the
> relational model into the fold, at a lower level you do have a select()
> operation. Unless you make projections local to vertices instead of edges,
> but then you just have the same problem in reverse. Am I making sense?
> 
> 
> Any object can be converted into a Map. In TinkerPop3 we convert vertices
>> into maps via:
>> 
>>        g.V().has(‘name’,’marko’).valueMap() => {name:marko,age:29}
>>        g.V().has(‘name’,’marko’).valueMap(true) =>
>> {id:1,label:person,name:marko,age:29}
>> 
> 
> Maps are A-OK. In the case of properties, I think where we differ is that
> you see a property like "name" as a key/value pair in a map local to the
> vertex. I see the property as an element of type "name", with the vertex as
> a value in its local map, logically if not physically. This allows maximum
> flexibility in terms of meta-properties -- exotic beasts which seem to be
> in a kind of limbo state in TP3, but if we're trying to be as general as
> possible, some data models we might want to pull in, like GRAKN.AI, do
> allow this kind of flexibility.
> 
> 
> 
>> In the spirit of instruction reuse, we should have an asMap() instruction
>> that works for ANY object. (As a side: this gets back to ONLY sending
>> primitives over the wire, no
>> Vertex/Edge/Document/Table/Row/XML/ColumnFamily/etc.). Thus, the above is:
>> 
>>        g.V().has(‘name’,’marko’).properties().asMap() =>
>> {name:marko,age:29}
>>        g.V().has(‘name’,’marko’).asMap() =>
>> {id:1,label:person,properties:{name:marko,age:29}}
>> 
> 
> Again, no argument here, although I would think of a map as an
> optimization. IMO, the fundamental projections from v[1] are id:1 and
> label:Person. You could make a map out of these, or just use an offset,
> since the keys are always the same. However, you can also build a map
> including any key you can turn into a function. properties() is such a key.
> 
> 
> You might ask, why didn’t it go to outE and inE and map-ify that data?
>> Because those are "sibling” references, not “children” references.
>> 
>>        goto(‘outE’) is a “sibling” reference. (a vertex does not contain
>> an edge)
>>        goto(‘id’) is a “child” reference. (a vertex contains the id)
>> 
> 
> I agree with both of those statements. A vertex does not contain the edges
> incident on it. Again, I am thinking of properties a bit more like edges
> for maximum generality.
> 
> 
> 
>> Where do we find sibling references?
>>        Graphs: vertices don’t contain each other.
>>        OO heaps: many objects don’t contain each other.
>>        RDBMS: rows are linked by joins, but don’t contain each other.
>> 
> 
> Yep.
> 
> 
> So, the way in which we structure our references (pointers) determines the
>> shape of the data and ultimately how different instructions will behave. We
>> can’t assume that asMap() knows anything about
>> vertices/edges/documents/rows/tables/etc. It will simply walk all
>> child-references and create a map.
>> 
> 
> Just to play devil's advocate, you *could* include "inE" and "outE" as keys
> in the local map of a vertex; it's just a matter of what you choose to do.
> inE and outE are perfectly good functions from a vertex to a set of edges.
> 
> 
> We don’t want TP to get involved in “complex data types.”
> 
> 
> Well, how do you feel about algebraic data types? They are simple, and
> allow you to capture arbitrary relations as elements.
> 
> 
> 
>> We don’t care. You can propagate MyDatabaseObject through the TP4 VM
>> pipeline and load your object up with methods for optimizations with your
>> DB and all that, but for TP4, your object is just needs to implement:
>> 
>>        ComplexType
>>                - Iterator<T> children(String label)
>>                - Iterator<T> siblings(String label)
>>                - default Iterator<T> references(String label) {
>> IteratorUtils.concat(children(label), siblings(label)) }
>>                - String toString()
>> 
> 
> I don't think you need siblings(). I think you need a more generic
> select(), but since this is graph traversal, select() only needs the
> identifier of a type (e.g. "knows") and the name of a field (e.g. "out").
> 
> 
> 
>> When a ComplexType goes over the wire to the user, it just represented as
>> a ComplexTypeProxy with a toString() like v[1],
>> tinkergraph[vertices:10,edges:34], etc. All references are disconnected.
>> Yes, even children references. We do not want language drivers having to
>> know about random object types and have to deal with implementing
>> serializers and all that non-sense. The TP4 serialization protocol is
>> primitives, maps, lists, bytecode, and traversers. Thats it!
>> 
> 
> No disagreement here. I think the only disconnect is about what keys are
> local to what elements. Some keys are hard-local, like id and type for all
> elements, and "in" and "out" for edges and properties. These *should* be
> carried over the wire. Properties, incident edges, etc. possibly but not
> necessarily.
> 
> 
> 
>> *** Only Maps and Lists (that don’t contain complex data types) maintain
>> their child references “over the wire.”
>> 
> 
> Sure.
> 
> 
> 
>> I don’t get your hypergraph example, so let me try another example:
>> 
>>        tp ==member==> marko, josh
>> 
>> TP is a vertex and there is a directed hyperedge with label “member”
>> connecting to marko and josh vertices.
>> 
> 
> That's kind of an unlabeled hyperedge; I am not sure we need to support
> those. Look at the GRAKN data model, or at HypergraphDB or earlier
> hypergraph data models. A hyperedge is essentially a tuple in which each
> components has a label ("role", in GRAKN). In other words, it is a relation
> in which some of the columns may be foreign keys. In your example, rather
> than "member" connecting "tp" to a set of vertices, you might have
> something like Collaborated{person1:marko, person2:josh, project=tp}. Then
> a query like "who did marko collaborate with on tp?" becomes:
> 
>    tp.from("Collaborated", "project").restrict("person1",
> "marko").to("person2")
> 
> Of course, if you want this relationship to be symmetrical, you can
> introduce a constraint.
> 
> 
> 
>> tp.goto(“outE”).filter(goto(“label”).is(“member”)).goto(“inV”)
>> 
>> Looks exactly like a property graph query? However, its not because
>> goto(“inV”) returns 2 vertices, not 1.
> 
> 
> I think your example works well for the type of hypergraph you are
> referring to. It's just different than the type of hypergraph I am
> referring to. I think by now you know that I would rather see a from()
> instead of that goto("outE"). I also agree you can make a function out of
> outE, and expose it using a map, if you really want to. Under the hood,
> however, I see this as traversing a projection head to tail rather than
> tail to head.
> 
> 
> 
>> EdgeVertexFlatmapFunction works for property graphs and hypergraphs. It
>> doesn’t care — it just follows goto() pointers! That is, it follows the
>> ComplexType.references(“inV”). Multi-properties are the same as well.
>> Likewise for meta-properties. These data model variations are not “special”
>> to the TP4 VM. It just walks references whether there are 0,1,2, or N of
>> them.
>> 
> 
> At a high level, I agree with what you are saying. We should have a common
> data model that unifies traditional property graphs, hypergraphs,
> relational databases, and any other data model that can be modeled using
> algebraic data types with references. We define a small set of basic
> operations on this data model which can be combined into more complex
> operations that are amenable to static analysis and optimization. We can
> send graph data over the wire as collections of elements using the bare
> minimum of local fields, and reconstruct the graph on the other end. We can
> operate on streams of such elements under suitable conditions (elements
> sent in an appropriate order). The basic operations are not tied to the
> JVM, and should be straightforward to implement in other frameworks.
> 
> 
> 
>> 
>> Thus, what is crucial to all this is the “shape of the data.” Using your
>> pointers wisely so instructions produce useful results.
>> 
> 
> +1
> 
> 
> 
>> Does any of what I wrote update your comeFrom().goto() stuff?
> 
> 
> Sadly, no, though I appreciate that you are coming from a slightly
> different place w.r.t. properties, hypergraphs, and most importantly, the
> role of a type system.
> 
> 
> 
>> If not, can you please explain to me why comeFrom() is cool — sorry for
>> being dense (aka “being Kuppitz" — thats right, I said it. boom!).
>> 
> 
> Let's keep iterating until we reach a fixed point. Maybe Daniel's already
> there.
> 
> Josh
> 
> 
> 
>> 
>> Thanks,
>> Marko.
>> 
>> http://rredux.com <http://rredux.com/>
>> 
>> 
>> 
>> 
>>> On Apr 23, 2019, at 10:25 AM, Joshua Shinavier <jo...@fortytwo.net>
>> wrote:
>>> 
>>> On Tue, Apr 23, 2019 at 5:14 AM Marko Rodriguez <ok...@gmail.com>
>>> wrote:
>>> 
>>>> Hey Josh,
>>>> 
>>>> This gets to the notion I presented in “The Fabled GMachine.”
>>>>       http://rredux.com/the-fabled-gmachine.html <
>>>> http://rredux.com/the-fabled-gmachine.html> (first paragraph of
>>>> “Structures, Processes, and Languages” section)
>>>> 
>>>> All that exists are memory addresses that contain either:
>>>> 
>>>>       1. A primitive
>>>>       2. A set of labeled references to other references or primitives.
>>>> 
>>>> Using your work and the above, here is a super low-level ‘bytecode' for
>>>> property graphs.
>>>> 
>>>> v.goto("id") => 1
>>>> 
>>> 
>>> LGTM. An id is special because it is uniquely identifying / is a primary
>>> key for the element. However, it is also just a field of the element,
>> like
>>> "in"/"inV" and "out"/"outV" are fields of an edge. As an aside, an id
>> would
>>> only really need to be unique among other elements of the same type. To
>> the
>>> above, I would add:
>>> 
>>> v.type() => Person
>>> 
>>> ...a special operation which takes you from an element to its type. This
>> is
>>> important if unions are supported; e.g. "name" in my example can apply
>>> either to a Person or a Project.
>>> 
>>> 
>>> v.goto("label") => person
>>>> 
>>> 
>>> Or that. Like "id", "type"/"label" is special. You can think of it as a
>>> field; it's just a different sort of field which will have the same value
>>> for all elements of any given type.
>>> 
>>> 
>>> 
>>>> v.goto("properties").goto("name") => "marko"
>>>> 
>>> 
>>> OK, properties. Are properties built-in as a separate kind of thing from
>>> edges, or can we treat them the same as vertices and edges here? I think
>> we
>>> can treat them the same. A property, in the algebraic model I described
>>> above, is just an element with two fields, the second of which is a
>>> primitive value. As I said, I think we need two distinct traversal
>>> operations -- projection and selection -- and here is where we can use
>> the
>>> latter. Here, I will call it "comeFrom".
>>> 
>>> v.comeFrom("name", "out").goto("in") => {"marko"}
>>> 
>>> You can think of this comeFrom as a special case of a select() function
>>> which takes a type -- "name" -- and a set of key/value pairs {("out",
>> v)}.
>>> It returns all matching elements of the given type. You then project to
>> the
>>> "in" value using your goto. I wrote {"marko"} as a set, because comeFrom
>>> can give you multiple properties, depending on whether multi-properties
>> are
>>> supported.
>>> 
>>> Note how similar this is to an edge traversal:
>>> 
>>> v.comeFrom("knows", "out").goto("in") => {v[2], v[4]}
>>> 
>>> Of course, you could define "properties" in such a way that a
>>> goto("properties") does exactly this under the hood, but in terms of low
>>> level instructions, you need something like comeFrom.
>>> 
>>> 
>>> v.goto("properties").goto("name").goto(0) => "m"
>>>> 
>>> 
>>> This is where the notion of optionals becomes handy. You can make
>>> array/list indices into fields like this, but IMO you should also make
>> them
>>> safe. E.g. borrowing Haskell syntax for a moment:
>>> 
>>> v.goto("properties").goto("name").goto(0) => Just 'm'
>>> 
>>> v.goto("properties").goto("name").goto(5) => Nothing
>>> 
>>> 
>>> v.goto("outE").goto("inV") => v[2], v[4]
>>>> 
>>> 
>>> I am not a big fan of untyped "outE", but you can think of this as a
>> union
>>> of all v.comeFrom(x, "out").goto("in"), where x is any edge type. Only
>>> "knows" and "created" are edge types which are applicable to "Person", so
>>> you will only get {v[2], v[4]}. If you want to get really crazy, you can
>>> allow x to be any type. Then you get {v[2], v[4], 29, "marko"}.
>>> 
>>> 
>>> 
>>>> g.goto("V").goto(1) => v[1]
>>>> 
>>> 
>>> That, or you give every element a virtual field called "graph". So:
>>> 
>>> v.goto("graph") => g
>>> 
>>> g.comeFrom("Person", "graph") => {v[1], v[2], v[4], v[6]}
>>> 
>>> g.comeFrom("Person", "graph").restrict("id", 1)
>>> 
>>> ...where restrict() is the relational "sigma" operation as above, not to
>> be
>>> confused with TinkerPop's select(), filter(), or has() steps. Again, I
>>> prefer to specify a type in comeFrom (i.e. we're looking specifically
>> for a
>>> Person with id of 1), but you could also do a comprehension g.comeFrom(x,
>>> "graph"), letting x range over all types.
>>> 
>>> 
>>> 
>>>> The goto() instruction moves the “memory reference” (traverser) from the
>>>> current “memory address” to the “memory address” referenced by the
>> goto()
>>>> argument.
>>>> 
>>> 
>>> Agreed, if we also think of primitive values as memory references.
>>> 
>>> 
>>> 
>>>> The Gremlin expression:
>>>> 
>>>>       g.V().has(‘name’,’marko’).out(‘knows’).drop()
>>>> 
>>>> ..would compile to:
>>>> 
>>>> 
>>>> 
>> g.goto(“V”).filter(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()
>>>> 
>>> 
>>> 
>>> In the alternate universe:
>>> 
>>> g.comeFrom("Person", "graph").comeFrom("name", "out").restrict("in",
>>> "marko").goto("out").comeFrom("knows", "out").goto("in").free()
>>> 
>>> I have wimped out on free() and just left it as you had it, but I think
>> it
>>> would be worthwhile to explore a monadic syntax for traversals with
>>> side-effects. Different topic.
>>> 
>>> Now, all of this "out", "in" business is getting pretty repetitive,
>> right?
>>> Well, the field names become more diverse if we allow hyper-edges and
>>> generalized ADTs. E.g. in my Trip example, say I want to know all
>> drop-off
>>> locations for a given rider:
>>> 
>>> u.comeFrom("Trip", "rider").goto("dropoff").goto("place")
>>> 
>>> Done.
>>> 
>>> 
>>> 
>>>> If we can get things that “low-level” and still efficient to compile,
>> then
>>>> we can model every data structure. All you are doing is pointer chasing
>>>> through a withStructure() data structure. .
>>>> 
>>> 
>>> Agreed.
>>> 
>>> 
>>> No one would ever want to write strategies for goto()-based Bytecode.
>>> 
>>> 
>>> Also agreed.
>>> 
>>> 
>>> 
>>>> Thus, perhaps there could be a PropertyGraphDecorationStrategy that
>> does:
>>>> 
>>>> [...]
>>> 
>>> 
>>> No argument here, though the alternate-universe "bytecode" would look
>>> slightly different. And the high-level syntax should also be able to deal
>>> with generalized relations / data types gracefully. As a thought
>>> experiment, suppose we were to define the steps to() as your goto(), and
>>> from() as my comeFrom(). Then traversals like:
>>> 
>>> u.from("Trip", "rider").to("dropoff").to("time")
>>> 
>>> ...look pretty good as-is, and are not too low-level. However, ordinary
>>> edge traversals like:
>>> 
>>> v.from("knows", "out").to("in")
>>> 
>>> ...do look a little Assembly-like. So in/out/both etc. remain as they
>> are,
>>> but are shorthand for from() and to() steps using "out" or "in":
>>> 
>>> v.out("knows") === v.outE("knows").inV() === v.from("knows",
>> "out").to("in")
>>> 
>>> 
>>> [I AM NOW GOING OFF THE RAILS]
>>>> [sniiiiip]
>>>> 
>>> 
>>> Sure. Again, I like the idea of wrapping side-effects in monads. What
>> would
>>> that look like in a Gremlinesque fluent syntax? I don't quite know, but
>> if
>>> we think of the dot as a monadic bind operation like Haskell's >>=, then
>>> perhaps the monadic expressions look pretty similar to what you have just
>>> sketched out. Might have to be careful about what it means to nest
>>> operations as in your addEdge examples.
>>> 
>>> 
>>> 
>>> [I AM NOW BACK ON THE RAILS]
>>>> 
>>>> Its as if “properties”, “outE”, “label”, “inV”, etc. references mean
>>>> something to property graph providers and they can do more intelligent
>>>> stuff than what MongoDB would do with such information. However,
>> someone,
>>>> of course, can create a MongoDBPropertyGraphStrategy that would make
>>>> documents look like vertices and edges and then use O(log(n)) lookups on
>>>> ids to walk the graph. However, if that didn’t exist, it would still do
>>>> something that works even if its horribly inefficient as every database
>> can
>>>> make primitives with references between them!
>>>> 
>>> 
>>> I'm on the same same pair of rails.
>>> 
>>> 
>>> 
>>>> Anywho @Josh, I believe goto() is what you are doing with
>> multi-references
>>>> off an object. How do we make it all clean, easy, and universal?
>>>> 
>>> 
>>> Let me know what you think of the above.
>>> 
>>> Josh
>>> 
>>> 
>>> 
>>>> 
>>>> Marko.
>>>> 
>>>> http://rredux.com <http://rredux.com/>
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Apr 22, 2019, at 6:42 PM, Joshua Shinavier <jo...@fortytwo.net>
>> wrote:
>>>>> 
>>>>> Ah, glad you asked. It's all in the pictures. I have nowhere to put
>> them
>>>> online at the moment... maybe this attachment will go through to the
>> list?
>>>>> 
>>>>> Btw. David Spivak gave his talk today at Uber; it was great. Juan
>>>> Sequeda (relational <--> RDF mapping guy) was also here, and Ryan joined
>>>> remotely. Really interesting discussion about databases vs. graphs, and
>>>> what category theory brings to the table.
>>>>> 
>>>>> 
>>>>> On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez <okrammarko@gmail.com
>>>> <ma...@gmail.com>> wrote:
>>>>> Hey Josh,
>>>>> 
>>>>> I’m digging what you are saying, but the pictures didn’t come through
>>>> for me ? … Can you provide them again (or if dev@ is filtering them,
>> can
>>>> you give me URLs to them)?
>>>>> 
>>>>> Thanks,
>>>>> Marko.
>>>>> 
>>>>> 
>>>>>> On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <josh@fortytwo.net
>>>> <ma...@fortytwo.net>> wrote:
>>>>>> 
>>>>>> On the subject of "reified joins", maybe be a picture will be worth a
>>>> few words. As I said in the thread <
>>>> https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ
>> <
>>>> https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ
>>>> 
>>>> on property graph standardization, if you think of vertex labels, edge
>>>> labels, and property keys as types, each with projections to two other
>>>> types, there is a nice analogy with relations of two columns, and this
>>>> analogy can be easily extended to hyper-edges. Here is what the schema
>> of
>>>> the TinkerPop classic graph looks like if you make each type (e.g.
>> Person,
>>>> Project, knows, name) into a relation:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> I have made the vertex types salmon-colored, the edge types yellow,
>>>> the property types green, and the data types blue. The "o" and "I"
>> columns
>>>> represent the out-type (e.g. out-vertex type of Person) and in-type
>> (e.g.
>>>> property value type of String) of each relation. More than two arrows
>> from
>>>> a column represent a coproduct, e.g. the out-type of "name" is Person OR
>>>> Project. Now you can think of out() and in() as joins of two tables on a
>>>> primary and foreign key.
>>>>>> 
>>>>>> We are not limited to "out" and "in", however. Here is the ternary
>>>> relationship (hyper-edge) from hyper-edge slide <
>>>> 
>> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49
>>>> <
>>>> 
>> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49
>>>> 
>>>> of my Graph Day preso, which has three columns/roles/projections:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> I have drawn Says in light blue to indicate that it is a generalized
>>>> element; it has projections other than "out" and "in". Now the line
>> between
>>>> relations and edges begins to blur. E.g. in the following, is
>> PlaceEvent a
>>>> vertex or a property?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> With the right type system, we can just speak of graph elements, and
>>>> use "vertex", "edge", "property" when it is convenient. In the
>> relational
>>>> model, they are relations. If you materialize them in a relational
>>>> database, they are rows. In any case, you need two basic graph traversal
>>>> operations:
>>>>>> project() -- forward traversal of the arrows in the above diagrams.
>>>> Takes you from an element to a component like in-vertex.
>>>>>> select() -- reverse traversal of the arrows. Allows you to answer
>>>> questions like "in which Trips is John Doe the rider?"
>>>>>> 
>>>>>> Josh
>>>>>> 
>>>>>> 
>>>>>> On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <
>> okrammarko@gmail.com
>>>> <ma...@gmail.com> <mailto:okrammarko@gmail.com <mailto:
>>>> okrammarko@gmail.com>>> wrote:
>>>>>> Hello,
>>>>>> 
>>>>>> I agree with everything you say. Here is my question:
>>>>>> 
>>>>>>       Relational database — join: Table x Table x equality function
>>>> -> Table
>>>>>>       Graph database — traverser: Vertex x edge label -> Vertex
>>>>>> 
>>>>>> I want a single function that does both. The only think was to
>>>> represent traverser() in terms of join():
>>>>>> 
>>>>>>       Graph database — traverser: Vertices x Vertex x equality
>>>> function -> Vertices
>>>>>> 
>>>>>> For example,
>>>>>> 
>>>>>> V().out(‘address’)
>>>>>> 
>>>>>>       ==>
>>>>>> 
>>>>>> g.join(V().hasLabel(‘person’).as(‘a’)
>>>>>>      V().hasLabel(‘addresses’).as(‘b’)).
>>>>>>        by(‘name’).select(?address vertex?)
>>>>>> 
>>>>>> That is, join the vertices with themselves based on some predicate to
>>>> go from vertices to vertices.
>>>>>> 
>>>>>> However, I would like instead to transform the relational database
>>>> join() concept into a traverser() concept. Kuppitz and I were talking
>> the
>>>> other day about a link() type operator that says: “try and link to this
>>>> thing in some specified way.” .. ?? The problem we ran into is again,
>> “link
>>>> it to what?”
>>>>>> 
>>>>>>       - in graph, the ‘to what’ is hardcoded so you don’t need to
>>>> specify anything.
>>>>>>       - in rdbms, the ’to what’ is some other specified table.
>>>>>> 
>>>>>> So what does the link() operator look like?
>>>>>> 
>>>>>> ——
>>>>>> 
>>>>>> Some other random thoughts….
>>>>>> 
>>>>>> Relational databases join on the table (the whole collection)
>>>>>> Graph databases traverser on the vertex (an element of the whole
>>>> collection)
>>>>>> 
>>>>>> We can make a relational database join on single row (by providing a
>>>> filter to a particular primary key). This is the same as a table with
>> one
>>>> row. Likewise, for graph in the join() context above:
>>>>>> 
>>>>>> V(1).out(‘address’)
>>>>>> 
>>>>>>       ==>
>>>>>> 
>>>>>> g.join(V(1).as(‘a’)
>>>>>>      V().hasLabel(‘addresses’).as(‘b’)).
>>>>>>        by(‘name’).select(?address vertex?)
>>>>>> 
>>>>>> More thoughts please….
>>>>>> 
>>>>>> Marko.
>>>>>> 
>>>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
>>>> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
>>>> http://rredux.com/ <http://rredux.com/>>>
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Apr 19, 2019, at 4:20 AM, pieter martin <pieter.martin@gmail.com
>>>> <ma...@gmail.com> <mailto:pieter.martin@gmail.com
>> <mailto:
>>>> pieter.martin@gmail.com>>> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> The way I saw it is that the big difference is that graph's have
>>>>>>> reified joins. This is both a blessing and a curse.
>>>>>>> A blessing because its much easier (less text to type, less mistakes,
>>>>>>> clearer semantics...) to traverse an edge than to construct a manual
>>>>>>> join.A curse because there are almost always far more ways to
>>>> traverse
>>>>>>> a data set than just by the edges some architect might have
>>>> considered
>>>>>>> when creating the data set. Often the architect is not the domain
>>>>>>> expert and the edges are a hardcoded layout of the dataset, which
>>>>>>> almost certainly won't survive the real world's demands. In graphs,
>>>> if
>>>>>>> their are no edges then the data is not reachable, except via indexed
>>>>>>> lookups. This is the standard engineering problem of database design,
>>>>>>> but it is important and useful that data can be traversed, joined,
>>>>>>> without having reified edges.
>>>>>>> In Sqlg at least, but I suspect it generalizes, I want to create the
>>>>>>> notion of a "virtual edge". Which in meta data describes the join and
>>>>>>> then the standard to(direction, "virtualEdgeName") will work.
>>>>>>> In a way this is precisely to keep the graphy nature of gremlin, i.e.
>>>>>>> traversing edges, and avoid using the manual join syntax you
>>>> described.
>>>>>>> CheersPieter
>>>>>>> 
>>>>>>> On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
>>>>>>>> Hi,
>>>>>>>> *** This is mainly for Kuppitz, but if others care.
>>>>>>>> Was thinking last night about relational data and Gremlin. The T()
>>>>>>>> step returns all the tables in the withStructure() RDBMS database.
>>>>>>>> Tables are ‘complex values’ so they can't leave the VM (only a
>>>> simple
>>>>>>>> ‘toString’).
>>>>>>>> Below is a fake Gremlin session. (and these are just ideas…) tables
>>>>>>>> -> a ListLike of rows        rows -> a MapLike of primitives
>>>>>>>> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
>>>>>>>> g.T(‘people’)==>t[people]gremlin>
>>>>>>>> 
>>>> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml
>>>>>>>> in>
>>>>>>>> 
>>>> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz,
>>>>>>>> age:10}==>{name:josh,age:35}gremlin>
>>>>>>>> 
>>>> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g
>>>>>>>> remlin>
>>>>>>>> 
>>>> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos
>>>>>>>> h
>>>>>>>> Makes sense. Nice that values() and has() generally apply to all
>>>>>>>> ListLike and MapLike structures. Also, note how asMap() is the
>>>>>>>> valueMap() of TP4, but generalizes to anything that is MapLike so it
>>>>>>>> can be turned into a primitive form as a data-rich result from the
>>>>>>>> VM.
>>>>>>>> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
>>>>>>>> 
>>>> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name
>>>>>>>> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin>
>>>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>>>> by(se
>>>>>>>> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).
>>>> 
>>>>>>>> values().asMap()==>{a.name:marko,a.age:29,b.name:
>>>> marko,b.city:santafe
>>>>>>>> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{
>>>> a.name <http://a.name/> <http://a.name/ <http://a.name/>>:
>>>>>>>> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin>
>>>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>>>> by(’n
>>>>>>>> ame’). // shorthand for equijoin on name
>>>>>>>> column/key           values().asMap()==>{a.name:marko,a.age:29,
>>>> b.name <http://b.name/> <http://b.name/ <http://b.name/>>
>>>>>>>> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:kuppitz,
>>>> b.ci <http://b.ci/> <http://b.ci/ <http://b.ci/>>
>>>>>>>> ty:tucson}==>{a.name:josh,a.age:35,b.name:
>>>> josh,b.city:desertisland}gr
>>>>>>>> emlin>
>>>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>>>> by(’n
>>>>>>>> ame’)==>t[people<-name->addresses]  // without asMap(), just the
>>>>>>>> complex value ‘toString'gremlin>
>>>>>>>> And of course, all of this is strategized into a SQL call so its
>>>>>>>> joins aren’t necessarily computed using TP4-VM resources.
>>>>>>>> Anywho — what I hope to realize is the relationship between “links”
>>>>>>>> (graph) and “joins” (tables). How can we make (bytecode-wise at
>>>>>>>> least) RDBMS join operations and graph traversal operations ‘the
>>>>>>>> same.’?
>>>>>>>>    Singleton: Integer, String, Float, Double, etc. Collection:
>>>>>>>> List, Map (Vertex, Table, Document)  Linkable: Vertex, Table
>>>>>>>> Vertices and Tables can be “linked.” Unlike Collections, they don’t
>>>>>>>> maintain a “parent/child” relationship with the objects they
>>>>>>>> reference. What does this mean……….?
>>>>>>>> Take care,Marko.
>>>>>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
>>>> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
>>>> http://rredux.com/ <http://rredux.com/>>> <http://rredux.com/ <
>>>> http://rredux.com/> <http://rredux.com/ <http://rredux.com/>> <
>>>> http://rredux.com/ <http://rredux.com/> <http://rredux.com/ <
>>>> http://rredux.com/>>>>
>>>>> 
>>>>> <diagrams.zip>
>>>> 
>>>> 
>> 
>>

Re: What makes 'graph traversals' and 'relational joins' the same?

Posted by Joshua Shinavier <jo...@fortytwo.net>.

On Tue, Apr 23, 2019 at 10:28 AM Marko Rodriguez <ok...@gmail.com>
wrote:

> Hi,
>
> I think we are very close to something useable for TP4 structure/. Solving
> this problem elegantly will open the flood gates on tp4/ development.
>

Yes, and formality often brings elegance. I don't think we can do much
better than relational algebra and relational calculus in terms of
formality, so to the extent we can reduce the fundamental TP4 traversal
steps to basic relational operations, the floodgates will also be open to
applications of query validation and query optimization from the last 40+
years of research.



> I still don’t grock your comeFrom().goto() stuff. I don’t get the benefit
> of having two instructions for “pointer chasing” instead of one.
>

There are just a handful of basic operations in relational algebra.
Projection, selection, union, complement, Cartesian product. Joins, as well
as all other operations, can be derived from these. A lot of graph
traversal can be accomplished using only projection and selection, which is
why we were able to get away with only to/goto and from/comeFrom in the
examples above. However, I believe you do need both operations. You can
kind of get away without from() if you assume that each vertex has local
inE and outE references to incoming and outgoing edges, but I see that as a
kind of pre-materialized from()/select(). If you think of edges strictly as
relations, and represent them in a straightforward way with tables, you
don't need the local inE and outE; whether you have them depends on the
graph back-end.



> Lets put that aside for now and lets turn to modeling a Vertex. Go back to
> my original representation:
>
> vertex.goto(‘label’)
> vertex.goto(‘id’)
>

Local (in my view). All good.



> vertex.goto(‘outE’)
> vertex.goto(‘inE’)
> vertex.goto(‘properties’)
>

Non-local (in my view). You can use goto(), but if the goal is to bring the
relational model into the fold, at a lower level you do have a select()
operation. Unless you make projections local to vertices instead of edges,
but then you just have the same problem in reverse. Am I making sense?


Any object can be converted into a Map. In TinkerPop3 we convert vertices
> into maps via:
>
>         g.V().has(‘name’,’marko’).valueMap() => {name:marko,age:29}
>         g.V().has(‘name’,’marko’).valueMap(true) =>
> {id:1,label:person,name:marko,age:29}
>

Maps are A-OK. In the case of properties, I think where we differ is that
you see a property like "name" as a key/value pair in a map local to the
vertex. I see the property as an element of type "name", with the vertex as
a value in its local map, logically if not physically. This allows maximum
flexibility in terms of meta-properties -- exotic beasts which seem to be
in a kind of limbo state in TP3, but if we're trying to be as general as
possible, some data models we might want to pull in, like GRAKN.AI, do
allow this kind of flexibility.



> In the spirit of instruction reuse, we should have an asMap() instruction
> that works for ANY object. (As a side: this gets back to ONLY sending
> primitives over the wire, no
> Vertex/Edge/Document/Table/Row/XML/ColumnFamily/etc.). Thus, the above is:
>
>         g.V().has(‘name’,’marko’).properties().asMap() =>
> {name:marko,age:29}
>         g.V().has(‘name’,’marko’).asMap() =>
> {id:1,label:person,properties:{name:marko,age:29}}
>

Again, no argument here, although I would think of a map as an
optimization. IMO, the fundamental projections from v[1] are id:1 and
label:Person. You could make a map out of these, or just use an offset,
since the keys are always the same. However, you can also build a map
including any key you can turn into a function. properties() is such a key.


You might ask, why didn’t it go to outE and inE and map-ify that data?
> Because those are "sibling” references, not “children” references.
>
>         goto(‘outE’) is a “sibling” reference. (a vertex does not contain
> an edge)
>         goto(‘id’) is a “child” reference. (a vertex contains the id)
>

I agree with both of those statements. A vertex does not contain the edges
incident on it. Again, I am thinking of properties a bit more like edges
for maximum generality.



> Where do we find sibling references?
>         Graphs: vertices don’t contain each other.
>         OO heaps: many objects don’t contain each other.
>         RDBMS: rows are linked by joins, but don’t contain each other.
>

Yep.


So, the way in which we structure our references (pointers) determines the
> shape of the data and ultimately how different instructions will behave. We
> can’t assume that asMap() knows anything about
> vertices/edges/documents/rows/tables/etc. It will simply walk all
> child-references and create a map.
>

Just to play devil's advocate, you *could* include "inE" and "outE" as keys
in the local map of a vertex; it's just a matter of what you choose to do.
inE and outE are perfectly good functions from a vertex to a set of edges.


We don’t want TP to get involved in “complex data types.”


Well, how do you feel about algebraic data types? They are simple, and
allow you to capture arbitrary relations as elements.



> We don’t care. You can propagate MyDatabaseObject through the TP4 VM
> pipeline and load your object up with methods for optimizations with your
> DB and all that, but for TP4, your object is just needs to implement:
>
>         ComplexType
>                 - Iterator<T> children(String label)
>                 - Iterator<T> siblings(String label)
>                 - default Iterator<T> references(String label) {
> IteratorUtils.concat(children(label), siblings(label)) }
>                 - String toString()
>

I don't think you need siblings(). I think you need a more generic
select(), but since this is graph traversal, select() only needs the
identifier of a type (e.g. "knows") and the name of a field (e.g. "out").



> When a ComplexType goes over the wire to the user, it just represented as
> a ComplexTypeProxy with a toString() like v[1],
> tinkergraph[vertices:10,edges:34], etc. All references are disconnected.
> Yes, even children references. We do not want language drivers having to
> know about random object types and have to deal with implementing
> serializers and all that non-sense. The TP4 serialization protocol is
> primitives, maps, lists, bytecode, and traversers. Thats it!
>

No disagreement here. I think the only disconnect is about what keys are
local to what elements. Some keys are hard-local, like id and type for all
elements, and "in" and "out" for edges and properties. These *should* be
carried over the wire. Properties, incident edges, etc. possibly but not
necessarily.



> *** Only Maps and Lists (that don’t contain complex data types) maintain
> their child references “over the wire.”
>

Sure.



> I don’t get your hypergraph example, so let me try another example:
>
>         tp ==member==> marko, josh
>
> TP is a vertex and there is a directed hyperedge with label “member”
> connecting to marko and josh vertices.
>

That's kind of an unlabeled hyperedge; I am not sure we need to support
those. Look at the GRAKN data model, or at HypergraphDB or earlier
hypergraph data models. A hyperedge is essentially a tuple in which each
components has a label ("role", in GRAKN). In other words, it is a relation
in which some of the columns may be foreign keys. In your example, rather
than "member" connecting "tp" to a set of vertices, you might have
something like Collaborated{person1:marko, person2:josh, project=tp}. Then
a query like "who did marko collaborate with on tp?" becomes:

    tp.from("Collaborated", "project").restrict("person1",
"marko").to("person2")

Of course, if you want this relationship to be symmetrical, you can
introduce a constraint.



> tp.goto(“outE”).filter(goto(“label”).is(“member”)).goto(“inV”)
>
> Looks exactly like a property graph query? However, its not because
> goto(“inV”) returns 2 vertices, not 1.


I think your example works well for the type of hypergraph you are
referring to. It's just different than the type of hypergraph I am
referring to. I think by now you know that I would rather see a from()
instead of that goto("outE"). I also agree you can make a function out of
outE, and expose it using a map, if you really want to. Under the hood,
however, I see this as traversing a projection head to tail rather than
tail to head.



> EdgeVertexFlatmapFunction works for property graphs and hypergraphs. It
> doesn’t care — it just follows goto() pointers! That is, it follows the
> ComplexType.references(“inV”). Multi-properties are the same as well.
> Likewise for meta-properties. These data model variations are not “special”
> to the TP4 VM. It just walks references whether there are 0,1,2, or N of
> them.
>

At a high level, I agree with what you are saying. We should have a common
data model that unifies traditional property graphs, hypergraphs,
relational databases, and any other data model that can be modeled using
algebraic data types with references. We define a small set of basic
operations on this data model which can be combined into more complex
operations that are amenable to static analysis and optimization. We can
send graph data over the wire as collections of elements using the bare
minimum of local fields, and reconstruct the graph on the other end. We can
operate on streams of such elements under suitable conditions (elements
sent in an appropriate order). The basic operations are not tied to the
JVM, and should be straightforward to implement in other frameworks.



>
> Thus, what is crucial to all this is the “shape of the data.” Using your
> pointers wisely so instructions produce useful results.
>

+1



> Does any of what I wrote update your comeFrom().goto() stuff?


Sadly, no, though I appreciate that you are coming from a slightly
different place w.r.t. properties, hypergraphs, and most importantly, the
role of a type system.



> If not, can you please explain to me why comeFrom() is cool — sorry for
> being dense (aka “being Kuppitz" — thats right, I said it. boom!).
>

Let's keep iterating until we reach a fixed point. Maybe Daniel's already
there.

Josh



>
> Thanks,
> Marko.
>
> http://rredux.com <http://rredux.com/>
>
>
>
>
> > On Apr 23, 2019, at 10:25 AM, Joshua Shinavier <jo...@fortytwo.net>
> wrote:
> >
> > On Tue, Apr 23, 2019 at 5:14 AM Marko Rodriguez <ok...@gmail.com>
> > wrote:
> >
> >> Hey Josh,
> >>
> >> This gets to the notion I presented in “The Fabled GMachine.”
> >>        http://rredux.com/the-fabled-gmachine.html <
> >> http://rredux.com/the-fabled-gmachine.html> (first paragraph of
> >> “Structures, Processes, and Languages” section)
> >>
> >> All that exists are memory addresses that contain either:
> >>
> >>        1. A primitive
> >>        2. A set of labeled references to other references or primitives.
> >>
> >> Using your work and the above, here is a super low-level ‘bytecode' for
> >> property graphs.
> >>
> >> v.goto("id") => 1
> >>
> >
> > LGTM. An id is special because it is uniquely identifying / is a primary
> > key for the element. However, it is also just a field of the element,
> like
> > "in"/"inV" and "out"/"outV" are fields of an edge. As an aside, an id
> would
> > only really need to be unique among other elements of the same type. To
> the
> > above, I would add:
> >
> > v.type() => Person
> >
> > ...a special operation which takes you from an element to its type. This
> is
> > important if unions are supported; e.g. "name" in my example can apply
> > either to a Person or a Project.
> >
> >
> > v.goto("label") => person
> >>
> >
> > Or that. Like "id", "type"/"label" is special. You can think of it as a
> > field; it's just a different sort of field which will have the same value
> > for all elements of any given type.
> >
> >
> >
> >> v.goto("properties").goto("name") => "marko"
> >>
> >
> > OK, properties. Are properties built-in as a separate kind of thing from
> > edges, or can we treat them the same as vertices and edges here? I think
> we
> > can treat them the same. A property, in the algebraic model I described
> > above, is just an element with two fields, the second of which is a
> > primitive value. As I said, I think we need two distinct traversal
> > operations -- projection and selection -- and here is where we can use
> the
> > latter. Here, I will call it "comeFrom".
> >
> > v.comeFrom("name", "out").goto("in") => {"marko"}
> >
> > You can think of this comeFrom as a special case of a select() function
> > which takes a type -- "name" -- and a set of key/value pairs {("out",
> v)}.
> > It returns all matching elements of the given type. You then project to
> the
> > "in" value using your goto. I wrote {"marko"} as a set, because comeFrom
> > can give you multiple properties, depending on whether multi-properties
> are
> > supported.
> >
> > Note how similar this is to an edge traversal:
> >
> > v.comeFrom("knows", "out").goto("in") => {v[2], v[4]}
> >
> > Of course, you could define "properties" in such a way that a
> > goto("properties") does exactly this under the hood, but in terms of low
> > level instructions, you need something like comeFrom.
> >
> >
> > v.goto("properties").goto("name").goto(0) => "m"
> >>
> >
> > This is where the notion of optionals becomes handy. You can make
> > array/list indices into fields like this, but IMO you should also make
> them
> > safe. E.g. borrowing Haskell syntax for a moment:
> >
> > v.goto("properties").goto("name").goto(0) => Just 'm'
> >
> > v.goto("properties").goto("name").goto(5) => Nothing
> >
> >
> > v.goto("outE").goto("inV") => v[2], v[4]
> >>
> >
> > I am not a big fan of untyped "outE", but you can think of this as a
> union
> > of all v.comeFrom(x, "out").goto("in"), where x is any edge type. Only
> > "knows" and "created" are edge types which are applicable to "Person", so
> > you will only get {v[2], v[4]}. If you want to get really crazy, you can
> > allow x to be any type. Then you get {v[2], v[4], 29, "marko"}.
> >
> >
> >
> >> g.goto("V").goto(1) => v[1]
> >>
> >
> > That, or you give every element a virtual field called "graph". So:
> >
> > v.goto("graph") => g
> >
> > g.comeFrom("Person", "graph") => {v[1], v[2], v[4], v[6]}
> >
> > g.comeFrom("Person", "graph").restrict("id", 1)
> >
> > ...where restrict() is the relational "sigma" operation as above, not to
> be
> > confused with TinkerPop's select(), filter(), or has() steps. Again, I
> > prefer to specify a type in comeFrom (i.e. we're looking specifically
> for a
> > Person with id of 1), but you could also do a comprehension g.comeFrom(x,
> > "graph"), letting x range over all types.
> >
> >
> >
> >> The goto() instruction moves the “memory reference” (traverser) from the
> >> current “memory address” to the “memory address” referenced by the
> goto()
> >> argument.
> >>
> >
> > Agreed, if we also think of primitive values as memory references.
> >
> >
> >
> >> The Gremlin expression:
> >>
> >>        g.V().has(‘name’,’marko’).out(‘knows’).drop()
> >>
> >> ..would compile to:
> >>
> >>
> >>
> g.goto(“V”).filter(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()
> >>
> >
> >
> > In the alternate universe:
> >
> > g.comeFrom("Person", "graph").comeFrom("name", "out").restrict("in",
> > "marko").goto("out").comeFrom("knows", "out").goto("in").free()
> >
> > I have wimped out on free() and just left it as you had it, but I think
> it
> > would be worthwhile to explore a monadic syntax for traversals with
> > side-effects. Different topic.
> >
> > Now, all of this "out", "in" business is getting pretty repetitive,
> right?
> > Well, the field names become more diverse if we allow hyper-edges and
> > generalized ADTs. E.g. in my Trip example, say I want to know all
> drop-off
> > locations for a given rider:
> >
> > u.comeFrom("Trip", "rider").goto("dropoff").goto("place")
> >
> > Done.
> >
> >
> >
> >> If we can get things that “low-level” and still efficient to compile,
> then
> >> we can model every data structure. All you are doing is pointer chasing
> >> through a withStructure() data structure. .
> >>
> >
> > Agreed.
> >
> >
> > No one would ever want to write strategies for goto()-based Bytecode.
> >
> >
> > Also agreed.
> >
> >
> >
> >> Thus, perhaps there could be a PropertyGraphDecorationStrategy that
> does:
> >>
> >> [...]
> >
> >
> > No argument here, though the alternate-universe "bytecode" would look
> > slightly different. And the high-level syntax should also be able to deal
> > with generalized relations / data types gracefully. As a thought
> > experiment, suppose we were to define the steps to() as your goto(), and
> > from() as my comeFrom(). Then traversals like:
> >
> > u.from("Trip", "rider").to("dropoff").to("time")
> >
> > ...look pretty good as-is, and are not too low-level. However, ordinary
> > edge traversals like:
> >
> > v.from("knows", "out").to("in")
> >
> > ...do look a little Assembly-like. So in/out/both etc. remain as they
> are,
> > but are shorthand for from() and to() steps using "out" or "in":
> >
> > v.out("knows") === v.outE("knows").inV() === v.from("knows",
> "out").to("in")
> >
> >
> > [I AM NOW GOING OFF THE RAILS]
> >> [sniiiiip]
> >>
> >
> > Sure. Again, I like the idea of wrapping side-effects in monads. What
> would
> > that look like in a Gremlinesque fluent syntax? I don't quite know, but
> if
> > we think of the dot as a monadic bind operation like Haskell's >>=, then
> > perhaps the monadic expressions look pretty similar to what you have just
> > sketched out. Might have to be careful about what it means to nest
> > operations as in your addEdge examples.
> >
> >
> >
> > [I AM NOW BACK ON THE RAILS]
> >>
> >> Its as if “properties”, “outE”, “label”, “inV”, etc. references mean
> >> something to property graph providers and they can do more intelligent
> >> stuff than what MongoDB would do with such information. However,
> someone,
> >> of course, can create a MongoDBPropertyGraphStrategy that would make
> >> documents look like vertices and edges and then use O(log(n)) lookups on
> >> ids to walk the graph. However, if that didn’t exist, it would still do
> >> something that works even if its horribly inefficient as every database
> can
> >> make primitives with references between them!
> >>
> >
> > I'm on the same same pair of rails.
> >
> >
> >
> >> Anywho @Josh, I believe goto() is what you are doing with
> multi-references
> >> off an object. How do we make it all clean, easy, and universal?
> >>
> >
> > Let me know what you think of the above.
> >
> > Josh
> >
> >
> >
> >>
> >> Marko.
> >>
> >> http://rredux.com <http://rredux.com/>
> >>
> >>
> >>
> >>
> >>> On Apr 22, 2019, at 6:42 PM, Joshua Shinavier <jo...@fortytwo.net>
> wrote:
> >>>
> >>> Ah, glad you asked. It's all in the pictures. I have nowhere to put
> them
> >> online at the moment... maybe this attachment will go through to the
> list?
> >>>
> >>> Btw. David Spivak gave his talk today at Uber; it was great. Juan
> >> Sequeda (relational <--> RDF mapping guy) was also here, and Ryan joined
> >> remotely. Really interesting discussion about databases vs. graphs, and
> >> what category theory brings to the table.
> >>>
> >>>
> >>> On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez <okrammarko@gmail.com
> >> <ma...@gmail.com>> wrote:
> >>> Hey Josh,
> >>>
> >>> I’m digging what you are saying, but the pictures didn’t come through
> >> for me ? … Can you provide them again (or if dev@ is filtering them,
> can
> >> you give me URLs to them)?
> >>>
> >>> Thanks,
> >>> Marko.
> >>>
> >>>
> >>>> On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <josh@fortytwo.net
> >> <ma...@fortytwo.net>> wrote:
> >>>>
> >>>> On the subject of "reified joins", maybe be a picture will be worth a
> >> few words. As I said in the thread <
> >> https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ
> <
> >> https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ
> >>
> >> on property graph standardization, if you think of vertex labels, edge
> >> labels, and property keys as types, each with projections to two other
> >> types, there is a nice analogy with relations of two columns, and this
> >> analogy can be easily extended to hyper-edges. Here is what the schema
> of
> >> the TinkerPop classic graph looks like if you make each type (e.g.
> Person,
> >> Project, knows, name) into a relation:
> >>>>
> >>>>
> >>>>
> >>>> I have made the vertex types salmon-colored, the edge types yellow,
> >> the property types green, and the data types blue. The "o" and "I"
> columns
> >> represent the out-type (e.g. out-vertex type of Person) and in-type
> (e.g.
> >> property value type of String) of each relation. More than two arrows
> from
> >> a column represent a coproduct, e.g. the out-type of "name" is Person OR
> >> Project. Now you can think of out() and in() as joins of two tables on a
> >> primary and foreign key.
> >>>>
> >>>> We are not limited to "out" and "in", however. Here is the ternary
> >> relationship (hyper-edge) from hyper-edge slide <
> >>
> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49
> >> <
> >>
> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49
> >>
> >> of my Graph Day preso, which has three columns/roles/projections:
> >>>>
> >>>>
> >>>>
> >>>> I have drawn Says in light blue to indicate that it is a generalized
> >> element; it has projections other than "out" and "in". Now the line
> between
> >> relations and edges begins to blur. E.g. in the following, is
> PlaceEvent a
> >> vertex or a property?
> >>>>
> >>>>
> >>>>
> >>>> With the right type system, we can just speak of graph elements, and
> >> use "vertex", "edge", "property" when it is convenient. In the
> relational
> >> model, they are relations. If you materialize them in a relational
> >> database, they are rows. In any case, you need two basic graph traversal
> >> operations:
> >>>> project() -- forward traversal of the arrows in the above diagrams.
> >> Takes you from an element to a component like in-vertex.
> >>>> select() -- reverse traversal of the arrows. Allows you to answer
> >> questions like "in which Trips is John Doe the rider?"
> >>>>
> >>>> Josh
> >>>>
> >>>>
> >>>> On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <
> okrammarko@gmail.com
> >> <ma...@gmail.com> <mailto:okrammarko@gmail.com <mailto:
> >> okrammarko@gmail.com>>> wrote:
> >>>> Hello,
> >>>>
> >>>> I agree with everything you say. Here is my question:
> >>>>
> >>>>        Relational database — join: Table x Table x equality function
> >> -> Table
> >>>>        Graph database — traverser: Vertex x edge label -> Vertex
> >>>>
> >>>> I want a single function that does both. The only think was to
> >> represent traverser() in terms of join():
> >>>>
> >>>>        Graph database — traverser: Vertices x Vertex x equality
> >> function -> Vertices
> >>>>
> >>>> For example,
> >>>>
> >>>> V().out(‘address’)
> >>>>
> >>>>        ==>
> >>>>
> >>>> g.join(V().hasLabel(‘person’).as(‘a’)
> >>>>       V().hasLabel(‘addresses’).as(‘b’)).
> >>>>         by(‘name’).select(?address vertex?)
> >>>>
> >>>> That is, join the vertices with themselves based on some predicate to
> >> go from vertices to vertices.
> >>>>
> >>>> However, I would like instead to transform the relational database
> >> join() concept into a traverser() concept. Kuppitz and I were talking
> the
> >> other day about a link() type operator that says: “try and link to this
> >> thing in some specified way.” .. ?? The problem we ran into is again,
> “link
> >> it to what?”
> >>>>
> >>>>        - in graph, the ‘to what’ is hardcoded so you don’t need to
> >> specify anything.
> >>>>        - in rdbms, the ’to what’ is some other specified table.
> >>>>
> >>>> So what does the link() operator look like?
> >>>>
> >>>> ——
> >>>>
> >>>> Some other random thoughts….
> >>>>
> >>>> Relational databases join on the table (the whole collection)
> >>>> Graph databases traverser on the vertex (an element of the whole
> >> collection)
> >>>>
> >>>> We can make a relational database join on single row (by providing a
> >> filter to a particular primary key). This is the same as a table with
> one
> >> row. Likewise, for graph in the join() context above:
> >>>>
> >>>> V(1).out(‘address’)
> >>>>
> >>>>        ==>
> >>>>
> >>>> g.join(V(1).as(‘a’)
> >>>>       V().hasLabel(‘addresses’).as(‘b’)).
> >>>>         by(‘name’).select(?address vertex?)
> >>>>
> >>>> More thoughts please….
> >>>>
> >>>> Marko.
> >>>>
> >>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
> >> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
> >> http://rredux.com/ <http://rredux.com/>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On Apr 19, 2019, at 4:20 AM, pieter martin <pieter.martin@gmail.com
> >> <ma...@gmail.com> <mailto:pieter.martin@gmail.com
> <mailto:
> >> pieter.martin@gmail.com>>> wrote:
> >>>>>
> >>>>> Hi,
> >>>>> The way I saw it is that the big difference is that graph's have
> >>>>> reified joins. This is both a blessing and a curse.
> >>>>> A blessing because its much easier (less text to type, less mistakes,
> >>>>> clearer semantics...) to traverse an edge than to construct a manual
> >>>>> join.A curse because there are almost always far more ways to
> >> traverse
> >>>>> a data set than just by the edges some architect might have
> >> considered
> >>>>> when creating the data set. Often the architect is not the domain
> >>>>> expert and the edges are a hardcoded layout of the dataset, which
> >>>>> almost certainly won't survive the real world's demands. In graphs,
> >> if
> >>>>> their are no edges then the data is not reachable, except via indexed
> >>>>> lookups. This is the standard engineering problem of database design,
> >>>>> but it is important and useful that data can be traversed, joined,
> >>>>> without having reified edges.
> >>>>> In Sqlg at least, but I suspect it generalizes, I want to create the
> >>>>> notion of a "virtual edge". Which in meta data describes the join and
> >>>>> then the standard to(direction, "virtualEdgeName") will work.
> >>>>> In a way this is precisely to keep the graphy nature of gremlin, i.e.
> >>>>> traversing edges, and avoid using the manual join syntax you
> >> described.
> >>>>> CheersPieter
> >>>>>
> >>>>> On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
> >>>>>> Hi,
> >>>>>> *** This is mainly for Kuppitz, but if others care.
> >>>>>> Was thinking last night about relational data and Gremlin. The T()
> >>>>>> step returns all the tables in the withStructure() RDBMS database.
> >>>>>> Tables are ‘complex values’ so they can't leave the VM (only a
> >> simple
> >>>>>> ‘toString’).
> >>>>>> Below is a fake Gremlin session. (and these are just ideas…) tables
> >>>>>> -> a ListLike of rows        rows -> a MapLike of primitives
> >>>>>> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> >>>>>> g.T(‘people’)==>t[people]gremlin>
> >>>>>>
> >> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml
> >>>>>> in>
> >>>>>>
> >> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz,
> >>>>>> age:10}==>{name:josh,age:35}gremlin>
> >>>>>>
> >> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g
> >>>>>> remlin>
> >>>>>>
> >> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos
> >>>>>> h
> >>>>>> Makes sense. Nice that values() and has() generally apply to all
> >>>>>> ListLike and MapLike structures. Also, note how asMap() is the
> >>>>>> valueMap() of TP4, but generalizes to anything that is MapLike so it
> >>>>>> can be turned into a primitive form as a data-rich result from the
> >>>>>> VM.
> >>>>>> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> >>>>>>
> >> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name
> >>>>>> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin>
> >>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
> >> by(se
> >>>>>> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).
> >>
> >>>>>> values().asMap()==>{a.name:marko,a.age:29,b.name:
> >> marko,b.city:santafe
> >>>>>> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{
> >> a.name <http://a.name/> <http://a.name/ <http://a.name/>>:
> >>>>>> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin>
> >>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
> >> by(’n
> >>>>>> ame’). // shorthand for equijoin on name
> >>>>>> column/key           values().asMap()==>{a.name:marko,a.age:29,
> >> b.name <http://b.name/> <http://b.name/ <http://b.name/>>
> >>>>>> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:kuppitz,
> >> b.ci <http://b.ci/> <http://b.ci/ <http://b.ci/>>
> >>>>>> ty:tucson}==>{a.name:josh,a.age:35,b.name:
> >> josh,b.city:desertisland}gr
> >>>>>> emlin>
> >>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
> >> by(’n
> >>>>>> ame’)==>t[people<-name->addresses]  // without asMap(), just the
> >>>>>> complex value ‘toString'gremlin>
> >>>>>> And of course, all of this is strategized into a SQL call so its
> >>>>>> joins aren’t necessarily computed using TP4-VM resources.
> >>>>>> Anywho — what I hope to realize is the relationship between “links”
> >>>>>> (graph) and “joins” (tables). How can we make (bytecode-wise at
> >>>>>> least) RDBMS join operations and graph traversal operations ‘the
> >>>>>> same.’?
> >>>>>>     Singleton: Integer, String, Float, Double, etc. Collection:
> >>>>>> List, Map (Vertex, Table, Document)  Linkable: Vertex, Table
> >>>>>> Vertices and Tables can be “linked.” Unlike Collections, they don’t
> >>>>>> maintain a “parent/child” relationship with the objects they
> >>>>>> reference. What does this mean……….?
> >>>>>> Take care,Marko.
> >>>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
> >> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
> >> http://rredux.com/ <http://rredux.com/>>> <http://rredux.com/ <
> >> http://rredux.com/> <http://rredux.com/ <http://rredux.com/>> <
> >> http://rredux.com/ <http://rredux.com/> <http://rredux.com/ <
> >> http://rredux.com/>>>>
> >>>
> >>> <diagrams.zip>
> >>
> >>
>
>

Re: What makes 'graph traversals' and 'relational joins' the same?

Posted by Marko Rodriguez <ok...@gmail.com>.

Hi,

I think we are very close to something useable for TP4 structure/. Solving this problem elegantly will open the flood gates on tp4/ development.

——

I still don’t grock your comeFrom().goto() stuff. I don’t get the benefit of having two instructions for “pointer chasing” instead of one.

Lets put that aside for now and lets turn to modeling a Vertex. Go back to my original representation:

vertex.goto(‘label’)
vertex.goto(‘id’)
vertex.goto(‘outE’)
vertex.goto(‘inE’)
vertex.goto(‘properties’)

Any object can be converted into a Map. In TinkerPop3 we convert vertices into maps via:

	g.V().has(‘name’,’marko’).valueMap() => {name:marko,age:29}
	g.V().has(‘name’,’marko’).valueMap(true) => {id:1,label:person,name:marko,age:29}

In the spirit of instruction reuse, we should have an asMap() instruction that works for ANY object. (As a side: this gets back to ONLY sending primitives over the wire, no Vertex/Edge/Document/Table/Row/XML/ColumnFamily/etc.). Thus, the above is:

	g.V().has(‘name’,’marko’).properties().asMap() => {name:marko,age:29}
	g.V().has(‘name’,’marko’).asMap() => {id:1,label:person,properties:{name:marko,age:29}}

You might ask, why didn’t it go to outE and inE and map-ify that data? Because those are "sibling” references, not “children” references. 

	goto(‘outE’) is a “sibling” reference. (a vertex does not contain an edge)
	goto(‘id’) is a “child” reference. (a vertex contains the id)

Where do we find sibling references?
	Graphs: vertices don’t contain each other.
	OO heaps: many objects don’t contain each other.
	RDBMS: rows are linked by joins, but don’t contain each other.

So, the way in which we structure our references (pointers) determines the shape of the data and ultimately how different instructions will behave. We can’t assume that asMap() knows anything about vertices/edges/documents/rows/tables/etc. It will simply walk all child-references and create a map.

We don’t want TP to get involved in “complex data types.” We don’t care. You can propagate MyDatabaseObject through the TP4 VM pipeline and load your object up with methods for optimizations with your DB and all that, but for TP4, your object is just needs to implement:

	ComplexType
		- Iterator<T> children(String label)
		- Iterator<T> siblings(String label)
		- default Iterator<T> references(String label) { IteratorUtils.concat(children(label), siblings(label)) }
		- String toString()

When a ComplexType goes over the wire to the user, it just represented as a ComplexTypeProxy with a toString() like v[1], tinkergraph[vertices:10,edges:34], etc. All references are disconnected. Yes, even children references. We do not want language drivers having to know about random object types and have to deal with implementing serializers and all that non-sense. The TP4 serialization protocol is primitives, maps, lists, bytecode, and traversers. Thats it!

*** Only Maps and Lists (that don’t contain complex data types) maintain their child references “over the wire.”

——

I don’t get your hypergraph example, so let me try another example:

	tp ==member==> marko, josh

TP is a vertex and there is a directed hyperedge with label “member” connecting to marko and josh vertices.

tp.goto(“outE”).filter(goto(“label”).is(“member”)).goto(“inV”)

Looks exactly like a property graph query? However, its not because goto(“inV”) returns 2 vertices, not 1. EdgeVertexFlatmapFunction works for property graphs and hypergraphs. It doesn’t care — it just follows goto() pointers! That is, it follows the ComplexType.references(“inV”). Multi-properties are the same as well. Likewise for meta-properties. These data model variations are not “special” to the TP4 VM. It just walks references whether there are 0,1,2, or N of them.

Thus, what is crucial to all this is the “shape of the data.” Using your pointers wisely so instructions produce useful results.

Does any of what I wrote update your comeFrom().goto() stuff? If not, can you please explain to me why comeFrom() is cool — sorry for being dense (aka “being Kuppitz" — thats right, I said it. boom!).

Thanks,
Marko.

http://rredux.com <http://rredux.com/>




> On Apr 23, 2019, at 10:25 AM, Joshua Shinavier <jo...@fortytwo.net> wrote:
> 
> On Tue, Apr 23, 2019 at 5:14 AM Marko Rodriguez <ok...@gmail.com>
> wrote:
> 
>> Hey Josh,
>> 
>> This gets to the notion I presented in “The Fabled GMachine.”
>>        http://rredux.com/the-fabled-gmachine.html <
>> http://rredux.com/the-fabled-gmachine.html> (first paragraph of
>> “Structures, Processes, and Languages” section)
>> 
>> All that exists are memory addresses that contain either:
>> 
>>        1. A primitive
>>        2. A set of labeled references to other references or primitives.
>> 
>> Using your work and the above, here is a super low-level ‘bytecode' for
>> property graphs.
>> 
>> v.goto("id") => 1
>> 
> 
> LGTM. An id is special because it is uniquely identifying / is a primary
> key for the element. However, it is also just a field of the element, like
> "in"/"inV" and "out"/"outV" are fields of an edge. As an aside, an id would
> only really need to be unique among other elements of the same type. To the
> above, I would add:
> 
> v.type() => Person
> 
> ...a special operation which takes you from an element to its type. This is
> important if unions are supported; e.g. "name" in my example can apply
> either to a Person or a Project.
> 
> 
> v.goto("label") => person
>> 
> 
> Or that. Like "id", "type"/"label" is special. You can think of it as a
> field; it's just a different sort of field which will have the same value
> for all elements of any given type.
> 
> 
> 
>> v.goto("properties").goto("name") => "marko"
>> 
> 
> OK, properties. Are properties built-in as a separate kind of thing from
> edges, or can we treat them the same as vertices and edges here? I think we
> can treat them the same. A property, in the algebraic model I described
> above, is just an element with two fields, the second of which is a
> primitive value. As I said, I think we need two distinct traversal
> operations -- projection and selection -- and here is where we can use the
> latter. Here, I will call it "comeFrom".
> 
> v.comeFrom("name", "out").goto("in") => {"marko"}
> 
> You can think of this comeFrom as a special case of a select() function
> which takes a type -- "name" -- and a set of key/value pairs {("out", v)}.
> It returns all matching elements of the given type. You then project to the
> "in" value using your goto. I wrote {"marko"} as a set, because comeFrom
> can give you multiple properties, depending on whether multi-properties are
> supported.
> 
> Note how similar this is to an edge traversal:
> 
> v.comeFrom("knows", "out").goto("in") => {v[2], v[4]}
> 
> Of course, you could define "properties" in such a way that a
> goto("properties") does exactly this under the hood, but in terms of low
> level instructions, you need something like comeFrom.
> 
> 
> v.goto("properties").goto("name").goto(0) => "m"
>> 
> 
> This is where the notion of optionals becomes handy. You can make
> array/list indices into fields like this, but IMO you should also make them
> safe. E.g. borrowing Haskell syntax for a moment:
> 
> v.goto("properties").goto("name").goto(0) => Just 'm'
> 
> v.goto("properties").goto("name").goto(5) => Nothing
> 
> 
> v.goto("outE").goto("inV") => v[2], v[4]
>> 
> 
> I am not a big fan of untyped "outE", but you can think of this as a union
> of all v.comeFrom(x, "out").goto("in"), where x is any edge type. Only
> "knows" and "created" are edge types which are applicable to "Person", so
> you will only get {v[2], v[4]}. If you want to get really crazy, you can
> allow x to be any type. Then you get {v[2], v[4], 29, "marko"}.
> 
> 
> 
>> g.goto("V").goto(1) => v[1]
>> 
> 
> That, or you give every element a virtual field called "graph". So:
> 
> v.goto("graph") => g
> 
> g.comeFrom("Person", "graph") => {v[1], v[2], v[4], v[6]}
> 
> g.comeFrom("Person", "graph").restrict("id", 1)
> 
> ...where restrict() is the relational "sigma" operation as above, not to be
> confused with TinkerPop's select(), filter(), or has() steps. Again, I
> prefer to specify a type in comeFrom (i.e. we're looking specifically for a
> Person with id of 1), but you could also do a comprehension g.comeFrom(x,
> "graph"), letting x range over all types.
> 
> 
> 
>> The goto() instruction moves the “memory reference” (traverser) from the
>> current “memory address” to the “memory address” referenced by the goto()
>> argument.
>> 
> 
> Agreed, if we also think of primitive values as memory references.
> 
> 
> 
>> The Gremlin expression:
>> 
>>        g.V().has(‘name’,’marko’).out(‘knows’).drop()
>> 
>> ..would compile to:
>> 
>> 
>> g.goto(“V”).filter(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()
>> 
> 
> 
> In the alternate universe:
> 
> g.comeFrom("Person", "graph").comeFrom("name", "out").restrict("in",
> "marko").goto("out").comeFrom("knows", "out").goto("in").free()
> 
> I have wimped out on free() and just left it as you had it, but I think it
> would be worthwhile to explore a monadic syntax for traversals with
> side-effects. Different topic.
> 
> Now, all of this "out", "in" business is getting pretty repetitive, right?
> Well, the field names become more diverse if we allow hyper-edges and
> generalized ADTs. E.g. in my Trip example, say I want to know all drop-off
> locations for a given rider:
> 
> u.comeFrom("Trip", "rider").goto("dropoff").goto("place")
> 
> Done.
> 
> 
> 
>> If we can get things that “low-level” and still efficient to compile, then
>> we can model every data structure. All you are doing is pointer chasing
>> through a withStructure() data structure. .
>> 
> 
> Agreed.
> 
> 
> No one would ever want to write strategies for goto()-based Bytecode.
> 
> 
> Also agreed.
> 
> 
> 
>> Thus, perhaps there could be a PropertyGraphDecorationStrategy that does:
>> 
>> [...]
> 
> 
> No argument here, though the alternate-universe "bytecode" would look
> slightly different. And the high-level syntax should also be able to deal
> with generalized relations / data types gracefully. As a thought
> experiment, suppose we were to define the steps to() as your goto(), and
> from() as my comeFrom(). Then traversals like:
> 
> u.from("Trip", "rider").to("dropoff").to("time")
> 
> ...look pretty good as-is, and are not too low-level. However, ordinary
> edge traversals like:
> 
> v.from("knows", "out").to("in")
> 
> ...do look a little Assembly-like. So in/out/both etc. remain as they are,
> but are shorthand for from() and to() steps using "out" or "in":
> 
> v.out("knows") === v.outE("knows").inV() === v.from("knows", "out").to("in")
> 
> 
> [I AM NOW GOING OFF THE RAILS]
>> [sniiiiip]
>> 
> 
> Sure. Again, I like the idea of wrapping side-effects in monads. What would
> that look like in a Gremlinesque fluent syntax? I don't quite know, but if
> we think of the dot as a monadic bind operation like Haskell's >>=, then
> perhaps the monadic expressions look pretty similar to what you have just
> sketched out. Might have to be careful about what it means to nest
> operations as in your addEdge examples.
> 
> 
> 
> [I AM NOW BACK ON THE RAILS]
>> 
>> Its as if “properties”, “outE”, “label”, “inV”, etc. references mean
>> something to property graph providers and they can do more intelligent
>> stuff than what MongoDB would do with such information. However, someone,
>> of course, can create a MongoDBPropertyGraphStrategy that would make
>> documents look like vertices and edges and then use O(log(n)) lookups on
>> ids to walk the graph. However, if that didn’t exist, it would still do
>> something that works even if its horribly inefficient as every database can
>> make primitives with references between them!
>> 
> 
> I'm on the same same pair of rails.
> 
> 
> 
>> Anywho @Josh, I believe goto() is what you are doing with multi-references
>> off an object. How do we make it all clean, easy, and universal?
>> 
> 
> Let me know what you think of the above.
> 
> Josh
> 
> 
> 
>> 
>> Marko.
>> 
>> http://rredux.com <http://rredux.com/>
>> 
>> 
>> 
>> 
>>> On Apr 22, 2019, at 6:42 PM, Joshua Shinavier <jo...@fortytwo.net> wrote:
>>> 
>>> Ah, glad you asked. It's all in the pictures. I have nowhere to put them
>> online at the moment... maybe this attachment will go through to the list?
>>> 
>>> Btw. David Spivak gave his talk today at Uber; it was great. Juan
>> Sequeda (relational <--> RDF mapping guy) was also here, and Ryan joined
>> remotely. Really interesting discussion about databases vs. graphs, and
>> what category theory brings to the table.
>>> 
>>> 
>>> On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez <okrammarko@gmail.com
>> <ma...@gmail.com>> wrote:
>>> Hey Josh,
>>> 
>>> I’m digging what you are saying, but the pictures didn’t come through
>> for me ? … Can you provide them again (or if dev@ is filtering them, can
>> you give me URLs to them)?
>>> 
>>> Thanks,
>>> Marko.
>>> 
>>> 
>>>> On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <josh@fortytwo.net
>> <ma...@fortytwo.net>> wrote:
>>>> 
>>>> On the subject of "reified joins", maybe be a picture will be worth a
>> few words. As I said in the thread <
>> https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ <
>> https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ>>
>> on property graph standardization, if you think of vertex labels, edge
>> labels, and property keys as types, each with projections to two other
>> types, there is a nice analogy with relations of two columns, and this
>> analogy can be easily extended to hyper-edges. Here is what the schema of
>> the TinkerPop classic graph looks like if you make each type (e.g. Person,
>> Project, knows, name) into a relation:
>>>> 
>>>> 
>>>> 
>>>> I have made the vertex types salmon-colored, the edge types yellow,
>> the property types green, and the data types blue. The "o" and "I" columns
>> represent the out-type (e.g. out-vertex type of Person) and in-type (e.g.
>> property value type of String) of each relation. More than two arrows from
>> a column represent a coproduct, e.g. the out-type of "name" is Person OR
>> Project. Now you can think of out() and in() as joins of two tables on a
>> primary and foreign key.
>>>> 
>>>> We are not limited to "out" and "in", however. Here is the ternary
>> relationship (hyper-edge) from hyper-edge slide <
>> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49
>> <
>> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49>>
>> of my Graph Day preso, which has three columns/roles/projections:
>>>> 
>>>> 
>>>> 
>>>> I have drawn Says in light blue to indicate that it is a generalized
>> element; it has projections other than "out" and "in". Now the line between
>> relations and edges begins to blur. E.g. in the following, is PlaceEvent a
>> vertex or a property?
>>>> 
>>>> 
>>>> 
>>>> With the right type system, we can just speak of graph elements, and
>> use "vertex", "edge", "property" when it is convenient. In the relational
>> model, they are relations. If you materialize them in a relational
>> database, they are rows. In any case, you need two basic graph traversal
>> operations:
>>>> project() -- forward traversal of the arrows in the above diagrams.
>> Takes you from an element to a component like in-vertex.
>>>> select() -- reverse traversal of the arrows. Allows you to answer
>> questions like "in which Trips is John Doe the rider?"
>>>> 
>>>> Josh
>>>> 
>>>> 
>>>> On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <okrammarko@gmail.com
>> <ma...@gmail.com> <mailto:okrammarko@gmail.com <mailto:
>> okrammarko@gmail.com>>> wrote:
>>>> Hello,
>>>> 
>>>> I agree with everything you say. Here is my question:
>>>> 
>>>>        Relational database — join: Table x Table x equality function
>> -> Table
>>>>        Graph database — traverser: Vertex x edge label -> Vertex
>>>> 
>>>> I want a single function that does both. The only think was to
>> represent traverser() in terms of join():
>>>> 
>>>>        Graph database — traverser: Vertices x Vertex x equality
>> function -> Vertices
>>>> 
>>>> For example,
>>>> 
>>>> V().out(‘address’)
>>>> 
>>>>        ==>
>>>> 
>>>> g.join(V().hasLabel(‘person’).as(‘a’)
>>>>       V().hasLabel(‘addresses’).as(‘b’)).
>>>>         by(‘name’).select(?address vertex?)
>>>> 
>>>> That is, join the vertices with themselves based on some predicate to
>> go from vertices to vertices.
>>>> 
>>>> However, I would like instead to transform the relational database
>> join() concept into a traverser() concept. Kuppitz and I were talking the
>> other day about a link() type operator that says: “try and link to this
>> thing in some specified way.” .. ?? The problem we ran into is again, “link
>> it to what?”
>>>> 
>>>>        - in graph, the ‘to what’ is hardcoded so you don’t need to
>> specify anything.
>>>>        - in rdbms, the ’to what’ is some other specified table.
>>>> 
>>>> So what does the link() operator look like?
>>>> 
>>>> ——
>>>> 
>>>> Some other random thoughts….
>>>> 
>>>> Relational databases join on the table (the whole collection)
>>>> Graph databases traverser on the vertex (an element of the whole
>> collection)
>>>> 
>>>> We can make a relational database join on single row (by providing a
>> filter to a particular primary key). This is the same as a table with one
>> row. Likewise, for graph in the join() context above:
>>>> 
>>>> V(1).out(‘address’)
>>>> 
>>>>        ==>
>>>> 
>>>> g.join(V(1).as(‘a’)
>>>>       V().hasLabel(‘addresses’).as(‘b’)).
>>>>         by(‘name’).select(?address vertex?)
>>>> 
>>>> More thoughts please….
>>>> 
>>>> Marko.
>>>> 
>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
>> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
>> http://rredux.com/ <http://rredux.com/>>>
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Apr 19, 2019, at 4:20 AM, pieter martin <pieter.martin@gmail.com
>> <ma...@gmail.com> <mailto:pieter.martin@gmail.com <mailto:
>> pieter.martin@gmail.com>>> wrote:
>>>>> 
>>>>> Hi,
>>>>> The way I saw it is that the big difference is that graph's have
>>>>> reified joins. This is both a blessing and a curse.
>>>>> A blessing because its much easier (less text to type, less mistakes,
>>>>> clearer semantics...) to traverse an edge than to construct a manual
>>>>> join.A curse because there are almost always far more ways to
>> traverse
>>>>> a data set than just by the edges some architect might have
>> considered
>>>>> when creating the data set. Often the architect is not the domain
>>>>> expert and the edges are a hardcoded layout of the dataset, which
>>>>> almost certainly won't survive the real world's demands. In graphs,
>> if
>>>>> their are no edges then the data is not reachable, except via indexed
>>>>> lookups. This is the standard engineering problem of database design,
>>>>> but it is important and useful that data can be traversed, joined,
>>>>> without having reified edges.
>>>>> In Sqlg at least, but I suspect it generalizes, I want to create the
>>>>> notion of a "virtual edge". Which in meta data describes the join and
>>>>> then the standard to(direction, "virtualEdgeName") will work.
>>>>> In a way this is precisely to keep the graphy nature of gremlin, i.e.
>>>>> traversing edges, and avoid using the manual join syntax you
>> described.
>>>>> CheersPieter
>>>>> 
>>>>> On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
>>>>>> Hi,
>>>>>> *** This is mainly for Kuppitz, but if others care.
>>>>>> Was thinking last night about relational data and Gremlin. The T()
>>>>>> step returns all the tables in the withStructure() RDBMS database.
>>>>>> Tables are ‘complex values’ so they can't leave the VM (only a
>> simple
>>>>>> ‘toString’).
>>>>>> Below is a fake Gremlin session. (and these are just ideas…) tables
>>>>>> -> a ListLike of rows        rows -> a MapLike of primitives
>>>>>> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
>>>>>> g.T(‘people’)==>t[people]gremlin>
>>>>>> 
>> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml
>>>>>> in>
>>>>>> 
>> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz,
>>>>>> age:10}==>{name:josh,age:35}gremlin>
>>>>>> 
>> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g
>>>>>> remlin>
>>>>>> 
>> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos
>>>>>> h
>>>>>> Makes sense. Nice that values() and has() generally apply to all
>>>>>> ListLike and MapLike structures. Also, note how asMap() is the
>>>>>> valueMap() of TP4, but generalizes to anything that is MapLike so it
>>>>>> can be turned into a primitive form as a data-rich result from the
>>>>>> VM.
>>>>>> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
>>>>>> 
>> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name
>>>>>> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin>
>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>> by(se
>>>>>> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).
>> 
>>>>>> values().asMap()==>{a.name:marko,a.age:29,b.name:
>> marko,b.city:santafe
>>>>>> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{
>> a.name <http://a.name/> <http://a.name/ <http://a.name/>>:
>>>>>> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin>
>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>> by(’n
>>>>>> ame’). // shorthand for equijoin on name
>>>>>> column/key           values().asMap()==>{a.name:marko,a.age:29,
>> b.name <http://b.name/> <http://b.name/ <http://b.name/>>
>>>>>> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:kuppitz,
>> b.ci <http://b.ci/> <http://b.ci/ <http://b.ci/>>
>>>>>> ty:tucson}==>{a.name:josh,a.age:35,b.name:
>> josh,b.city:desertisland}gr
>>>>>> emlin>
>>>>>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>> by(’n
>>>>>> ame’)==>t[people<-name->addresses]  // without asMap(), just the
>>>>>> complex value ‘toString'gremlin>
>>>>>> And of course, all of this is strategized into a SQL call so its
>>>>>> joins aren’t necessarily computed using TP4-VM resources.
>>>>>> Anywho — what I hope to realize is the relationship between “links”
>>>>>> (graph) and “joins” (tables). How can we make (bytecode-wise at
>>>>>> least) RDBMS join operations and graph traversal operations ‘the
>>>>>> same.’?
>>>>>>     Singleton: Integer, String, Float, Double, etc. Collection:
>>>>>> List, Map (Vertex, Table, Document)  Linkable: Vertex, Table
>>>>>> Vertices and Tables can be “linked.” Unlike Collections, they don’t
>>>>>> maintain a “parent/child” relationship with the objects they
>>>>>> reference. What does this mean……….?
>>>>>> Take care,Marko.
>>>>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
>> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
>> http://rredux.com/ <http://rredux.com/>>> <http://rredux.com/ <
>> http://rredux.com/> <http://rredux.com/ <http://rredux.com/>> <
>> http://rredux.com/ <http://rredux.com/> <http://rredux.com/ <
>> http://rredux.com/>>>>
>>> 
>>> <diagrams.zip>
>> 
>>

Re: What makes 'graph traversals' and 'relational joins' the same?

Posted by Joshua Shinavier <jo...@fortytwo.net>.

On Tue, Apr 23, 2019 at 5:14 AM Marko Rodriguez <ok...@gmail.com>
wrote:

> Hey Josh,
>
> This gets to the notion I presented in “The Fabled GMachine.”
>         http://rredux.com/the-fabled-gmachine.html <
> http://rredux.com/the-fabled-gmachine.html> (first paragraph of
> “Structures, Processes, and Languages” section)
>
>  All that exists are memory addresses that contain either:
>
>         1. A primitive
>         2. A set of labeled references to other references or primitives.
>
> Using your work and the above, here is a super low-level ‘bytecode' for
> property graphs.
>
> v.goto("id") => 1
>

LGTM. An id is special because it is uniquely identifying / is a primary
key for the element. However, it is also just a field of the element, like
"in"/"inV" and "out"/"outV" are fields of an edge. As an aside, an id would
only really need to be unique among other elements of the same type. To the
above, I would add:

v.type() => Person

...a special operation which takes you from an element to its type. This is
important if unions are supported; e.g. "name" in my example can apply
either to a Person or a Project.


v.goto("label") => person
>

Or that. Like "id", "type"/"label" is special. You can think of it as a
field; it's just a different sort of field which will have the same value
for all elements of any given type.



> v.goto("properties").goto("name") => "marko"
>

OK, properties. Are properties built-in as a separate kind of thing from
edges, or can we treat them the same as vertices and edges here? I think we
can treat them the same. A property, in the algebraic model I described
above, is just an element with two fields, the second of which is a
primitive value. As I said, I think we need two distinct traversal
operations -- projection and selection -- and here is where we can use the
latter. Here, I will call it "comeFrom".

v.comeFrom("name", "out").goto("in") => {"marko"}

You can think of this comeFrom as a special case of a select() function
which takes a type -- "name" -- and a set of key/value pairs {("out", v)}.
It returns all matching elements of the given type. You then project to the
"in" value using your goto. I wrote {"marko"} as a set, because comeFrom
can give you multiple properties, depending on whether multi-properties are
supported.

Note how similar this is to an edge traversal:

v.comeFrom("knows", "out").goto("in") => {v[2], v[4]}

Of course, you could define "properties" in such a way that a
goto("properties") does exactly this under the hood, but in terms of low
level instructions, you need something like comeFrom.


v.goto("properties").goto("name").goto(0) => "m"
>

This is where the notion of optionals becomes handy. You can make
array/list indices into fields like this, but IMO you should also make them
safe. E.g. borrowing Haskell syntax for a moment:

v.goto("properties").goto("name").goto(0) => Just 'm'

v.goto("properties").goto("name").goto(5) => Nothing


v.goto("outE").goto("inV") => v[2], v[4]
>

I am not a big fan of untyped "outE", but you can think of this as a union
of all v.comeFrom(x, "out").goto("in"), where x is any edge type. Only
"knows" and "created" are edge types which are applicable to "Person", so
you will only get {v[2], v[4]}. If you want to get really crazy, you can
allow x to be any type. Then you get {v[2], v[4], 29, "marko"}.



> g.goto("V").goto(1) => v[1]
>

That, or you give every element a virtual field called "graph". So:

v.goto("graph") => g

g.comeFrom("Person", "graph") => {v[1], v[2], v[4], v[6]}

g.comeFrom("Person", "graph").restrict("id", 1)

...where restrict() is the relational "sigma" operation as above, not to be
confused with TinkerPop's select(), filter(), or has() steps. Again, I
prefer to specify a type in comeFrom (i.e. we're looking specifically for a
Person with id of 1), but you could also do a comprehension g.comeFrom(x,
"graph"), letting x range over all types.



> The goto() instruction moves the “memory reference” (traverser) from the
> current “memory address” to the “memory address” referenced by the goto()
> argument.
>

Agreed, if we also think of primitive values as memory references.



> The Gremlin expression:
>
>         g.V().has(‘name’,’marko’).out(‘knows’).drop()
>
> ..would compile to:
>
>
> g.goto(“V”).filter(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()
>


In the alternate universe:

g.comeFrom("Person", "graph").comeFrom("name", "out").restrict("in",
"marko").goto("out").comeFrom("knows", "out").goto("in").free()

I have wimped out on free() and just left it as you had it, but I think it
would be worthwhile to explore a monadic syntax for traversals with
side-effects. Different topic.

Now, all of this "out", "in" business is getting pretty repetitive, right?
Well, the field names become more diverse if we allow hyper-edges and
generalized ADTs. E.g. in my Trip example, say I want to know all drop-off
locations for a given rider:

u.comeFrom("Trip", "rider").goto("dropoff").goto("place")

Done.



> If we can get things that “low-level” and still efficient to compile, then
> we can model every data structure. All you are doing is pointer chasing
> through a withStructure() data structure. .
>

Agreed.


No one would ever want to write strategies for goto()-based Bytecode.


Also agreed.



> Thus, perhaps there could be a PropertyGraphDecorationStrategy that does:
>
> [...]


No argument here, though the alternate-universe "bytecode" would look
slightly different. And the high-level syntax should also be able to deal
with generalized relations / data types gracefully. As a thought
experiment, suppose we were to define the steps to() as your goto(), and
from() as my comeFrom(). Then traversals like:

u.from("Trip", "rider").to("dropoff").to("time")

...look pretty good as-is, and are not too low-level. However, ordinary
edge traversals like:

v.from("knows", "out").to("in")

...do look a little Assembly-like. So in/out/both etc. remain as they are,
but are shorthand for from() and to() steps using "out" or "in":

v.out("knows") === v.outE("knows").inV() === v.from("knows", "out").to("in")


[I AM NOW GOING OFF THE RAILS]
> [sniiiiip]
>

Sure. Again, I like the idea of wrapping side-effects in monads. What would
that look like in a Gremlinesque fluent syntax? I don't quite know, but if
we think of the dot as a monadic bind operation like Haskell's >>=, then
perhaps the monadic expressions look pretty similar to what you have just
sketched out. Might have to be careful about what it means to nest
operations as in your addEdge examples.



[I AM NOW BACK ON THE RAILS]
>
> Its as if “properties”, “outE”, “label”, “inV”, etc. references mean
> something to property graph providers and they can do more intelligent
> stuff than what MongoDB would do with such information. However, someone,
> of course, can create a MongoDBPropertyGraphStrategy that would make
> documents look like vertices and edges and then use O(log(n)) lookups on
> ids to walk the graph. However, if that didn’t exist, it would still do
> something that works even if its horribly inefficient as every database can
> make primitives with references between them!
>

I'm on the same same pair of rails.



> Anywho @Josh, I believe goto() is what you are doing with multi-references
> off an object. How do we make it all clean, easy, and universal?
>

Let me know what you think of the above.

Josh



>
> Marko.
>
> http://rredux.com <http://rredux.com/>
>
>
>
>
> > On Apr 22, 2019, at 6:42 PM, Joshua Shinavier <jo...@fortytwo.net> wrote:
> >
> > Ah, glad you asked. It's all in the pictures. I have nowhere to put them
> online at the moment... maybe this attachment will go through to the list?
> >
> > Btw. David Spivak gave his talk today at Uber; it was great. Juan
> Sequeda (relational <--> RDF mapping guy) was also here, and Ryan joined
> remotely. Really interesting discussion about databases vs. graphs, and
> what category theory brings to the table.
> >
> >
> > On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez <okrammarko@gmail.com
> <ma...@gmail.com>> wrote:
> > Hey Josh,
> >
> > I’m digging what you are saying, but the pictures didn’t come through
> for me ? … Can you provide them again (or if dev@ is filtering them, can
> you give me URLs to them)?
> >
> > Thanks,
> > Marko.
> >
> >
> > > On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <josh@fortytwo.net
> <ma...@fortytwo.net>> wrote:
> > >
> > > On the subject of "reified joins", maybe be a picture will be worth a
> few words. As I said in the thread <
> https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ <
> https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ>>
> on property graph standardization, if you think of vertex labels, edge
> labels, and property keys as types, each with projections to two other
> types, there is a nice analogy with relations of two columns, and this
> analogy can be easily extended to hyper-edges. Here is what the schema of
> the TinkerPop classic graph looks like if you make each type (e.g. Person,
> Project, knows, name) into a relation:
> > >
> > >
> > >
> > > I have made the vertex types salmon-colored, the edge types yellow,
> the property types green, and the data types blue. The "o" and "I" columns
> represent the out-type (e.g. out-vertex type of Person) and in-type (e.g.
> property value type of String) of each relation. More than two arrows from
> a column represent a coproduct, e.g. the out-type of "name" is Person OR
> Project. Now you can think of out() and in() as joins of two tables on a
> primary and foreign key.
> > >
> > > We are not limited to "out" and "in", however. Here is the ternary
> relationship (hyper-edge) from hyper-edge slide <
> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49
> <
> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49>>
> of my Graph Day preso, which has three columns/roles/projections:
> > >
> > >
> > >
> > > I have drawn Says in light blue to indicate that it is a generalized
> element; it has projections other than "out" and "in". Now the line between
> relations and edges begins to blur. E.g. in the following, is PlaceEvent a
> vertex or a property?
> > >
> > >
> > >
> > > With the right type system, we can just speak of graph elements, and
> use "vertex", "edge", "property" when it is convenient. In the relational
> model, they are relations. If you materialize them in a relational
> database, they are rows. In any case, you need two basic graph traversal
> operations:
> > > project() -- forward traversal of the arrows in the above diagrams.
> Takes you from an element to a component like in-vertex.
> > > select() -- reverse traversal of the arrows. Allows you to answer
> questions like "in which Trips is John Doe the rider?"
> > >
> > > Josh
> > >
> > >
> > > On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <okrammarko@gmail.com
> <ma...@gmail.com> <mailto:okrammarko@gmail.com <mailto:
> okrammarko@gmail.com>>> wrote:
> > > Hello,
> > >
> > > I agree with everything you say. Here is my question:
> > >
> > >         Relational database — join: Table x Table x equality function
> -> Table
> > >         Graph database — traverser: Vertex x edge label -> Vertex
> > >
> > > I want a single function that does both. The only think was to
> represent traverser() in terms of join():
> > >
> > >         Graph database — traverser: Vertices x Vertex x equality
> function -> Vertices
> > >
> > > For example,
> > >
> > > V().out(‘address’)
> > >
> > >         ==>
> > >
> > > g.join(V().hasLabel(‘person’).as(‘a’)
> > >        V().hasLabel(‘addresses’).as(‘b’)).
> > >          by(‘name’).select(?address vertex?)
> > >
> > > That is, join the vertices with themselves based on some predicate to
> go from vertices to vertices.
> > >
> > > However, I would like instead to transform the relational database
> join() concept into a traverser() concept. Kuppitz and I were talking the
> other day about a link() type operator that says: “try and link to this
> thing in some specified way.” .. ?? The problem we ran into is again, “link
> it to what?”
> > >
> > >         - in graph, the ‘to what’ is hardcoded so you don’t need to
> specify anything.
> > >         - in rdbms, the ’to what’ is some other specified table.
> > >
> > > So what does the link() operator look like?
> > >
> > > ——
> > >
> > > Some other random thoughts….
> > >
> > > Relational databases join on the table (the whole collection)
> > > Graph databases traverser on the vertex (an element of the whole
> collection)
> > >
> > > We can make a relational database join on single row (by providing a
> filter to a particular primary key). This is the same as a table with one
> row. Likewise, for graph in the join() context above:
> > >
> > > V(1).out(‘address’)
> > >
> > >         ==>
> > >
> > > g.join(V(1).as(‘a’)
> > >        V().hasLabel(‘addresses’).as(‘b’)).
> > >          by(‘name’).select(?address vertex?)
> > >
> > > More thoughts please….
> > >
> > > Marko.
> > >
> > > http://rredux.com <http://rredux.com/> <http://rredux.com/ <
> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
> http://rredux.com/ <http://rredux.com/>>>
> > >
> > >
> > >
> > >
> > > > On Apr 19, 2019, at 4:20 AM, pieter martin <pieter.martin@gmail.com
> <ma...@gmail.com> <mailto:pieter.martin@gmail.com <mailto:
> pieter.martin@gmail.com>>> wrote:
> > > >
> > > > Hi,
> > > > The way I saw it is that the big difference is that graph's have
> > > > reified joins. This is both a blessing and a curse.
> > > > A blessing because its much easier (less text to type, less mistakes,
> > > > clearer semantics...) to traverse an edge than to construct a manual
> > > > join.A curse because there are almost always far more ways to
> traverse
> > > > a data set than just by the edges some architect might have
> considered
> > > > when creating the data set. Often the architect is not the domain
> > > > expert and the edges are a hardcoded layout of the dataset, which
> > > > almost certainly won't survive the real world's demands. In graphs,
> if
> > > > their are no edges then the data is not reachable, except via indexed
> > > > lookups. This is the standard engineering problem of database design,
> > > > but it is important and useful that data can be traversed, joined,
> > > > without having reified edges.
> > > > In Sqlg at least, but I suspect it generalizes, I want to create the
> > > > notion of a "virtual edge". Which in meta data describes the join and
> > > > then the standard to(direction, "virtualEdgeName") will work.
> > > > In a way this is precisely to keep the graphy nature of gremlin, i.e.
> > > > traversing edges, and avoid using the manual join syntax you
> described.
> > > > CheersPieter
> > > >
> > > > On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
> > > >> Hi,
> > > >> *** This is mainly for Kuppitz, but if others care.
> > > >> Was thinking last night about relational data and Gremlin. The T()
> > > >> step returns all the tables in the withStructure() RDBMS database.
> > > >> Tables are ‘complex values’ so they can't leave the VM (only a
> simple
> > > >> ‘toString’).
> > > >> Below is a fake Gremlin session. (and these are just ideas…) tables
> > > >> -> a ListLike of rows        rows -> a MapLike of primitives
> > > >> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> > > >> g.T(‘people’)==>t[people]gremlin>
> > > >>
> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml
> > > >> in>
> > > >>
> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz,
> > > >> age:10}==>{name:josh,age:35}gremlin>
> > > >>
> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g
> > > >> remlin>
> > > >>
> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos
> > > >> h
> > > >> Makes sense. Nice that values() and has() generally apply to all
> > > >> ListLike and MapLike structures. Also, note how asMap() is the
> > > >> valueMap() of TP4, but generalizes to anything that is MapLike so it
> > > >> can be turned into a primitive form as a data-rich result from the
> > > >> VM.
> > > >> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> > > >>
> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name
> > > >> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin>
> > > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>  by(se
> > > >> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).
>
> > > >> values().asMap()==>{a.name:marko,a.age:29,b.name:
> marko,b.city:santafe
> > > >> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{
> a.name <http://a.name/> <http://a.name/ <http://a.name/>>:
> > > >> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin>
> > > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>  by(’n
> > > >> ame’). // shorthand for equijoin on name
> > > >> column/key           values().asMap()==>{a.name:marko,a.age:29,
> b.name <http://b.name/> <http://b.name/ <http://b.name/>>
> > > >> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:kuppitz,
> b.ci <http://b.ci/> <http://b.ci/ <http://b.ci/>>
> > > >> ty:tucson}==>{a.name:josh,a.age:35,b.name:
> josh,b.city:desertisland}gr
> > > >> emlin>
> > > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).
>  by(’n
> > > >> ame’)==>t[people<-name->addresses]  // without asMap(), just the
> > > >> complex value ‘toString'gremlin>
> > > >> And of course, all of this is strategized into a SQL call so its
> > > >> joins aren’t necessarily computed using TP4-VM resources.
> > > >> Anywho — what I hope to realize is the relationship between “links”
> > > >> (graph) and “joins” (tables). How can we make (bytecode-wise at
> > > >> least) RDBMS join operations and graph traversal operations ‘the
> > > >> same.’?
> > > >>      Singleton: Integer, String, Float, Double, etc. Collection:
> > > >> List, Map (Vertex, Table, Document)  Linkable: Vertex, Table
> > > >> Vertices and Tables can be “linked.” Unlike Collections, they don’t
> > > >> maintain a “parent/child” relationship with the objects they
> > > >> reference. What does this mean……….?
> > > >> Take care,Marko.
> > > >> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
> http://rredux.com/ <http://rredux.com/>>> <http://rredux.com/ <
> http://rredux.com/> <http://rredux.com/ <http://rredux.com/>> <
> http://rredux.com/ <http://rredux.com/> <http://rredux.com/ <
> http://rredux.com/>>>>
> >
> > <diagrams.zip>
>
>

Re: What makes 'graph traversals' and 'relational joins' the same?

Posted by Marko Rodriguez <ok...@gmail.com>.

Hey Josh,

This gets to the notion I presented in “The Fabled GMachine.”
	http://rredux.com/the-fabled-gmachine.html <http://rredux.com/the-fabled-gmachine.html> (first paragraph of “Structures, Processes, and Languages” section)

 All that exists are memory addresses that contain either:

	1. A primitive
	2. A set of labeled references to other references or primitives.

Using your work and the above, here is a super low-level ‘bytecode' for property graphs.

v.goto("id") => 1
v.goto("label") => person
v.goto("properties").goto("name") => "marko"
v.goto("properties").goto("name").goto(0) => "m"
v.goto("outE").goto("inV") => v[2], v[4]
g.goto("V").goto(1) => v[1]

The goto() instruction moves the “memory reference” (traverser) from the current “memory address” to the “memory address” referenced by the goto() argument.

The Gremlin expression:

	g.V().has(‘name’,’marko’).out(‘knows’).drop()

..would compile to:

	g.goto(“V”).filter(goto(“properties”).goto(“name”).is(“marko”)).goto(“outE”).filter(goto(“label”).is(“knows”)).goto(“inV”).free()

…where free() is the opposite of malloc().

If we can get things that “low-level” and still efficient to compile, then we can model every data structure. All you are doing is pointer chasing through a withStructure() data structure. .

No one would ever want to write strategies for goto()-based Bytecode. Thus, perhaps there could be a PropertyGraphDecorationStrategy that does:

g = Gremlin.traversal(machine).withStructure(JanusGraph.class)  // this will register the strategy
g.V().has(‘name’,’marko’).out(‘knows’).drop() // this generates goto()-based bytecode underneath the covers
	==submit==>
[goto,V][filter,[goto…]][goto][goto][free]] // Bytecode from the “fundamental instruction set” 
[V][has,name,marko][out,knows][drop] // PropertyGraphDecorationStrategy converts goto() instructions into a property graph-specific instruction set.
[V-idx,name,marko][out,knows][drop] // JanusGraphProviderStrategy converts V().has() into an index lookup instruction.

[I AM NOW GOING OFF THE RAILS]

Like fluent-style Gremlin, we could have an AssemblyLanguage that only has goto(), free(), malloc(), filter(), map(), reduce(), flatmap(), barrier(), branch(), repeat(), sideEffect() instructions. For instance, if you wanted to create an array list (not a linked list! :):

[“marko”,29,true]

you would do:

malloc(childrefs(0,1,2)).sideEffect(goto(0).malloc(“marko”)).sideEffect(goto(1).malloc(29)).sideEffect(goto(2).malloc(true))

This tells the underlying data structure (e.g. database) that you want to create a set of “children references" labeled 0, 1, and 2. And then you goto() each reference and add primitives. Now, if JanusGraph got this batch of instructions, it would do the following:

Vertex refs = graph.addVertex()
refs.addEdge(“childref", graph.addVertex(“value”,”marko”)).property(“ref”,0)
refs.addEdge(“childref", graph.addVertex(“value”,29)).property(“ref”,1)
refs.addEdge(“childref", graph.addVertex(“value”,true)).property(“ref”,2)

The reason for childref, is that if you delete the list, you should delete all the children referenced data! In other words, refs-vertex has cascading deletes.

list.drop()
==>
list.sideEffect(goto(0,1,2).free()).free()

JanusGraph would then do:

refs.out(“childref").drop()
refs.drop()

Or, more generally:

refs.emit().repeat(out(“childref”)).drop()

Trippy.

[I AM NOW BACK ON THE RAILS]

Its as if “properties”, “outE”, “label”, “inV”, etc. references mean something to property graph providers and they can do more intelligent stuff than what MongoDB would do with such information. However, someone, of course, can create a MongoDBPropertyGraphStrategy that would make documents look like vertices and edges and then use O(log(n)) lookups on ids to walk the graph. However, if that didn’t exist, it would still do something that works even if its horribly inefficient as every database can make primitives with references between them!

Anywho @Josh, I believe goto() is what you are doing with multi-references off an object. How do we make it all clean, easy, and universal?

Marko.

http://rredux.com <http://rredux.com/>




> On Apr 22, 2019, at 6:42 PM, Joshua Shinavier <jo...@fortytwo.net> wrote:
> 
> Ah, glad you asked. It's all in the pictures. I have nowhere to put them online at the moment... maybe this attachment will go through to the list?
> 
> Btw. David Spivak gave his talk today at Uber; it was great. Juan Sequeda (relational <--> RDF mapping guy) was also here, and Ryan joined remotely. Really interesting discussion about databases vs. graphs, and what category theory brings to the table.
> 
> 
> On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez <okrammarko@gmail.com <ma...@gmail.com>> wrote:
> Hey Josh,
> 
> I’m digging what you are saying, but the pictures didn’t come through for me ? … Can you provide them again (or if dev@ is filtering them, can you give me URLs to them)?
> 
> Thanks,
> Marko.
> 
> 
> > On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <josh@fortytwo.net <ma...@fortytwo.net>> wrote:
> > 
> > On the subject of "reified joins", maybe be a picture will be worth a few words. As I said in the thread <https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ <https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ>> on property graph standardization, if you think of vertex labels, edge labels, and property keys as types, each with projections to two other types, there is a nice analogy with relations of two columns, and this analogy can be easily extended to hyper-edges. Here is what the schema of the TinkerPop classic graph looks like if you make each type (e.g. Person, Project, knows, name) into a relation:
> > 
> > 
> > 
> > I have made the vertex types salmon-colored, the edge types yellow, the property types green, and the data types blue. The "o" and "I" columns represent the out-type (e.g. out-vertex type of Person) and in-type (e.g. property value type of String) of each relation. More than two arrows from a column represent a coproduct, e.g. the out-type of "name" is Person OR Project. Now you can think of out() and in() as joins of two tables on a primary and foreign key.
> > 
> > We are not limited to "out" and "in", however. Here is the ternary relationship (hyper-edge) from hyper-edge slide <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49 <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49>> of my Graph Day preso, which has three columns/roles/projections:
> > 
> > 
> > 
> > I have drawn Says in light blue to indicate that it is a generalized element; it has projections other than "out" and "in". Now the line between relations and edges begins to blur. E.g. in the following, is PlaceEvent a vertex or a property?
> > 
> > 
> > 
> > With the right type system, we can just speak of graph elements, and use "vertex", "edge", "property" when it is convenient. In the relational model, they are relations. If you materialize them in a relational database, they are rows. In any case, you need two basic graph traversal operations:
> > project() -- forward traversal of the arrows in the above diagrams. Takes you from an element to a component like in-vertex.
> > select() -- reverse traversal of the arrows. Allows you to answer questions like "in which Trips is John Doe the rider?"
> > 
> > Josh
> > 
> > 
> > On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <okrammarko@gmail.com <ma...@gmail.com> <mailto:okrammarko@gmail.com <ma...@gmail.com>>> wrote:
> > Hello,
> > 
> > I agree with everything you say. Here is my question:
> > 
> >         Relational database — join: Table x Table x equality function -> Table
> >         Graph database — traverser: Vertex x edge label -> Vertex
> > 
> > I want a single function that does both. The only think was to represent traverser() in terms of join():
> > 
> >         Graph database — traverser: Vertices x Vertex x equality function -> Vertices
> > 
> > For example, 
> > 
> > V().out(‘address’)
> > 
> >         ==>
> > 
> > g.join(V().hasLabel(‘person’).as(‘a’)
> >        V().hasLabel(‘addresses’).as(‘b’)).
> >          by(‘name’).select(?address vertex?)
> > 
> > That is, join the vertices with themselves based on some predicate to go from vertices to vertices.
> > 
> > However, I would like instead to transform the relational database join() concept into a traverser() concept. Kuppitz and I were talking the other day about a link() type operator that says: “try and link to this thing in some specified way.” .. ?? The problem we ran into is again, “link it to what?”
> > 
> >         - in graph, the ‘to what’ is hardcoded so you don’t need to specify anything.
> >         - in rdbms, the ’to what’ is some other specified table.
> > 
> > So what does the link() operator look like?
> > 
> > ——
> > 
> > Some other random thoughts….
> > 
> > Relational databases join on the table (the whole collection)
> > Graph databases traverser on the vertex (an element of the whole collection)
> > 
> > We can make a relational database join on single row (by providing a filter to a particular primary key). This is the same as a table with one row. Likewise, for graph in the join() context above:
> > 
> > V(1).out(‘address’) 
> > 
> >         ==>
> > 
> > g.join(V(1).as(‘a’)
> >        V().hasLabel(‘addresses’).as(‘b’)).
> >          by(‘name’).select(?address vertex?)
> > 
> > More thoughts please….
> > 
> > Marko.
> > 
> > http://rredux.com <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>>>
> > 
> > 
> > 
> > 
> > > On Apr 19, 2019, at 4:20 AM, pieter martin <pieter.martin@gmail.com <ma...@gmail.com> <mailto:pieter.martin@gmail.com <ma...@gmail.com>>> wrote:
> > > 
> > > Hi,
> > > The way I saw it is that the big difference is that graph's have
> > > reified joins. This is both a blessing and a curse.
> > > A blessing because its much easier (less text to type, less mistakes,
> > > clearer semantics...) to traverse an edge than to construct a manual
> > > join.A curse because there are almost always far more ways to traverse
> > > a data set than just by the edges some architect might have considered
> > > when creating the data set. Often the architect is not the domain
> > > expert and the edges are a hardcoded layout of the dataset, which
> > > almost certainly won't survive the real world's demands. In graphs, if
> > > their are no edges then the data is not reachable, except via indexed
> > > lookups. This is the standard engineering problem of database design,
> > > but it is important and useful that data can be traversed, joined,
> > > without having reified edges.
> > > In Sqlg at least, but I suspect it generalizes, I want to create the
> > > notion of a "virtual edge". Which in meta data describes the join and
> > > then the standard to(direction, "virtualEdgeName") will work.
> > > In a way this is precisely to keep the graphy nature of gremlin, i.e.
> > > traversing edges, and avoid using the manual join syntax you described.
> > > CheersPieter
> > > 
> > > On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
> > >> Hi,
> > >> *** This is mainly for Kuppitz, but if others care. 
> > >> Was thinking last night about relational data and Gremlin. The T()
> > >> step returns all the tables in the withStructure() RDBMS database.
> > >> Tables are ‘complex values’ so they can't leave the VM (only a simple
> > >> ‘toString’).
> > >> Below is a fake Gremlin session. (and these are just ideas…) tables
> > >> -> a ListLike of rows        rows -> a MapLike of primitives
> > >> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> > >> g.T(‘people’)==>t[people]gremlin>
> > >> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml
> > >> in>
> > >> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz,
> > >> age:10}==>{name:josh,age:35}gremlin>
> > >> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g
> > >> remlin>
> > >> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos
> > >> h
> > >> Makes sense. Nice that values() and has() generally apply to all
> > >> ListLike and MapLike structures. Also, note how asMap() is the
> > >> valueMap() of TP4, but generalizes to anything that is MapLike so it
> > >> can be turned into a primitive form as a data-rich result from the
> > >> VM.
> > >> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> > >> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name
> > >> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin>
> > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(se
> > >> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).           
> > >> values().asMap()==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe
> > >> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{a.name <http://a.name/> <http://a.name/ <http://a.name/>>:
> > >> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin>
> > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(’n
> > >> ame’). // shorthand for equijoin on name
> > >> column/key           values().asMap()==>{a.name:marko,a.age:29,b.name <http://b.name/> <http://b.name/ <http://b.name/>>
> > >> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.ci <http://b.ci/> <http://b.ci/ <http://b.ci/>>
> > >> ty:tucson}==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}gr
> > >> emlin>
> > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(’n
> > >> ame’)==>t[people<-name->addresses]  // without asMap(), just the
> > >> complex value ‘toString'gremlin>
> > >> And of course, all of this is strategized into a SQL call so its
> > >> joins aren’t necessarily computed using TP4-VM resources.
> > >> Anywho — what I hope to realize is the relationship between “links”
> > >> (graph) and “joins” (tables). How can we make (bytecode-wise at
> > >> least) RDBMS join operations and graph traversal operations ‘the
> > >> same.’?
> > >>      Singleton: Integer, String, Float, Double, etc. Collection:
> > >> List, Map (Vertex, Table, Document)  Linkable: Vertex, Table
> > >> Vertices and Tables can be “linked.” Unlike Collections, they don’t
> > >> maintain a “parent/child” relationship with the objects they
> > >> reference. What does this mean……….?
> > >> Take care,Marko.
> > >> http://rredux.com <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>>> <http://rredux.com/ <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>>>>
> 
> <diagrams.zip>

Re: What makes 'graph traversals' and 'relational joins' the same?

Posted by Joshua Shinavier <jo...@fortytwo.net>.

Ah, glad you asked. It's all in the pictures. I have nowhere to put them
online at the moment... maybe this attachment will go through to the list?

Btw. David Spivak gave his talk today at Uber; it was great. Juan Sequeda
(relational <--> RDF mapping guy) was also here, and Ryan joined remotely.
Really interesting discussion about databases vs. graphs, and what category
theory brings to the table.


On Mon, Apr 22, 2019 at 1:45 PM Marko Rodriguez <ok...@gmail.com>
wrote:

> Hey Josh,
>
> I’m digging what you are saying, but the pictures didn’t come through for
> me ? … Can you provide them again (or if dev@ is filtering them, can you
> give me URLs to them)?
>
> Thanks,
> Marko.
>
>
> > On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <jo...@fortytwo.net>
> wrote:
> >
> > On the subject of "reified joins", maybe be a picture will be worth a
> few words. As I said in the thread <
> https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ>
> on property graph standardization, if you think of vertex labels, edge
> labels, and property keys as types, each with projections to two other
> types, there is a nice analogy with relations of two columns, and this
> analogy can be easily extended to hyper-edges. Here is what the schema of
> the TinkerPop classic graph looks like if you make each type (e.g. Person,
> Project, knows, name) into a relation:
> >
> >
> >
> > I have made the vertex types salmon-colored, the edge types yellow, the
> property types green, and the data types blue. The "o" and "I" columns
> represent the out-type (e.g. out-vertex type of Person) and in-type (e.g.
> property value type of String) of each relation. More than two arrows from
> a column represent a coproduct, e.g. the out-type of "name" is Person OR
> Project. Now you can think of out() and in() as joins of two tables on a
> primary and foreign key.
> >
> > We are not limited to "out" and "in", however. Here is the ternary
> relationship (hyper-edge) from hyper-edge slide <
> https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49>
> of my Graph Day preso, which has three columns/roles/projections:
> >
> >
> >
> > I have drawn Says in light blue to indicate that it is a generalized
> element; it has projections other than "out" and "in". Now the line between
> relations and edges begins to blur. E.g. in the following, is PlaceEvent a
> vertex or a property?
> >
> >
> >
> > With the right type system, we can just speak of graph elements, and use
> "vertex", "edge", "property" when it is convenient. In the relational
> model, they are relations. If you materialize them in a relational
> database, they are rows. In any case, you need two basic graph traversal
> operations:
> > project() -- forward traversal of the arrows in the above diagrams.
> Takes you from an element to a component like in-vertex.
> > select() -- reverse traversal of the arrows. Allows you to answer
> questions like "in which Trips is John Doe the rider?"
> >
> > Josh
> >
> >
> > On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <okrammarko@gmail.com
> <ma...@gmail.com>> wrote:
> > Hello,
> >
> > I agree with everything you say. Here is my question:
> >
> >         Relational database — join: Table x Table x equality function ->
> Table
> >         Graph database — traverser: Vertex x edge label -> Vertex
> >
> > I want a single function that does both. The only think was to represent
> traverser() in terms of join():
> >
> >         Graph database — traverser: Vertices x Vertex x equality
> function -> Vertices
> >
> > For example,
> >
> > V().out(‘address’)
> >
> >         ==>
> >
> > g.join(V().hasLabel(‘person’).as(‘a’)
> >        V().hasLabel(‘addresses’).as(‘b’)).
> >          by(‘name’).select(?address vertex?)
> >
> > That is, join the vertices with themselves based on some predicate to go
> from vertices to vertices.
> >
> > However, I would like instead to transform the relational database
> join() concept into a traverser() concept. Kuppitz and I were talking the
> other day about a link() type operator that says: “try and link to this
> thing in some specified way.” .. ?? The problem we ran into is again, “link
> it to what?”
> >
> >         - in graph, the ‘to what’ is hardcoded so you don’t need to
> specify anything.
> >         - in rdbms, the ’to what’ is some other specified table.
> >
> > So what does the link() operator look like?
> >
> > ——
> >
> > Some other random thoughts….
> >
> > Relational databases join on the table (the whole collection)
> > Graph databases traverser on the vertex (an element of the whole
> collection)
> >
> > We can make a relational database join on single row (by providing a
> filter to a particular primary key). This is the same as a table with one
> row. Likewise, for graph in the join() context above:
> >
> > V(1).out(‘address’)
> >
> >         ==>
> >
> > g.join(V(1).as(‘a’)
> >        V().hasLabel(‘addresses’).as(‘b’)).
> >          by(‘name’).select(?address vertex?)
> >
> > More thoughts please….
> >
> > Marko.
> >
> > http://rredux.com <http://rredux.com/> <http://rredux.com/ <
> http://rredux.com/>>
> >
> >
> >
> >
> > > On Apr 19, 2019, at 4:20 AM, pieter martin <pieter.martin@gmail.com
> <ma...@gmail.com>> wrote:
> > >
> > > Hi,
> > > The way I saw it is that the big difference is that graph's have
> > > reified joins. This is both a blessing and a curse.
> > > A blessing because its much easier (less text to type, less mistakes,
> > > clearer semantics...) to traverse an edge than to construct a manual
> > > join.A curse because there are almost always far more ways to traverse
> > > a data set than just by the edges some architect might have considered
> > > when creating the data set. Often the architect is not the domain
> > > expert and the edges are a hardcoded layout of the dataset, which
> > > almost certainly won't survive the real world's demands. In graphs, if
> > > their are no edges then the data is not reachable, except via indexed
> > > lookups. This is the standard engineering problem of database design,
> > > but it is important and useful that data can be traversed, joined,
> > > without having reified edges.
> > > In Sqlg at least, but I suspect it generalizes, I want to create the
> > > notion of a "virtual edge". Which in meta data describes the join and
> > > then the standard to(direction, "virtualEdgeName") will work.
> > > In a way this is precisely to keep the graphy nature of gremlin, i.e.
> > > traversing edges, and avoid using the manual join syntax you described.
> > > CheersPieter
> > >
> > > On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
> > >> Hi,
> > >> *** This is mainly for Kuppitz, but if others care.
> > >> Was thinking last night about relational data and Gremlin. The T()
> > >> step returns all the tables in the withStructure() RDBMS database.
> > >> Tables are ‘complex values’ so they can't leave the VM (only a simple
> > >> ‘toString’).
> > >> Below is a fake Gremlin session. (and these are just ideas…) tables
> > >> -> a ListLike of rows        rows -> a MapLike of primitives
> > >> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> > >> g.T(‘people’)==>t[people]gremlin>
> > >> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml
> > >> in>
> > >> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz,
> > >> age:10}==>{name:josh,age:35}gremlin>
> > >> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g
> > >> remlin>
> > >> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos
> > >> h
> > >> Makes sense. Nice that values() and has() generally apply to all
> > >> ListLike and MapLike structures. Also, note how asMap() is the
> > >> valueMap() of TP4, but generalizes to anything that is MapLike so it
> > >> can be turned into a primitive form as a data-rich result from the
> > >> VM.
> > >> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> > >> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name
> > >> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin>
> > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(se
> > >> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).
> > >> values().asMap()==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe
> > >> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{a.name
> <http://a.name/>:
> > >> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin>
> > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(’n
> > >> ame’). // shorthand for equijoin on name
> > >> column/key           values().asMap()==>{a.name:marko,a.age:29,b.name
> <http://b.name/>
> > >> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.ci
> <http://b.ci/>
> > >> ty:tucson}==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}gr
> > >> emlin>
> > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(’n
> > >> ame’)==>t[people<-name->addresses]  // without asMap(), just the
> > >> complex value ‘toString'gremlin>
> > >> And of course, all of this is strategized into a SQL call so its
> > >> joins aren’t necessarily computed using TP4-VM resources.
> > >> Anywho — what I hope to realize is the relationship between “links”
> > >> (graph) and “joins” (tables). How can we make (bytecode-wise at
> > >> least) RDBMS join operations and graph traversal operations ‘the
> > >> same.’?
> > >>      Singleton: Integer, String, Float, Double, etc. Collection:
> > >> List, Map (Vertex, Table, Document)  Linkable: Vertex, Table
> > >> Vertices and Tables can be “linked.” Unlike Collections, they don’t
> > >> maintain a “parent/child” relationship with the objects they
> > >> reference. What does this mean……….?
> > >> Take care,Marko.
> > >> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
> http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <
> http://rredux.com/ <http://rredux.com/>>>
>
>

Re: What makes 'graph traversals' and 'relational joins' the same?

Posted by Marko Rodriguez <ok...@gmail.com>.

Hey Josh,

I’m digging what you are saying, but the pictures didn’t come through for me ? … Can you provide them again (or if dev@ is filtering them, can you give me URLs to them)?

Thanks,
Marko.


> On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <jo...@fortytwo.net> wrote:
> 
> On the subject of "reified joins", maybe be a picture will be worth a few words. As I said in the thread <https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ> on property graph standardization, if you think of vertex labels, edge labels, and property keys as types, each with projections to two other types, there is a nice analogy with relations of two columns, and this analogy can be easily extended to hyper-edges. Here is what the schema of the TinkerPop classic graph looks like if you make each type (e.g. Person, Project, knows, name) into a relation:
> 
> 
> 
> I have made the vertex types salmon-colored, the edge types yellow, the property types green, and the data types blue. The "o" and "I" columns represent the out-type (e.g. out-vertex type of Person) and in-type (e.g. property value type of String) of each relation. More than two arrows from a column represent a coproduct, e.g. the out-type of "name" is Person OR Project. Now you can think of out() and in() as joins of two tables on a primary and foreign key.
> 
> We are not limited to "out" and "in", however. Here is the ternary relationship (hyper-edge) from hyper-edge slide <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49> of my Graph Day preso, which has three columns/roles/projections:
> 
> 
> 
> I have drawn Says in light blue to indicate that it is a generalized element; it has projections other than "out" and "in". Now the line between relations and edges begins to blur. E.g. in the following, is PlaceEvent a vertex or a property?
> 
> 
> 
> With the right type system, we can just speak of graph elements, and use "vertex", "edge", "property" when it is convenient. In the relational model, they are relations. If you materialize them in a relational database, they are rows. In any case, you need two basic graph traversal operations:
> project() -- forward traversal of the arrows in the above diagrams. Takes you from an element to a component like in-vertex.
> select() -- reverse traversal of the arrows. Allows you to answer questions like "in which Trips is John Doe the rider?"
> 
> Josh
> 
> 
> On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <okrammarko@gmail.com <ma...@gmail.com>> wrote:
> Hello,
> 
> I agree with everything you say. Here is my question:
> 
>         Relational database — join: Table x Table x equality function -> Table
>         Graph database — traverser: Vertex x edge label -> Vertex
> 
> I want a single function that does both. The only think was to represent traverser() in terms of join():
> 
>         Graph database — traverser: Vertices x Vertex x equality function -> Vertices
> 
> For example, 
> 
> V().out(‘address’)
> 
>         ==>
> 
> g.join(V().hasLabel(‘person’).as(‘a’)
>        V().hasLabel(‘addresses’).as(‘b’)).
>          by(‘name’).select(?address vertex?)
> 
> That is, join the vertices with themselves based on some predicate to go from vertices to vertices.
> 
> However, I would like instead to transform the relational database join() concept into a traverser() concept. Kuppitz and I were talking the other day about a link() type operator that says: “try and link to this thing in some specified way.” .. ?? The problem we ran into is again, “link it to what?”
> 
>         - in graph, the ‘to what’ is hardcoded so you don’t need to specify anything.
>         - in rdbms, the ’to what’ is some other specified table.
> 
> So what does the link() operator look like?
> 
> ——
> 
> Some other random thoughts….
> 
> Relational databases join on the table (the whole collection)
> Graph databases traverser on the vertex (an element of the whole collection)
> 
> We can make a relational database join on single row (by providing a filter to a particular primary key). This is the same as a table with one row. Likewise, for graph in the join() context above:
> 
> V(1).out(‘address’) 
> 
>         ==>
> 
> g.join(V(1).as(‘a’)
>        V().hasLabel(‘addresses’).as(‘b’)).
>          by(‘name’).select(?address vertex?)
> 
> More thoughts please….
> 
> Marko.
> 
> http://rredux.com <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>>
> 
> 
> 
> 
> > On Apr 19, 2019, at 4:20 AM, pieter martin <pieter.martin@gmail.com <ma...@gmail.com>> wrote:
> > 
> > Hi,
> > The way I saw it is that the big difference is that graph's have
> > reified joins. This is both a blessing and a curse.
> > A blessing because its much easier (less text to type, less mistakes,
> > clearer semantics...) to traverse an edge than to construct a manual
> > join.A curse because there are almost always far more ways to traverse
> > a data set than just by the edges some architect might have considered
> > when creating the data set. Often the architect is not the domain
> > expert and the edges are a hardcoded layout of the dataset, which
> > almost certainly won't survive the real world's demands. In graphs, if
> > their are no edges then the data is not reachable, except via indexed
> > lookups. This is the standard engineering problem of database design,
> > but it is important and useful that data can be traversed, joined,
> > without having reified edges.
> > In Sqlg at least, but I suspect it generalizes, I want to create the
> > notion of a "virtual edge". Which in meta data describes the join and
> > then the standard to(direction, "virtualEdgeName") will work.
> > In a way this is precisely to keep the graphy nature of gremlin, i.e.
> > traversing edges, and avoid using the manual join syntax you described.
> > CheersPieter
> > 
> > On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
> >> Hi,
> >> *** This is mainly for Kuppitz, but if others care. 
> >> Was thinking last night about relational data and Gremlin. The T()
> >> step returns all the tables in the withStructure() RDBMS database.
> >> Tables are ‘complex values’ so they can't leave the VM (only a simple
> >> ‘toString’).
> >> Below is a fake Gremlin session. (and these are just ideas…) tables
> >> -> a ListLike of rows        rows -> a MapLike of primitives
> >> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> >> g.T(‘people’)==>t[people]gremlin>
> >> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml
> >> in>
> >> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz,
> >> age:10}==>{name:josh,age:35}gremlin>
> >> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g
> >> remlin>
> >> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos
> >> h
> >> Makes sense. Nice that values() and has() generally apply to all
> >> ListLike and MapLike structures. Also, note how asMap() is the
> >> valueMap() of TP4, but generalizes to anything that is MapLike so it
> >> can be turned into a primitive form as a data-rich result from the
> >> VM.
> >> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> >> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name
> >> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin>
> >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(se
> >> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).           
> >> values().asMap()==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe
> >> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{a.name <http://a.name/>:
> >> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin>
> >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(’n
> >> ame’). // shorthand for equijoin on name
> >> column/key           values().asMap()==>{a.name:marko,a.age:29,b.name <http://b.name/>
> >> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.ci <http://b.ci/>
> >> ty:tucson}==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}gr
> >> emlin>
> >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(’n
> >> ame’)==>t[people<-name->addresses]  // without asMap(), just the
> >> complex value ‘toString'gremlin>
> >> And of course, all of this is strategized into a SQL call so its
> >> joins aren’t necessarily computed using TP4-VM resources.
> >> Anywho — what I hope to realize is the relationship between “links”
> >> (graph) and “joins” (tables). How can we make (bytecode-wise at
> >> least) RDBMS join operations and graph traversal operations ‘the
> >> same.’?
> >>      Singleton: Integer, String, Float, Double, etc. Collection:
> >> List, Map (Vertex, Table, Document)  Linkable: Vertex, Table
> >> Vertices and Tables can be “linked.” Unlike Collections, they don’t
> >> maintain a “parent/child” relationship with the objects they
> >> reference. What does this mean……….?
> >> Take care,Marko.
> >> http://rredux.com <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>>>

Re: What makes 'graph traversals' and 'relational joins' the same?

Posted by Joshua Shinavier <jo...@fortytwo.net>.

On the subject of "reified joins", maybe be a picture will be worth a few
words. As I said in the thread
<https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ> on
property graph standardization, if you think of vertex labels, edge labels,
and property keys as types, each with projections to two other types, there
is a nice analogy with relations of two columns, and this analogy can be
easily extended to hyper-edges. Here is what the schema of the TinkerPop
classic graph looks like if you make each type (e.g. Person, Project,
knows, name) into a relation:

[image: image.png]


I have made the vertex types salmon-colored, the edge types yellow, the
property types green, and the data types blue. The "o" and "I" columns
represent the out-type (e.g. out-vertex type of Person) and in-type (e.g.
property value type of String) of each relation. More than two arrows from
a column represent a coproduct, e.g. the out-type of "name" is Person OR
Project. Now you can think of out() and in() as joins of two tables on a
primary and foreign key.

We are not limited to "out" and "in", however. Here is the ternary
relationship (hyper-edge) from hyper-edge slide
<https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49>
of
my Graph Day preso, which has three columns/roles/projections:

[image: image.png]


I have drawn Says in light blue to indicate that it is a generalized
element; it has projections other than "out" and "in". Now the line between
relations and edges begins to blur. E.g. in the following, is PlaceEvent a
vertex or a property?

[image: image.png]


With the right type system, we can just speak of graph elements, and use
"vertex", "edge", "property" when it is convenient. In the relational
model, they are relations. If you materialize them in a relational
database, they are rows. In any case, you need two basic graph traversal
operations:

   - project() -- forward traversal of the arrows in the above diagrams.
   Takes you from an element to a component like in-vertex.
   - select() -- reverse traversal of the arrows. Allows you to answer
   questions like "in which Trips is John Doe the rider?"


Josh


On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <ok...@gmail.com>
wrote:

> Hello,
>
> I agree with everything you say. Here is my question:
>
>         Relational database — join: Table x Table x equality function ->
> Table
>         Graph database — traverser: Vertex x edge label -> Vertex
>
> I want a single function that does both. The only think was to represent
> traverser() in terms of join():
>
>         Graph database — traverser: Vertices x Vertex x equality function
> -> Vertices
>
> For example,
>
> V().out(‘address’)
>
>         ==>
>
> g.join(V().hasLabel(‘person’).as(‘a’)
>        V().hasLabel(‘addresses’).as(‘b’)).
>          by(‘name’).select(?address vertex?)
>
> That is, join the vertices with themselves based on some predicate to go
> from vertices to vertices.
>
> However, I would like instead to transform the relational database join()
> concept into a traverser() concept. Kuppitz and I were talking the other
> day about a link() type operator that says: “try and link to this thing in
> some specified way.” .. ?? The problem we ran into is again, “link it to
> what?”
>
>         - in graph, the ‘to what’ is hardcoded so you don’t need to
> specify anything.
>         - in rdbms, the ’to what’ is some other specified table.
>
> So what does the link() operator look like?
>
> ——
>
> Some other random thoughts….
>
> Relational databases join on the table (the whole collection)
> Graph databases traverser on the vertex (an element of the whole
> collection)
>
> We can make a relational database join on single row (by providing a
> filter to a particular primary key). This is the same as a table with one
> row. Likewise, for graph in the join() context above:
>
> V(1).out(‘address’)
>
>         ==>
>
> g.join(V(1).as(‘a’)
>        V().hasLabel(‘addresses’).as(‘b’)).
>          by(‘name’).select(?address vertex?)
>
> More thoughts please….
>
> Marko.
>
> http://rredux.com <http://rredux.com/>
>
>
>
>
> > On Apr 19, 2019, at 4:20 AM, pieter martin <pi...@gmail.com>
> wrote:
> >
> > Hi,
> > The way I saw it is that the big difference is that graph's have
> > reified joins. This is both a blessing and a curse.
> > A blessing because its much easier (less text to type, less mistakes,
> > clearer semantics...) to traverse an edge than to construct a manual
> > join.A curse because there are almost always far more ways to traverse
> > a data set than just by the edges some architect might have considered
> > when creating the data set. Often the architect is not the domain
> > expert and the edges are a hardcoded layout of the dataset, which
> > almost certainly won't survive the real world's demands. In graphs, if
> > their are no edges then the data is not reachable, except via indexed
> > lookups. This is the standard engineering problem of database design,
> > but it is important and useful that data can be traversed, joined,
> > without having reified edges.
> > In Sqlg at least, but I suspect it generalizes, I want to create the
> > notion of a "virtual edge". Which in meta data describes the join and
> > then the standard to(direction, "virtualEdgeName") will work.
> > In a way this is precisely to keep the graphy nature of gremlin, i.e.
> > traversing edges, and avoid using the manual join syntax you described.
> > CheersPieter
> >
> > On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
> >> Hi,
> >> *** This is mainly for Kuppitz, but if others care.
> >> Was thinking last night about relational data and Gremlin. The T()
> >> step returns all the tables in the withStructure() RDBMS database.
> >> Tables are ‘complex values’ so they can't leave the VM (only a simple
> >> ‘toString’).
> >> Below is a fake Gremlin session. (and these are just ideas…) tables
> >> -> a ListLike of rows        rows -> a MapLike of primitives
> >> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> >> g.T(‘people’)==>t[people]gremlin>
> >> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml
> >> in>
> >> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz,
> >> age:10}==>{name:josh,age:35}gremlin>
> >> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g
> >> remlin>
> >> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos
> >> h
> >> Makes sense. Nice that values() and has() generally apply to all
> >> ListLike and MapLike structures. Also, note how asMap() is the
> >> valueMap() of TP4, but generalizes to anything that is MapLike so it
> >> can be turned into a primitive form as a data-rich result from the
> >> VM.
> >> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> >> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name
> >> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin>
> >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(se
> >> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).
> >> values().asMap()==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe
> >> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{a.name:
> >> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin>
> >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(’n
> >> ame’). // shorthand for equijoin on name
> >> column/key           values().asMap()==>{a.name:marko,a.age:29,b.name
> >> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.ci
> >> ty:tucson}==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}gr
> >> emlin>
> >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(’n
> >> ame’)==>t[people<-name->addresses]  // without asMap(), just the
> >> complex value ‘toString'gremlin>
> >> And of course, all of this is strategized into a SQL call so its
> >> joins aren’t necessarily computed using TP4-VM resources.
> >> Anywho — what I hope to realize is the relationship between “links”
> >> (graph) and “joins” (tables). How can we make (bytecode-wise at
> >> least) RDBMS join operations and graph traversal operations ‘the
> >> same.’?
> >>      Singleton: Integer, String, Float, Double, etc. Collection:
> >> List, Map (Vertex, Table, Document)  Linkable: Vertex, Table
> >> Vertices and Tables can be “linked.” Unlike Collections, they don’t
> >> maintain a “parent/child” relationship with the objects they
> >> reference. What does this mean……….?
> >> Take care,Marko.
> >> http://rredux.com <http://rredux.com/> <http://rredux.com/ <
> http://rredux.com/>>
>
>

Re: What makes 'graph traversals' and 'relational joins' the same?

Posted by Marko Rodriguez <ok...@gmail.com>.

Hello,

I agree with everything you say. Here is my question:

	Relational database — join: Table x Table x equality function -> Table
	Graph database — traverser: Vertex x edge label -> Vertex

I want a single function that does both. The only think was to represent traverser() in terms of join():

	Graph database — traverser: Vertices x Vertex x equality function -> Vertices

For example, 

V().out(‘address’)

	==>

g.join(V().hasLabel(‘person’).as(‘a’)
       V().hasLabel(‘addresses’).as(‘b’)).
         by(‘name’).select(?address vertex?)

That is, join the vertices with themselves based on some predicate to go from vertices to vertices.

However, I would like instead to transform the relational database join() concept into a traverser() concept. Kuppitz and I were talking the other day about a link() type operator that says: “try and link to this thing in some specified way.” .. ?? The problem we ran into is again, “link it to what?”

	- in graph, the ‘to what’ is hardcoded so you don’t need to specify anything.
	- in rdbms, the ’to what’ is some other specified table.

So what does the link() operator look like?

——

Some other random thoughts….

Relational databases join on the table (the whole collection)
Graph databases traverser on the vertex (an element of the whole collection)

We can make a relational database join on single row (by providing a filter to a particular primary key). This is the same as a table with one row. Likewise, for graph in the join() context above:

V(1).out(‘address’) 

	==>

g.join(V(1).as(‘a’)
       V().hasLabel(‘addresses’).as(‘b’)).
         by(‘name’).select(?address vertex?)

More thoughts please….

Marko.

http://rredux.com <http://rredux.com/>




> On Apr 19, 2019, at 4:20 AM, pieter martin <pi...@gmail.com> wrote:
> 
> Hi,
> The way I saw it is that the big difference is that graph's have
> reified joins. This is both a blessing and a curse.
> A blessing because its much easier (less text to type, less mistakes,
> clearer semantics...) to traverse an edge than to construct a manual
> join.A curse because there are almost always far more ways to traverse
> a data set than just by the edges some architect might have considered
> when creating the data set. Often the architect is not the domain
> expert and the edges are a hardcoded layout of the dataset, which
> almost certainly won't survive the real world's demands. In graphs, if
> their are no edges then the data is not reachable, except via indexed
> lookups. This is the standard engineering problem of database design,
> but it is important and useful that data can be traversed, joined,
> without having reified edges.
> In Sqlg at least, but I suspect it generalizes, I want to create the
> notion of a "virtual edge". Which in meta data describes the join and
> then the standard to(direction, "virtualEdgeName") will work.
> In a way this is precisely to keep the graphy nature of gremlin, i.e.
> traversing edges, and avoid using the manual join syntax you described.
> CheersPieter
> 
> On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
>> Hi,
>> *** This is mainly for Kuppitz, but if others care. 
>> Was thinking last night about relational data and Gremlin. The T()
>> step returns all the tables in the withStructure() RDBMS database.
>> Tables are ‘complex values’ so they can't leave the VM (only a simple
>> ‘toString’).
>> Below is a fake Gremlin session. (and these are just ideas…)	tables
>> -> a ListLike of rows	rows -> a MapLike of primitives
>> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
>> g.T(‘people’)==>t[people]gremlin>
>> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml
>> in>
>> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz,
>> age:10}==>{name:josh,age:35}gremlin>
>> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g
>> remlin>
>> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos
>> h
>> Makes sense. Nice that values() and has() generally apply to all
>> ListLike and MapLike structures. Also, note how asMap() is the
>> valueMap() of TP4, but generalizes to anything that is MapLike so it
>> can be turned into a primitive form as a data-rich result from the
>> VM.
>> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
>> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name
>> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin>
>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(se
>> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).           
>> values().asMap()==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe
>> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{a.name:
>> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin>
>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(’n
>> ame’). // shorthand for equijoin on name
>> column/key           values().asMap()==>{a.name:marko,a.age:29,b.name
>> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.ci
>> ty:tucson}==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}gr
>> emlin>
>> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(’n
>> ame’)==>t[people<-name->addresses]  // without asMap(), just the
>> complex value ‘toString'gremlin>
>> And of course, all of this is strategized into a SQL call so its
>> joins aren’t necessarily computed using TP4-VM resources.
>> Anywho — what I hope to realize is the relationship between “links”
>> (graph) and “joins” (tables). How can we make (bytecode-wise at
>> least) RDBMS join operations and graph traversal operations ‘the
>> same.’?
>> 	Singleton: Integer, String, Float, Double, etc.	Collection:
>> List, Map (Vertex, Table, Document)	Linkable: Vertex, Table
>> Vertices and Tables can be “linked.” Unlike Collections, they don’t
>> maintain a “parent/child” relationship with the objects they
>> reference. What does this mean……….?
>> Take care,Marko.
>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>>

Re: What makes 'graph traversals' and 'relational joins' the same?

Posted by pieter martin <pi...@gmail.com>.

Hi,
The way I saw it is that the big difference is that graph's have
reified joins. This is both a blessing and a curse.
A blessing because its much easier (less text to type, less mistakes,
clearer semantics...) to traverse an edge than to construct a manual
join.A curse because there are almost always far more ways to traverse
a data set than just by the edges some architect might have considered
when creating the data set. Often the architect is not the domain
expert and the edges are a hardcoded layout of the dataset, which
almost certainly won't survive the real world's demands. In graphs, if
their are no edges then the data is not reachable, except via indexed
lookups. This is the standard engineering problem of database design,
but it is important and useful that data can be traversed, joined,
without having reified edges.
In Sqlg at least, but I suspect it generalizes, I want to create the
notion of a "virtual edge". Which in meta data describes the join and
then the standard to(direction, "virtualEdgeName") will work.
In a way this is precisely to keep the graphy nature of gremlin, i.e.
traversing edges, and avoid using the manual join syntax you described.
CheersPieter

On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
> Hi,
> *** This is mainly for Kuppitz, but if others care. 
> Was thinking last night about relational data and Gremlin. The T()
> step returns all the tables in the withStructure() RDBMS database.
> Tables are ‘complex values’ so they can't leave the VM (only a simple
> ‘toString’).
> Below is a fake Gremlin session. (and these are just ideas…)	tables
> -> a ListLike of rows	rows -> a MapLike of primitives
> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> g.T(‘people’)==>t[people]gremlin>
> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml
> in>
> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz,
> age:10}==>{name:josh,age:35}gremlin>
> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g
> remlin>
> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos
> h
> Makes sense. Nice that values() and has() generally apply to all
> ListLike and MapLike structures. Also, note how asMap() is the
> valueMap() of TP4, but generalizes to anything that is MapLike so it
> can be turned into a primitive form as a data-rich result from the
> VM.
> gremlin> g.T()==>t[people]==>t[addresses]gremlin>
> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name
> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin>
> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(se
> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))).           
> values().asMap()==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe
> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{a.name:
> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin>
> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(’n
> ame’). // shorthand for equijoin on name
> column/key           values().asMap()==>{a.name:marko,a.age:29,b.name
> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.ci
> ty:tucson}==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}gr
> emlin>
> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)).             by(’n
> ame’)==>t[people<-name->addresses]  // without asMap(), just the
> complex value ‘toString'gremlin>
> And of course, all of this is strategized into a SQL call so its
> joins aren’t necessarily computed using TP4-VM resources.
> Anywho — what I hope to realize is the relationship between “links”
> (graph) and “joins” (tables). How can we make (bytecode-wise at
> least) RDBMS join operations and graph traversal operations ‘the
> same.’?
> 	Singleton: Integer, String, Float, Double, etc.	Collection:
> List, Map (Vertex, Table, Document)	Linkable: Vertex, Table
> Vertices and Tables can be “linked.” Unlike Collections, they don’t
> maintain a “parent/child” relationship with the objects they
> reference. What does this mean……….?
> Take care,Marko.
> http://rredux.com <http://rredux.com/>
> 
> 
>