You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tinkerpop.apache.org by sp...@apache.org on 2016/10/14 16:05:41 UTC

tinkerpop git commit: Added a tree recipe CTR

Repository: tinkerpop
Updated Branches:
  refs/heads/master c9bab0858 -> 3245849ea


Added a tree recipe CTR


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/3245849e
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/3245849e
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/3245849e

Branch: refs/heads/master
Commit: 3245849ea9c53526135c3daaa0f903d90d2a561b
Parents: c9bab08
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Fri Oct 14 12:05:17 2016 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Fri Oct 14 12:05:17 2016 -0400

----------------------------------------------------------------------
 docs/src/recipes/index.asciidoc           |   2 +
 docs/src/recipes/tree.asciidoc            | 203 +++++++++++++++++++++++++
 docs/static/images/gremlin-index-time.png | Bin 0 -> 53361 bytes
 docs/static/images/gremlin-max-depth.png  | Bin 0 -> 109506 bytes
 docs/static/images/gremlin-tree.png       | Bin 0 -> 229528 bytes
 docs/static/images/tree-lca.png           | Bin 0 -> 27172 bytes
 6 files changed, 205 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/3245849e/docs/src/recipes/index.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/index.asciidoc b/docs/src/recipes/index.asciidoc
index cb93674..47c2b37 100644
--- a/docs/src/recipes/index.asciidoc
+++ b/docs/src/recipes/index.asciidoc
@@ -52,6 +52,8 @@ include::shortest-path.asciidoc[]
 
 include::traversal-induced-values.asciidoc[]
 
+include::tree.asciidoc[]
+
 Implementation Recipes
 ======================
 

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/3245849e/docs/src/recipes/tree.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/tree.asciidoc b/docs/src/recipes/tree.asciidoc
new file mode 100644
index 0000000..5d73dc6
--- /dev/null
+++ b/docs/src/recipes/tree.asciidoc
@@ -0,0 +1,203 @@
+////
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+////
+[[tree]]
+Tree
+----
+
+image:gremlin-tree.png[width=280]
+
+Lowest Common Ancestor
+~~~~~~~~~~~~~~~~~~~~~~
+
+image:tree-lca.png[width=230,float=right] Given a tree, the link:https://en.wikipedia.org/wiki/Lowest_common_ancestor[lowest common ancestor]
+is the deepest vertex that is common to two or more other vertices. The diagram to the right depicts the common
+ancestor tree for vertices A and D in the various green shades. The C vertex, the vertex with the darkest green
+shading, is the lowest common ancestor.
+
+The following code simply sets up the graph depicted above using "hasParent" for the edge label:
+
+[gremlin-groovy]
+----
+g.addV('name', 'A').as('a').
+  addV('name', 'B').as('b').
+  addV('name', 'C').as('c').
+  addV('name', 'D').as('d').
+  addV('name', 'E').as('e').
+  addV('name', 'F').as('f').
+  addV('name', 'G').as('g').
+  addE('hasParent').from('a').to('b').
+  addE('hasParent').from('b').to('c').
+  addE('hasParent').from('d').to('c').
+  addE('hasParent').from('c').to('e').
+  addE('hasParent').from('e').to('f').
+  addE('hasParent').from('g').to('f').iterate()
+----
+
+Given that graph, the following traversal will get the lowest common ancestor for two vertices, A and D:
+
+[gremlin-groovy,existing]
+----
+g.V().has('name','A').
+  repeat(out('hasParent')).emit().as('x').
+  repeat(__.in('hasParent')).emit(has('name','D')).
+  select('x').limit(1).values('name')
+----
+
+The above traversal is reasonably straightforward to follow in that it simply traverses up the tree from the A vertex
+and then traverses down from each ancestor until it finds the "D" vertex. The first path that uncovers that match is
+the lowest common ancestor.
+
+The complexity of finding the lowest common ancestor increases when trying to find the ancestors of three or more
+vertices.
+
+[gremlin-groovy,existing]
+----
+input = ['A','B','D']
+g.V().has('name', input.head()).
+  repeat(out('hasParent')).emit().as('x').                               <1>
+  V().has('name', within(input.tail())).                                 <2>
+  repeat(out('hasParent')).emit(where(eq('x'))).                         <3>
+  group().by(select('x')).by(path().count(local).fold()).                <4>
+  unfold().filter(select(values).count(local).is(input.tail().size())).  <5>
+  order().by(select(values).unfold().sum()).                             <6>
+  select(keys).limit(1).valueMap(true)                                   <7>
+----
+
+<1> The start of the traversal is not so different than the previous one and starts with vertex A.
+<2> Use a mid-traversal `V()` to find the child vertices B and D.
+<3> Traverse up the tree for B and D and find common ancestors that were labeled with "x".
+<4> Group on the common ancestors where the value of the grouping is the length of the path.
+<5> The result of the previous step is a `Map` with a vertex (i.e. common ancestor) for the key and a list of path
+lengths. Unroll the `Map` and ensure that the number of path lengths are equivalent to the number of children that
+were given to the mid-traversal `V()`.
+<6> Order the results based on the sum of the path lengths.
+<7> Since the results were placed in ascending order, the first result must be the lowest common ancestor.
+
+As the above traversal utilizes a mid-traversal `V()`, it cannot be used for OLAP. In OLAP, the pattern changes a bit:
+
+[gremlin-groovy,existing]
+----
+g.withComputer().V().has('name', within(input)).aggregate('input').
+  has('name', input.head()).repeat(out('hasParent')).emit().as('x').
+  select('input').unfold().has('name', within(input.tail())).
+  repeat(out('hasParent')).emit(where(eq('x'))).
+  group().by(select('x')).by(path().count(local).fold()).
+  unfold().filter(select(values).count(local).is(input.tail().size())).
+  order().by(select(values).unfold().sum()).
+  select(keys).limit(1).valueMap(true)
+----
+
+Maximum Depth
+~~~~~~~~~~~~~
+
+Finding the maximum depth of a tree starting from a specified root vertex can be determined as follows:
+
+[gremlin-groovy]
+----
+g.addV('name', 'A').as('a').
+  addV('name', 'B').as('b').
+  addV('name', 'C').as('c').
+  addV('name', 'D').as('d').
+  addV('name', 'E').as('e').
+  addV('name', 'F').as('f').
+  addV('name', 'G').as('g').
+  addE('hasParent').from('a').to('b').
+  addE('hasParent').from('b').to('c').
+  addE('hasParent').from('d').to('c').
+  addE('hasParent').from('c').to('e').
+  addE('hasParent').from('e').to('f').
+  addE('hasParent').from('g').to('f').iterate()
+g.V().has('name','F').repeat(__.in()).emit().path().count(local).max()
+g.V().has('name','C').repeat(__.in()).emit().path().count(local).max()
+----
+
+image:gremlin-max-depth.png[float=right,width=350]The traversals shown above are fairly straightforward. The traversal
+beings at a particlar starting vertex, traverse in on the "hasParent" edges emitting all vertices as it goes. It
+calculates the path length and then selects the longest one. While this approach is quite direct, there is room for
+improvement:
+
+[gremlin-groovy,existing]
+----
+g.V().has('name','F').
+  repeat(__.in()).emit(__.not(inE())).tail(1).
+  path().count(local)
+g.V().has('name','C').
+  repeat(__.in()).emit(__.not(inE())).tail(1).
+  path().count(local)
+----
+
+There are two optimizations at play. First, there is no need to emit all the vertices, only the "leaf" vertices (i.e.
+those without incoming edges). Second, all results save the last one can be ignored to that point (i.e. the last one is
+the one at the deepest point in the tree). In this way, the path and path length only need to be calculated for a
+single result.
+
+Time-based Indexing
+~~~~~~~~~~~~~~~~~~~
+
+Trees can be used for modelling time-oriented data in a graph. Modeling time where there are "year", "month" and "day"
+vertices (or lower granularity as needed) allows the structure of the graph to inherently index data tied to them.
+
+image:gremlin-index-time.png[width=800]
+
+NOTE: This model is discussed further in this Neo4j link:https://neo4j.com/blog/modeling-a-multilevel-index-in-neoj4/[blog post].
+Also, there can be other versions of this model that utilize different edge/vertex labelling and property naming
+strategies. The schema depicted here is designed for simplicity.
+
+The Gremlin script below creates the graph depicted in the graph above:
+
+[gremlin-groovy]
+----
+g.addV(label, 'year', 'name', '2016').as('y2016').
+  addV(label, 'month', 'name', 'may').as('m05').addV(label, 'month', 'name', 'june').as('m06').
+  addV(label, 'day', 'name', '30').as('d30').addV(label, 'day', 'name', '31').as('d31').
+  addV(label, 'day', 'name', '01').as('d01').
+  addV(label, 'event', 'name', 'A').as('eA').addV(label, 'event', 'name', 'B').as('eB').
+  addV(label, 'event', 'name', 'C').as('eC').addV(label, 'event', 'name', 'D').as('eD').
+  addV(label, 'event', 'name', 'E').as('eE').
+  addE('may').from('y2016').to('m05').
+  addE('june').from('y2016').to('m06').
+  addE('day30').from('m05').to('d30').
+  addE('day31').from('m05').to('d31').
+  addE('day01').from('m06').to('d01').
+  addE('has').from('d30').to('eA').
+  addE('has').from('d30').to('eB').
+  addE('has').from('d31').to('eC').
+  addE('has').from('d31').to('eD').
+  addE('has').from('d01').to('eE').
+  addE('next').from('d30').to('d31').
+  addE('next').from('d31').to('d01').
+  addE('next').from('m05').to('m06').iterate()
+----
+
+IMPORTANT: The code example above does not create any indices. Proper index creation, which is specific to the
+graph implementation used, will be critical to the performance of traversals over this structure.
+
+[gremlin-groovy,existing]
+----
+g.V().has('name','2016').out().out().out('has').values()                  <1>
+g.V().has('name','2016').out('may').out().out('has').values()             <2>
+g.V().has('name','2016').out('may').out('day31').out('has').values()      <3>
+g.V().has('name','2016').out('may').out('day31').as('start').
+  V().has('name','2016').out('june').out('day01').as('end').
+  emit().repeat(__.in('next')).until(where(eq('start'))).
+  out('has').order().by('name').values('name')                            <4>
+----
+
+<1> Find all the events in 2016.
+<2> Find all the events in May of 2016.
+<3> Find all the events on May 31, 2016.
+<4> Find all the events between May 31, 2016 and June 1, 2016.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/3245849e/docs/static/images/gremlin-index-time.png
----------------------------------------------------------------------
diff --git a/docs/static/images/gremlin-index-time.png b/docs/static/images/gremlin-index-time.png
new file mode 100755
index 0000000..6760a90
Binary files /dev/null and b/docs/static/images/gremlin-index-time.png differ

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/3245849e/docs/static/images/gremlin-max-depth.png
----------------------------------------------------------------------
diff --git a/docs/static/images/gremlin-max-depth.png b/docs/static/images/gremlin-max-depth.png
new file mode 100755
index 0000000..3b8163d
Binary files /dev/null and b/docs/static/images/gremlin-max-depth.png differ

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/3245849e/docs/static/images/gremlin-tree.png
----------------------------------------------------------------------
diff --git a/docs/static/images/gremlin-tree.png b/docs/static/images/gremlin-tree.png
new file mode 100755
index 0000000..012b164
Binary files /dev/null and b/docs/static/images/gremlin-tree.png differ

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/3245849e/docs/static/images/tree-lca.png
----------------------------------------------------------------------
diff --git a/docs/static/images/tree-lca.png b/docs/static/images/tree-lca.png
new file mode 100755
index 0000000..2bf40a9
Binary files /dev/null and b/docs/static/images/tree-lca.png differ