You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tinkerpop.apache.org by dk...@apache.org on 2017/01/24 10:01:42 UTC

[1/5] tinkerpop git commit: Updated the closeness and betweeness centrality recipes.

Repository: tinkerpop
Updated Branches:
  refs/heads/master d50139023 -> 748e76557


Updated the closeness and betweeness centrality recipes.

1) added a disclaimer that both recipes should be used with care (only on small (sub)graphs)
2) optimized the execution plan
3) took multiple shortest paths between a pair of vertices into account
4) updated the sample graph to one that has multiple shortest paths between certain pairs of vertices


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/5577ea56
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/5577ea56
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/5577ea56

Branch: refs/heads/master
Commit: 5577ea564e8a9634d3b0699cbddee59b039db90f
Parents: 8ad2911
Author: Daniel Kuppitz <da...@hotmail.com>
Authored: Fri Jan 20 16:36:42 2017 +0100
Committer: Daniel Kuppitz <da...@hotmail.com>
Committed: Fri Jan 20 16:41:09 2017 +0100

----------------------------------------------------------------------
 docs/src/recipes/centrality.asciidoc | 121 +++++++++++++++++-------------
 1 file changed, 67 insertions(+), 54 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/5577ea56/docs/src/recipes/centrality.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/centrality.asciidoc b/docs/src/recipes/centrality.asciidoc
index cbce418..f6fba30 100644
--- a/docs/src/recipes/centrality.asciidoc
+++ b/docs/src/recipes/centrality.asciidoc
@@ -67,43 +67,48 @@ image:betweeness-example.png[width=600]
 
 [gremlin-groovy ]
 ----
-g.addV('name','a').as('a').
-  addV('name','b').as('b').
-  addV('name','c').as('c').
-  addV('name','d').as('d').
-  addV('name','e').as('e').
+g.addV(id,'A').as('a').
+  addV(id,'B').as('b').
+  addV(id,'C').as('c').
+  addV(id,'D').as('d').
+  addV(id,'E').as('e').
+  addV(id,'F').as('f').
   addE('next').from('a').to('b').
   addE('next').from('b').to('c').
-  addE('next').from('c').to('d').
-  addE('next').from('d').to('e').iterate()
-g.withSack(0).V().store("x").repeat(both().simplePath()).emit().path(). <1>
-  group().by(project("a","b").by(limit(local, 1)).                      <2>
-                              by(tail(local, 1))).
-          by(order().by(count(local))).                                 <3>
-          select(values).as("shortestPaths").                           <4>
-          select("x").unfold().as("v").                                 <5>
-          select("shortestPaths").                                      <6>
-            map(unfold().filter(unfold().where(eq("v"))).count()).      <7>
-            sack(sum).sack().as("betweeness").                          <8>
-          select("v","betweeness")
+  addE('next').from('b').to('d').
+  addE('next').from('c').to('e').
+  addE('next').from('d').to('e').
+  addE('next').from('e').to('f').iterate()
+g.V().as("v").                                                                  <1>
+  repeat(both().simplePath().as("v")).emit().                                   <2>
+  filter(project("x","y","z").by(select(first, "v")).                           <3>
+                              by(select(last, "v")).
+                              by(select(all, "v").count(local)).as("triple").
+         coalesce(select("x","y").as("a").                                      <4>
+                    select("triples").unfold().as("t").
+                    select("x","y").where(eq("a")).
+                    select("t"),
+                  store("triples")).                                            <5>
+         select("z").as("length").
+         select("triple").select("z").where(eq("length"))).                     <6>
+  select(all, "v").unfold().                                                    <7>
+  groupCount().next()                                                           <8>
 ----
 
-<1> Defines a Gremlin link:http://tinkerpop.apache.org/docs/x.y.z/reference/#sack-step[sack] with a value of zero,
-which represents the initial betweeness score for each vertex, and traverses on both incoming and outgoing edges
-avoiding <<cycle-detection, cyclic paths>>.
-<2> Group each path by the first and last vertex.
-<3> Reduce the list of paths to the shortest path between the first and last vertex by ordering on their lengths.
-<4> Recall that at this point, there is a `Map` keyed by first and last vertex and with a value of just the shortest
-path. Extract the shortest path with `select(values)`, since that's the only portion required for the remainder of
-the traversal.
-<5> The "x" key contains the list of vertices stored from step 1 - unfold that list into "v" for later use. This step
-will unwrap the vertex that is stored in the `Traverser` as
-link:http://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/process/traversal/step/util/BulkSet.html[BulkSet]
-so that it can be used directly in the `Traversal`.
-<6> Iterate the set of shortest paths. At this point, it is worth noting that the traversal is iterating each vertex
-in "v" and for each vertex in "v" it is iterating each `Path` in "shortestpaths".
-<7> For each path, transform it to a count of the number of times that "v" from step 5 is encountered.
-<8> Sum the counts for each vertex using `sack()`, normalize the value and label it as the "betweeness" to be the score.
+<1> Starting from each vertex in the graph...
+<2> ...traverse on both - incoming and outgoing - edges, avoiding <<cycle-detection, cyclic paths>>.
+<3> Create a triple consisting of the first vertex, the last vertex and the length of the path between them.
+<4> Determine whether a path between those two vertices was already found.
+<5> If this is the first path between the two vertices, store the triple in an internal collection named "triples".
+<6> Keep only those paths between a pair of vertices that have the same length as the first path that was found between them.
+<7> Select all shortest paths and unfold them.
+<8> Count the number of occurrences of each vertex, which is ultimately its betweeness score.
+
+WARNING: Since the betweeness centrality algorithm requires the shortest path between any pair of vertices in the graph,
+its practical applications are very limited. It's recommended to use this algorithm only on small subgraphs (graphs like
+the link:http://tinkerpop.apache.org/docs/current/reference/#grateful-dead[Grateful Dead graph] with only 808 vertices
+and 8049 edges already require a massive amount of compute resources to determine the shortest paths between all vertex
+pairs).
 
 [[closeness-centrality]]
 Closeness Centrality
@@ -114,28 +119,36 @@ other reachable vertices in the graph. The following examples use the modern toy
 
 [gremlin-groovy,modern]
 ----
-g.withSack(1f).V().repeat(both().simplePath()).emit().path().                    <1>
-  group().by(project("a","b").by(limit(local, 1)).                               <2>
-                              by(tail(local, 1))).
-          by(order().by(count(local))).                                          <3>
-  select(values).unfold().                                                       <4>
-  project("v","length").
-    by(limit(local, 1)).                                                         <5>
-    by(count(local).sack(div).sack()).                                           <6>
-  group().by(select("v")).by(select("length").sum())                             <7>
+g = TinkerFactory.createModern().traversal()
+g.withSack(1f).V().as("v").                                                     <1>
+  repeat(both().simplePath().as("v")).emit().                                   <2>
+  filter(project("x","y","z").by(select(first, "v")).                           <3>
+                              by(select(last, "v")).
+                              by(select(all, "v").count(local)).as("triple").
+         coalesce(select("x","y").as("a").                                      <4>
+                    select("triples").unfold().as("t").
+                    select("x","y").where(eq("a")).
+                    select("t"),
+                  store("triples")).                                            <5>
+         select("z").as("length").
+         select("triple").select("z").where(eq("length"))).                     <6>
+  group().by(select(first, "v")).                                               <7>
+          by(select(all, "v").count(local).sack(div).sack().sum()).next()
 ----
 
-<1> Defines a Gremlin link:http://tinkerpop.apache.org/docs/x.y.z/reference/#sack-step[sack] with a value of one,
-and traverses on both incoming and outgoing edges avoiding <<cycle-detection, cyclic paths>>.
-<2> Group each path by the first and last vertex.
-<3> Reduce the list of paths to the shortest path between the first and last vertex by ordering on their lengths.
-<4> Recall that at this point, there is a `Map` keyed by first and last vertex and with a value of just the shortest
-path. Extract the shortest path with `select(values)`, since that's the only portion required for the remainder of
-the traversal.
-<5> The first `by()` modulator for `project()` extracts the first vertex in the path.
-<6> The second `by()` modulator for `project()` extracts the path length and divides that distance by the value of
-the `sack()` which was initialized to 1 at the start of the traversal.
-<7> Group the resulting `Map` objects on "v" and sum their lengths to get the centrality score for each.
+<1> Defines a Gremlin link:http://tinkerpop.apache.org/docs/x.y.z/reference/#sack-step[sack] with a value of one.
+<2> Traverses on both - incoming and outgoing - edges, avoiding <<cycle-detection, cyclic paths>>.
+<3> Create a triple consisting of the first vertex, the last vertex and the length of the path between them.
+<4> Determine whether a path between those two vertices was already found.
+<5> If this is the first path between the two vertices, store the triple in an internal collection named "triples".
+<6> Keep only those paths between a pair of vertices that have the same length as the first path that was found between them.
+<7> For each vertex divide 1 by the product of the lengths of all shortest paths that start with this particular vertex.
+
+WARNING: Since the closeness centrality algorithm requires the shortest path between any pair of vertices in the graph,
+its practical applications are very limited. It's recommended to use this algorithm only on small subgraphs (graphs like
+the link:http://tinkerpop.apache.org/docs/current/reference/#grateful-dead[Grateful Dead graph] with only 808 vertices
+and 8049 edges already require a massive amount of compute resources to determine the shortest paths between all vertex
+pairs).
 
 [[eigenvector-centrality]]
 Eigenvector Centrality
@@ -165,4 +178,4 @@ link:http://tinkerpop.apache.org/docs/current/reference/#timelimit-step[time lim
 traversal from taking longer than one hundred milliseconds to execute (the previous example takes considerably longer
 than that). While the answer provided with the `timeLimit()` is not the absolute ranking, it does provide a relative
 ranking that closely matches the absolute one. The use of `timeLimit()` in certain algorithms (e.g. recommendations)
-can shorten the time required to get a reasonable and usable result.
\ No newline at end of file
+can shorten the time required to get a reasonable and usable result.


[3/5] tinkerpop git commit: Fixed up betweeness example graphic - for real this time.

Posted by dk...@apache.org.
Fixed up betweeness example graphic - for real this time.


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/1756461c
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/1756461c
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/1756461c

Branch: refs/heads/master
Commit: 1756461cef6f51a1610702c813b759b40de2a849
Parents: 841fc30
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Mon Jan 23 11:30:44 2017 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Mon Jan 23 11:30:44 2017 -0500

----------------------------------------------------------------------
 docs/static/images/betweeness-example.png | Bin 9961 -> 21283 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/1756461c/docs/static/images/betweeness-example.png
----------------------------------------------------------------------
diff --git a/docs/static/images/betweeness-example.png b/docs/static/images/betweeness-example.png
index a086a65..fa58b5d 100755
Binary files a/docs/static/images/betweeness-example.png and b/docs/static/images/betweeness-example.png differ


[5/5] tinkerpop git commit: Merge branch 'tp32'

Posted by dk...@apache.org.
Merge branch 'tp32'


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/748e7655
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/748e7655
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/748e7655

Branch: refs/heads/master
Commit: 748e7655723e45ad79b71b4394dbb8d90959004a
Parents: d501390 9d88304
Author: Daniel Kuppitz <da...@hotmail.com>
Authored: Tue Jan 24 11:00:57 2017 +0100
Committer: Daniel Kuppitz <da...@hotmail.com>
Committed: Tue Jan 24 11:00:57 2017 +0100

----------------------------------------------------------------------
 docs/src/recipes/centrality.asciidoc      | 121 ++++++++++++++-----------
 docs/static/images/betweeness-example.png | Bin 8465 -> 21283 bytes
 2 files changed, 67 insertions(+), 54 deletions(-)
----------------------------------------------------------------------



[4/5] tinkerpop git commit: Merge branch 'centrality-recipes' into tp32

Posted by dk...@apache.org.
Merge branch 'centrality-recipes' into tp32


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/9d883043
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/9d883043
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/9d883043

Branch: refs/heads/master
Commit: 9d883043ed3123af5b1c614b343cbbdd0dc9013f
Parents: aa262d6 1756461
Author: Daniel Kuppitz <da...@hotmail.com>
Authored: Tue Jan 24 11:00:26 2017 +0100
Committer: Daniel Kuppitz <da...@hotmail.com>
Committed: Tue Jan 24 11:00:26 2017 +0100

----------------------------------------------------------------------
 docs/src/recipes/centrality.asciidoc      | 121 ++++++++++++++-----------
 docs/static/images/betweeness-example.png | Bin 8465 -> 21283 bytes
 2 files changed, 67 insertions(+), 54 deletions(-)
----------------------------------------------------------------------



[2/5] tinkerpop git commit: Fixed up betweeness graphic based on the revised example.

Posted by dk...@apache.org.
Fixed up betweeness graphic based on the revised example.


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/841fc30a
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/841fc30a
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/841fc30a

Branch: refs/heads/master
Commit: 841fc30a5e8f4b9c1b98ca5056c26d0d26b67115
Parents: 5577ea5
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Mon Jan 23 11:15:30 2017 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Mon Jan 23 11:15:30 2017 -0500

----------------------------------------------------------------------
 docs/static/images/betweeness-example.png | Bin 8465 -> 9961 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/841fc30a/docs/static/images/betweeness-example.png
----------------------------------------------------------------------
diff --git a/docs/static/images/betweeness-example.png b/docs/static/images/betweeness-example.png
old mode 100644
new mode 100755
index 650ee53..a086a65
Binary files a/docs/static/images/betweeness-example.png and b/docs/static/images/betweeness-example.png differ