You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tinkerpop.apache.org by ok...@apache.org on 2016/06/13 19:37:52 UTC
[34/42] tinkerpop git commit: Add "centrality" recipes. CTR
Add "centrality" recipes. CTR
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/e790e56a
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/e790e56a
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/e790e56a
Branch: refs/heads/TINKERPOP-1278
Commit: e790e56aaa2118b07e36f2579502786bd9d79cc0
Parents: b36b42f
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Fri Jun 10 16:50:35 2016 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Fri Jun 10 16:50:35 2016 -0400
----------------------------------------------------------------------
docs/src/recipes/centrality.asciidoc | 138 +++++++++++++++++++++++++
docs/src/recipes/index.asciidoc | 2 +
docs/static/images/betweeness-example.png | Bin 0 -> 8465 bytes
3 files changed, 140 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/e790e56a/docs/src/recipes/centrality.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/centrality.asciidoc b/docs/src/recipes/centrality.asciidoc
new file mode 100644
index 0000000..cb3ce12
--- /dev/null
+++ b/docs/src/recipes/centrality.asciidoc
@@ -0,0 +1,138 @@
+////
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+////
+[[centrality]]
+Centrality
+----------
+
+There are many measures of link:https://en.wikipedia.org/wiki/Centrality[centrality] which are meant to help identify
+the most important vertices in a graph. As these measures are common in graph theory, this section attempts to
+demonstrate how some of these different indicators can be calculated using Gremlin.
+
+[[degree-centrality]]
+Degree Centrality
+~~~~~~~~~~~~~~~~~
+
+link:https://en.wikipedia.org/wiki/Centrality#Degree_centrality[Degree centrality] is a measure of the number of
+edges associated to each vertex.
+
+[gremlin-groovy,modern]
+----
+g.V().bothE().group().by(otherV()).by(count()) <1>
+g.V().inE().group().by(inV()).by(count()) <2>
+g.V().outE().group().by(outV()).by(count()) <3>
+g.V().group().by().by(outE().count()) <4>
+----
+
+<1> Calculation of degree centrality which counts all incident edges on each vertex to include those that are both
+incoming and outgoing.
+<2> Calculation of in-degree centrality which only counts incoming edges to a vertex.
+<3> Calculation of out-degree centrality which only counts outgoing edges from a vertex.
+<4> Same calculation as the previous traversal, but includes all vertices, even those with a zero
+
+[[betweeness-centrality]]
+Betweeness Centrality
+~~~~~~~~~~~~~~~~~~~~~
+
+link:https://en.wikipedia.org/wiki/Betweenness_centrality[Betweeness centrality] is a measure of the number of times
+a vertex is found between the <<shortest-path,shortest path>> of each vertex pair in a graph. Consider the following
+graph for demonstration purposes:
+
+image:betweeness-example.png[width=600]
+
+[gremlin-groovy ]
+----
+a = graph.addVertex('name','a')
+b = graph.addVertex('name','b')
+c = graph.addVertex('name','c')
+d = graph.addVertex('name','d')
+e = graph.addVertex('name','e')
+a.addEdge('next',b)
+b.addEdge('next',c)
+c.addEdge('next',d)
+d.addEdge('next',e)
+g.withSack(0).V().repeat(both().simplePath()).emit().path(). <1>
+ group().by(project("a","b").by(limit(local, 1)). <2>
+ by(tail(local, 1))).
+ by(order().by(count(local))). <3>
+ select(values).as("shortestPaths"). <4>
+ V().as("v").map(select("shortestPaths").unfold(). <5>
+ filter(unfold().where(eq("v"))).count()).
+ sack(sum).barrier(normSack).sack().as("betweeness"). <6>
+ select("v","betweeness")
+----
+
+<1> Defines a Gremlin link:http://tinkerpop.apache.org/docs/x.y.z/reference/#sack-step[sack] with a value of zero,
+which represents the initial betweeness score for each vertex, and traverses on both incoming and outgoing edges
+avoiding <<cycle-detection, cyclic paths>>.
+<2> Group each path by the first and last vertex.
+<3> Reduce the list of paths to the shortest path between the first and last vertex by ordering on their lengths.
+<4> Recall that at this point, there is a `Map` keyed by first and last vertex and with a value of just the shortest
+path. Extract the shortest path with `select(values)`, since that's the only portion required for the remainder of
+the traversal.
+<5> Begin a mid-traversal iteration of all vertices to essentially count the shortest paths that pass through each one.
+<6> Sum the counts for each vertex using `sack()`, normalize the value and label it as the "betweeness" to be the score.
+
+[[closeness-centrality]]
+Closeness Centrality
+~~~~~~~~~~~~~~~~~~~~
+
+link:https://en.wikipedia.org/wiki/Centrality[Closeness centrality] is a measure of the distance of one vertex to all
+other reachable vertices in the graph.
+
+[gremlin-groovy,modern]
+----
+g.V().as("a").repeat(both().simplePath()).emit().as("b").
+ dedup().by(select("a","b")).
+ path().
+ group().by(limit(local, 1)).
+ by(count(local).map {1/it.get()}.sum())
+----
+
+The traversal starts by iterating over all vertices to find all paths that don't <<cycle-detection,cycle>>. It labels
+the start vertex as "a" and the ending vertex being emitted as "b". To perform this calculation, the measurement
+only requires the <<shortest-path,shortest path>> between each vertex pair, so the first occurrence of "a" and "b"
+as denoted in the `dedup().by(select("a","b"))` will be that shortest distance.
+
+To this point, the traversal has the unique list of shortest paths between each vertex pair and it is time to
+operate on that `path()`. To get the output displaying each start vertex with the centrality calculation, the `Path`
+objects can be grouped. In this case, `group()` takes two `by()` modulators. The first `by()` specifies the key for
+grouping and simply extracts the first vertex in the `Path` with `limit(local, 1)`. The second `by()` produces the
+value for the grouping and in this case:
+
+. `(count(local)` - counts the length of each path to produce the __distance__
+. `map {1/it.get()}` - uses a closure to divide the __distance__ by "1"
+. `sum()` - sums those values to produce the centrality score for that vertex
+
+[[eigenvector-centrality]]
+Eigenvector Centrality
+~~~~~~~~~~~~~~~~~~~~~~
+
+A calculation of link:https://en.wikipedia.org/wiki/Centrality#Eigenvector_centrality[eigenvector centrality] uses the
+relative importance of adjacent vertices to help determine their centrality. In other words, unlike
+<<degree-centrality, degree centrality>> the vertex with the greatest number of incident edges does not necessarily
+give it the highest rank.
+
+[gremlin-groovy,modern]
+----
+g.V().repeat(groupCount("m").out()).times(30).cap('m')
+----
+
+The traversal iterates through each vertex in the graph and for each one repeatedly group counts each vertex that
+passes through using the vertex as the key. The `Map` of this group count is stored in a variable named "m". The
+`out()` traversal is repeated thirty times or until the paths are exhausted. Thirty iterations should provide enough
+time to converge on a solution. Calling `cap('m')` at the end simply extracts the `Map` side-effect stored in "m"
+and returns it from the traversal as the result.
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/e790e56a/docs/src/recipes/index.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/index.asciidoc b/docs/src/recipes/index.asciidoc
index 6f80ad0..3ac9142 100644
--- a/docs/src/recipes/index.asciidoc
+++ b/docs/src/recipes/index.asciidoc
@@ -44,6 +44,8 @@ include::if-then-based-grouping.asciidoc[]
include::cycle-detection.asciidoc[]
+include::centrality.asciidoc[]
+
Implementation Recipes
======================
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/e790e56a/docs/static/images/betweeness-example.png
----------------------------------------------------------------------
diff --git a/docs/static/images/betweeness-example.png b/docs/static/images/betweeness-example.png
new file mode 100644
index 0000000..650ee53
Binary files /dev/null and b/docs/static/images/betweeness-example.png differ