Posted to commits@spark.apache.org by pw...@apache.org on 2014/01/14 07:59:29 UTC
[05/50] git commit: Add connected components example to doc
Add connected components example to doc
Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/7a4bb863
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/7a4bb863
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/7a4bb863
Branch: refs/heads/master
Commit: 7a4bb863c7c11e22332763081793e4989af8c526
Parents: 5e35d39
Author: Ankur Dave <an...@gmail.com>
Authored: Sun Jan 12 16:58:18 2014 -0800
Committer: Ankur Dave <an...@gmail.com>
Committed: Sun Jan 12 16:58:18 2014 -0800
----------------------------------------------------------------------
docs/graphx-programming-guide.md | 20 +++++++++++++++++++-
graphx/data/followers.txt | 6 +-----
graphx/data/users.txt | 2 +-
3 files changed, 21 insertions(+), 7 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/7a4bb863/docs/graphx-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index 52668b0..22feccb 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -475,6 +475,7 @@ GraphX includes a set of graph algorithms in to simplify analytics. The algorith
[Algorithms]: api/graphx/index.html#org.apache.spark.graphx.lib.Algorithms
## PageRank
+<a name="pagerank"></a>
PageRank measures the importance of each vertex in a graph, assuming an edge from *u* to *v* represents an endorsement of *v*'s importance by *u*. For example, if a Twitter user is followed by many others, the user will be ranked highly.
@@ -503,9 +504,26 @@ val ranksByUsername = users.leftOuterJoin(ranks).map {
println(ranksByUsername.collect().mkString("\n"))
{% endhighlight %}
-
## Connected Components
+The connected components algorithm labels each connected component of the graph with the ID of its lowest-numbered vertex. For example, in a social network, connected components can approximate clusters. We can compute the connected components of the example social network dataset from the [PageRank section](#pagerank) as follows:
+
+{% highlight scala %}
+// Load the implicit conversion and graph as in the PageRank example
+import org.apache.spark.graphx.lib._
+val users = ...
+val followers = ...
+val graph = Graph(users, followers)
+// Find the connected components
+val cc = graph.connectedComponents().vertices
+// Join the connected components with the usernames
+val ccByUsername = graph.vertices.innerJoin(cc) { (id, username, cc) =>
+  (username, cc)
+}
+// Print the result
+println(ccByUsername.collect().mkString("\n"))
+{% endhighlight %}
+
## Shortest Path
## Triangle Counting
http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/7a4bb863/graphx/data/followers.txt
----------------------------------------------------------------------
diff --git a/graphx/data/followers.txt b/graphx/data/followers.txt
index 0f46d80..7bb8e90 100644
--- a/graphx/data/followers.txt
+++ b/graphx/data/followers.txt
@@ -1,10 +1,6 @@
2 1
-3 1
4 1
-6 1
-3 2
-6 2
-7 2
+1 2
6 3
7 3
7 6
http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/7a4bb863/graphx/data/users.txt
----------------------------------------------------------------------
diff --git a/graphx/data/users.txt b/graphx/data/users.txt
index ce3d06c..26e3b3b 100644
--- a/graphx/data/users.txt
+++ b/graphx/data/users.txt
@@ -1,5 +1,5 @@
1 BarackObama
-2 ericschmidt
+2 ladygaga
3 jeresig
4 justinbieber
6 matei_zaharia
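
The labeling described in the new guide text (each component tagged with the ID of its lowest-numbered vertex) can be checked by hand against the edges visible in the followers.txt hunk above. What follows is a minimal plain-Scala sketch, not GraphX code. It assumes edge direction is ignored and uses only the six edges shown in the hunk (the full file may contain more). The object and function names are hypothetical and exist only for illustration.

```scala
// Plain-Scala sketch (no Spark required). All names are illustrative, not GraphX API.
object ConnectedComponentsSketch {
  // Edges visible in the followers.txt hunk after this commit.
  // Assumption: the real file may contain further edges not shown in the diff.
  val visibleEdges: Seq[(Long, Long)] =
    Seq((2L, 1L), (4L, 1L), (1L, 2L), (6L, 3L), (7L, 3L), (7L, 6L))

  /** Labels every vertex with the smallest vertex ID in its component,
    * treating edges as undirected (an assumption of this sketch). */
  def lowestIdLabels(edges: Seq[(Long, Long)]): Map[Long, Long] = {
    val vertices = edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    // Start with each vertex labeled by its own ID.
    var labels: Map[Long, Long] = vertices.map(v => v -> v).toMap
    var changed = true
    // Repeatedly push the smaller label across each edge until nothing changes.
    while (changed) {
      changed = false
      for ((src, dst) <- edges) {
        val low = math.min(labels(src), labels(dst))
        if (labels(src) != low || labels(dst) != low) {
          labels = labels + (src -> low) + (dst -> low)
          changed = true
        }
      }
    }
    labels
  }

  def main(args: Array[String]): Unit =
    lowestIdLabels(visibleEdges).toSeq.sorted.foreach(println)
}
```

On those six edges, vertices 1, 2, and 4 receive label 1, and vertices 3, 6, and 7 receive label 3.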