Posted to commits@spark.apache.org by pw...@apache.org on 2014/01/14 07:59:29 UTC

[05/50] git commit: Add connected components example to doc

Add connected components example to doc


Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/7a4bb863
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/7a4bb863
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/7a4bb863

Branch: refs/heads/master
Commit: 7a4bb863c7c11e22332763081793e4989af8c526
Parents: 5e35d39
Author: Ankur Dave <an...@gmail.com>
Authored: Sun Jan 12 16:58:18 2014 -0800
Committer: Ankur Dave <an...@gmail.com>
Committed: Sun Jan 12 16:58:18 2014 -0800

----------------------------------------------------------------------
 docs/graphx-programming-guide.md | 20 +++++++++++++++++++-
 graphx/data/followers.txt        |  6 +-----
 graphx/data/users.txt            |  2 +-
 3 files changed, 21 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/7a4bb863/docs/graphx-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index 52668b0..22feccb 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -475,6 +475,7 @@ GraphX includes a set of graph algorithms in to simplify analytics. The algorith
 [Algorithms]: api/graphx/index.html#org.apache.spark.graphx.lib.Algorithms
 
 ## PageRank
+<a name="pagerank"></a>
 
 PageRank measures the importance of each vertex in a graph, assuming an edge from *u* to *v* represents an endorsement of *v*'s importance by *u*. For example, if a Twitter user is followed by many others, the user will be ranked highly.
 
@@ -503,9 +504,26 @@ val ranksByUsername = users.leftOuterJoin(ranks).map {
 println(ranksByUsername.collect().mkString("\n"))
 {% endhighlight %}
 
-
 ## Connected Components
 
+The connected components algorithm labels each connected component of the graph with the ID of its lowest-numbered vertex. For example, in a social network, connected components can approximate clusters. We can compute the connected components of the example social network dataset from the [PageRank section](#pagerank) as follows:
+
+{% highlight scala %}
+// Load the implicit conversion and graph as in the PageRank example
+import org.apache.spark.graphx.lib._
+val users = ...
+val followers = ...
+val graph = Graph(users, followers)
+// Find the connected components
+val cc = graph.connectedComponents().vertices
+// Join the connected components with the usernames
+val ccByUsername = graph.vertices.innerJoin(cc) { (id, username, cc) =>
+  (username, cc)
+}
+// Print the result
+println(ccByUsername.collect().mkString("\n"))
+{% endhighlight %}
+
 ## Shortest Path
 
 ## Triangle Counting

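Note on the labelling rule in the added documentation ("labels each connected component of the graph with the ID of its lowest-numbered vertex"): the following is a minimal plain-Scala sketch of that rule, not the GraphX implementation. It assumes edge direction is ignored when forming components, and it uses only the six edges visible in the followers.txt hunk of this commit (the file may contain more). The object name `ConnectedComponentsSketch` and the helper `lowestIdLabels` are hypothetical names chosen for illustration.

```scala
object ConnectedComponentsSketch {
  // Edges visible in the followers.txt hunk of this commit (assumption: the file may hold more).
  val edges: Seq[(Long, Long)] =
    Seq((2L, 1L), (4L, 1L), (1L, 2L), (6L, 3L), (7L, 3L), (7L, 6L))

  /** Label every vertex with the smallest vertex ID in its component, ignoring edge direction. */
  def lowestIdLabels(edges: Seq[(Long, Long)]): Map[Long, Long] = {
    val vertices = edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    // Start with each vertex labelled by its own ID.
    var labels: Map[Long, Long] = vertices.map(v => v -> v).toMap
    var changed = true
    // Push the smaller label across every edge until no label changes.
    while (changed) {
      changed = false
      for ((src, dst) <- edges) {
        val low = math.min(labels(src), labels(dst))
        if (labels(src) != low || labels(dst) != low) {
          labels = labels + (src -> low) + (dst -> low)
          changed = true
        }
      }
    }
    labels
  }

  def main(args: Array[String]): Unit =
    lowestIdLabels(edges).toSeq.sorted.foreach { case (id, cc) => println(s"$id -> $cc") }
}
```

Under these assumptions, vertices 1, 2 and 4 end up with label 1, and vertices 3, 6 and 7 end up with label 3, which is the kind of (vertex ID, component ID) pairing the documented example joins against usernames.
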
http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/7a4bb863/graphx/data/followers.txt
----------------------------------------------------------------------
diff --git a/graphx/data/followers.txt b/graphx/data/followers.txt
index 0f46d80..7bb8e90 100644
--- a/graphx/data/followers.txt
+++ b/graphx/data/followers.txt
@@ -1,10 +1,6 @@
 2 1
-3 1
 4 1
-6 1
-3 2
-6 2
-7 2
+1 2
 6 3
 7 3
 7 6

http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/7a4bb863/graphx/data/users.txt
----------------------------------------------------------------------
diff --git a/graphx/data/users.txt b/graphx/data/users.txt
index ce3d06c..26e3b3b 100644
--- a/graphx/data/users.txt
+++ b/graphx/data/users.txt
@@ -1,5 +1,5 @@
 1 BarackObama
-2 ericschmidt
+2 ladygaga
 3 jeresig
 4 justinbieber
 6 matei_zaharia