Posted to commits@spark.apache.org by rx...@apache.org on 2014/01/15 06:52:22 UTC

[1/2] git commit: Merge pull request #431 from ankurdave/graphx-caching-doc

Updated Branches:
  refs/heads/branch-0.9 51131bf82 -> a075a452d


Merge pull request #431 from ankurdave/graphx-caching-doc

Describe caching and uncaching in GraphX programming guide

(cherry picked from commit ad294db326f57beb98f9734e2b4c45d9da1a4c89)
Signed-off-by: Reynold Xin <rx...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/6fa4e02d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/6fa4e02d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/6fa4e02d

Branch: refs/heads/branch-0.9
Commit: 6fa4e02dd19308c9629fb898061334d554def641
Parents: 2f930d5
Author: Reynold Xin <rx...@apache.org>
Authored: Tue Jan 14 21:51:06 2014 -0800
Committer: Reynold Xin <rx...@apache.org>
Committed: Tue Jan 14 21:51:25 2014 -0800

----------------------------------------------------------------------
 docs/graphx-programming-guide.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/6fa4e02d/docs/graphx-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index 5641f9f..03940d8 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -611,11 +611,20 @@ class GraphOps[VD, ED] {
 > substantial communication.  If possible try expressing the same computation using the
 > `mapReduceTriplets` operator directly.
 
+## Caching and Uncaching
+
+In Spark, RDDs are not persisted in memory by default. To avoid recomputation, they must be explicitly cached when using them multiple times (see the [Spark Programming Guide][RDD Persistence]). Graphs in GraphX behave the same way. **When using a graph multiple times, make sure to call [`Graph.cache()`][Graph.cache] on it first.**
+
+[RDD Persistence]: scala-programming-guide.html#rdd-persistence
+[Graph.cache]: api/graphx/index.html#org.apache.spark.graphx.Graph@cache():Graph[VD,ED]
+
+In iterative computations, *uncaching* may also be necessary for best performance. By default, cached RDDs and graphs will remain in memory until memory pressure forces them to be evicted in LRU order. For iterative computation, intermediate results from previous iterations will fill up the cache. Though they will eventually be evicted, the unnecessary data stored in memory will slow down garbage collection. It would be more efficient to uncache intermediate results as soon as they are no longer necessary. This involves materializing (caching and forcing) a graph or RDD every iteration, uncaching all other datasets, and only using the materialized dataset in future iterations. However, because graphs are composed of multiple RDDs, it can be difficult to unpersist them correctly. **For iterative computation we recommend using the Pregel API, which correctly unpersists intermediate results.**
+
 # Pregel API
 <a name="pregel"></a>
 
 Graphs are inherently recursive data-structures as properties of vertices depend on properties of
-their neighbors which intern depend on properties of *their* neighbors.  As a
+their neighbors which in turn depend on properties of *their* neighbors.  As a
 consequence many important graph algorithms iteratively recompute the properties of each vertex
 until a fixed-point condition is reached.  A range of graph-parallel abstractions have been proposed
 to express these iterative algorithms.  GraphX exposes a Pregel-like operator which is a fusion of
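
Note on the "Caching and Uncaching" section added by the diff above: the pattern it describes (materialize a graph every iteration, uncache the previous one, and use only the materialized graph afterwards) might look roughly like the Scala sketch below. This is an illustration under stated assumptions, not code from the commit. The object `IterativeCachingSketch` and the helper `iterate` are hypothetical names, the sketch assumes Spark and GraphX are on the classpath, and it uncaches only vertex data on the assumption that the edge structure is shared across iterations. The guide's own recommendation for iterative work remains the Pregel API.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx._

object IterativeCachingSketch {
  // Hypothetical helper (not part of GraphX): applies `step` to every vertex
  // attribute `numIter` times while keeping only the newest graph cached.
  // Assumes the caller has already cached the input graph if it needs it.
  def iterate(graph: Graph[Double, Int], numIter: Int)(step: Double => Double): Graph[Double, Int] = {
    var g = graph
    var i = 0
    while (i < numIter) {
      val prev = g
      // Cache the new graph, then force it so it is actually materialized.
      g = prev.mapVertices((id, attr) => step(attr)).cache()
      g.vertices.count()
      // Uncache the previous intermediate result, but leave the caller's
      // input graph alone. Assumption: only vertex data needs uncaching here.
      if (prev ne graph) prev.unpersistVertices(blocking = false)
      i += 1
    }
    g
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "caching-sketch")
    try {
      // A three-vertex cycle; every vertex attribute starts at 1.0.
      val edges = sc.parallelize(Seq((1L, 2L), (2L, 3L), (3L, 1L)))
      val graph = Graph.fromEdgeTuples(edges, 1.0).cache()
      // Three doublings leave every vertex attribute at 8.0.
      val result = iterate(graph, 3)(_ * 2.0)
      result.vertices.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Forcing each new graph with `count()` before uncaching its predecessor matters here: uncaching first would leave the new graph with nothing cached to compute from, which would trigger the recomputation the guide warns about.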


[2/2] git commit: Merge branch 'branch-0.9' of https://git-wip-us.apache.org/repos/asf/incubator-spark into branch-0.9

Posted by rx...@apache.org.
Merge branch 'branch-0.9' of https://git-wip-us.apache.org/repos/asf/incubator-spark into branch-0.9


Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/a075a452
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/a075a452
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/a075a452

Branch: refs/heads/branch-0.9
Commit: a075a452da54f488d178457c87f8109148521a35
Parents: 6fa4e02 51131bf
Author: Reynold Xin <rx...@apache.org>
Authored: Tue Jan 14 21:52:13 2014 -0800
Committer: Reynold Xin <rx...@apache.org>
Committed: Tue Jan 14 21:52:13 2014 -0800

----------------------------------------------------------------------
 assembly/pom.xml         |  2 +-
 bagel/pom.xml            |  2 +-
 core/pom.xml             |  2 +-
 examples/pom.xml         |  2 +-
 external/flume/pom.xml   |  2 +-
 external/kafka/pom.xml   |  2 +-
 external/mqtt/pom.xml    |  2 +-
 external/twitter/pom.xml |  2 +-
 external/zeromq/pom.xml  |  2 +-
 graphx/pom.xml           |  2 +-
 mllib/pom.xml            |  2 +-
 pom.xml                  | 10 +++++++++-
 repl-bin/pom.xml         |  2 +-
 repl/pom.xml             |  2 +-
 streaming/pom.xml        |  2 +-
 tools/pom.xml            |  2 +-
 yarn/pom.xml             |  2 +-
 yarn/stable/pom.xml      |  2 +-
 18 files changed, 26 insertions(+), 18 deletions(-)
----------------------------------------------------------------------