You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by rx...@apache.org on 2014/06/06 02:45:46 UTC

git commit: [SPARK-2025] Unpersist edges of previous graph in Pregel

Repository: spark
Updated Branches:
  refs/heads/master 3d3f8c800 -> 9bad0b737


[SPARK-2025] Unpersist edges of previous graph in Pregel

Due to a bug introduced by apache/spark#497, Pregel does not unpersist replicated vertices from previous iterations. As a result, they stay cached until memory is full, wasting GC time.

This PR corrects the problem by unpersisting both the edges and the replicated vertices of previous iterations. This is safe because the edges and replicated vertices of the current iteration are cached by the call to `g.cache()` and then materialized by the call to `messages.count()`. Therefore no unmaterialized RDDs depend on `prevG.edges`. I verified that no recomputation occurs by running PageRank with a custom patch to Spark that warns when a partition is recomputed.

Thanks to Tim Weninger for reporting this bug.

Author: Ankur Dave <an...@gmail.com>

Closes #972 from ankurdave/SPARK-2025 and squashes the following commits:

13d5b07 [Ankur Dave] Unpersist edges of previous graph in Pregel


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9bad0b73
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9bad0b73
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9bad0b73

Branch: refs/heads/master
Commit: 9bad0b73722fb359f14db864e69aa7efde3588c5
Parents: 3d3f8c8
Author: Ankur Dave <an...@gmail.com>
Authored: Thu Jun 5 17:45:38 2014 -0700
Committer: Reynold Xin <rx...@apache.org>
Committed: Thu Jun 5 17:45:38 2014 -0700

----------------------------------------------------------------------
 graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala | 1 +
 1 file changed, 1 insertion(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/9bad0b73/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
----------------------------------------------------------------------
diff --git a/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala b/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
index 4572eab..5e55620 100644
--- a/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
+++ b/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
@@ -150,6 +150,7 @@ object Pregel extends Logging {
       oldMessages.unpersist(blocking=false)
       newVerts.unpersist(blocking=false)
       prevG.unpersistVertices(blocking=false)
+      prevG.edges.unpersist(blocking=false)
       // count the iteration
       i += 1
     }