Posted to user@spark.apache.org by dash <bs...@nd.edu> on 2014/06/20 19:27:59 UTC
Can not checkpoint Graph object's vertices but could checkpoint edges
I'm trying to work around the StackOverflowError that occurs when an object
has a long dependency chain. Someone said I should use checkpoint to cut off
the dependencies. I wrote some sample code to test it, but I can only
checkpoint the edges, not the vertices. I think I do materialize both the
vertices and the edges after calling checkpoint, so why are only the edges
checkpointed?
Here is my code. I would really appreciate it if you could point out what I
did wrong.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

def main(args: Array[String]) {
  val conf = new SparkConf().setAppName("Test")
    .setMaster("local[4]")
  val sc = new SparkContext(conf)
  sc.setCheckpointDir("./checkpoint")

  // A small three-vertex cycle: 0 -> 1 -> 2 -> 0
  val v = sc.parallelize(Seq[(VertexId, Long)]((0L, 0L), (1L, 1L), (2L, 2L)))
  val e = sc.parallelize(Seq[Edge[Long]](Edge(0L, 1L, 0L), Edge(1L, 2L, 1L), Edge(2L, 0L, 2L)))
  var g = Graph(v, e)
  val vertexIds = Seq(0L, 1L, 2L)
  var prevG: Graph[VertexId, Long] = null

  for (i <- 1 to 100000) {
    vertexIds.toStream.foreach(id => {
      println("generate new graph")
      prevG = g
      // Each rebuild adds one more step to the dependency chain
      g = Graph(g.vertices, g.edges)

      println("uncache vertices")
      prevG.unpersistVertices(blocking = false)
      println("uncache edges")
      prevG.edges.unpersist(blocking = false)

      // Third approach, do checkpoint
      // Vertices can not be checkpointed, still have StackOverflowError
      g.vertices.checkpoint()
      g.edges.checkpoint()
      // Run an action on both RDDs to materialize them
      println(g.vertices.count() + g.edges.count())
      println(g.vertices.isCheckpointed + " " + g.edges.isCheckpointed)
    })
    println(" iter " + i + " finished")
  }
}
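
For comparison, here is a minimal sketch of how I understand checkpointing is meant to cut the dependency chain on a plain RDD, without GraphX. It is not taken from the run above. The object name CheckpointSketch, the local[2] master, and the interval of 50 iterations are arbitrary choices for illustration. The sketch relies on checkpoint() being called before any action has run on that RDD, and on a following action (count() here) to actually write the checkpoint.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone example; the names are illustrative only.
object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CheckpointSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("./checkpoint")

    var rdd = sc.parallelize(1 to 10)
    for (i <- 1 to 200) {
      // Each map adds one more step to the lineage of rdd
      rdd = rdd.map(_ + 1)

      // Every 50 iterations (arbitrary interval), truncate the lineage
      if (i % 50 == 0) {
        rdd.cache()      // keep the data so the checkpoint job does not recompute it
        rdd.checkpoint() // mark for checkpointing before any action on this RDD
        rdd.count()      // the action materializes the RDD and writes the checkpoint
        println("iter " + i + ": isCheckpointed = " + rdd.isCheckpointed)
      }
    }
    sc.stop()
  }
}
```

If this pattern reports the RDD as checkpointed while g.vertices in my code does not, the difference would seem to lie in how the Graph's vertices are handled rather than in the checkpoint call order, but that is only my guess.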
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-not-checkpoint-Graph-object-s-vertices-but-could-checkpoint-edges-tp8019.html