You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Jamie Hutton (JIRA)" <ji...@apache.org> on 2016/04/29 15:07:12 UTC
[jira] [Created] (SPARK-15002) Calling unpersist can cause spark to
hang indefinitely when writing out a result
Jamie Hutton created SPARK-15002:
------------------------------------
Summary: Calling unpersist can cause spark to hang indefinitely when writing out a result
Key: SPARK-15002
URL: https://issues.apache.org/jira/browse/SPARK-15002
Project: Spark
Issue Type: Bug
Components: GraphX, Spark Core
Affects Versions: 1.6.0, 1.5.2
Environment: AWS and Linux VM, both in spark-shell and spark-submit. tested in 1.5.2 and 1.6
Reporter: Jamie Hutton
The following code will cause spark to hang indefinitely. It happens when you have an unpersist which is followed by some futher processing of that data (in my case writing it out).
I first experienced this issue with graphX so my example below involved graphX code, however i suspect it might be more of a core issue than a graphx one. I have raised another bug with similar results (indefinite hanging) but in different circumstances, so these may well be linked
Code to reproduce (can be run in spark-shell):
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types._
import org.apache.spark.sql._
val r = scala.util.Random
val list = (0L to 500L).map(i=>(i,r.nextInt(500).asInstanceOf[Long],"LABEL"))
val distData = sc.parallelize(list)
val edgesRDD = distData.map(x => Edge(x._1, x._2, x._3))
val distinctNodes = distData.flatMap{row => Iterable((row._1, ("A")),(row._2, ("B")))}.distinct()
val nodesRDD: RDD[(VertexId, (String))] = distinctNodes
val graph = Graph(nodesRDD, edgesRDD)
graph.persist()
val ccGraph = graph.connectedComponents()
ccGraph.cache
val schema = StructType(Seq(StructField("id", LongType, false), StructField("netid", LongType, false)))
val rdd=ccGraph.vertices.map(row=>(Row(row._1.asInstanceOf[Long],row._2.asInstanceOf[Long])))
val builtEntityDF = sqlContext.createDataFrame(rdd, schema)
/*this unpersist step causes the issue*/
ccGraph.unpersist()
/*write step hangs for ever*/
builtEntityDF.write.format("parquet").mode("overwrite").save("/user/root/writetest.parquet")
If you take out the ccGraph.unpersist() step the write step completes successfully
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org