Posted to dev@spark.apache.org by jerryye <je...@gmail.com> on 2014/08/22 02:13:50 UTC
saveAsTextFile makes no progress without caching RDD
Hi,
Cross-posting this from the users list.
I'm running on branch-1.1 and trying to apply a simple transformation to a
relatively small dataset (64GB). saveAsTextFile essentially hangs, with
tasks stuck in the running state, for the following code:
// Stalls with tasks running for over an hour with no tasks finishing.
// Smallest partition is 10MB
val data = sc.textFile("s3n://input")
val reformatted = data.map(t =>
  t.replace("Test(","").replace(")","").replaceAll(",","\t"))
reformatted.saveAsTextFile("s3n://transformed")
// This runs but stalls doing GC after filling up 150% of 650GB of memory
val data = sc.textFile("s3n://input")
val reformatted = data.map(t =>
  t.replace("Test(","").replace(")","").replaceAll(",","\t")).cache
reformatted.saveAsTextFile("s3n://transformed")
Any idea whether this is a parameter issue, and whether there is something
I should try?
Thanks!
- jerry
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/saveAsTextFile-makes-no-progress-without-caching-RDD-tp7949.html