Posted to dev@spark.apache.org by jerryye <je...@gmail.com> on 2014/08/22 02:13:50 UTC

saveAsTextFile makes no progress without caching RDD

Hi,
Cross-posting this from the users list.

I'm running on branch-1.1 and trying to apply a simple transformation to a
relatively small dataset (64GB). saveAsTextFile essentially hangs, with tasks
stuck in the running state, when I use the following code:

// Stalls with tasks running for over an hour and no tasks finishing.
// Smallest partition is 10MB.
val data = sc.textFile("s3n://input")
val reformatted = data.map(t =>
  t.replace("Test(", "").replace(")", "").replaceAll(",", "\t"))
reformatted.saveAsTextFile("s3n://transformed")

// This runs but stalls doing GC after filling up 150% of 650GB of memory
val data = sc.textFile("s3n://input")
val reformatted = data.map(t =>
  t.replace("Test(", "").replace(")", "").replaceAll(",", "\t")).cache()
reformatted.saveAsTextFile("s3n://transformed")
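
For reference, here is a minimal sketch of what the map does to a single
record, with no Spark involved. The sample record and the names
ReformatSketch and reformat are made up for illustration; the actual input
lines may look different.

```scala
object ReformatSketch {
  // Same chain of String calls as in the map above, pulled out as a plain function.
  def reformat(line: String): String =
    line.replace("Test(", "").replace(")", "").replaceAll(",", "\t")

  def main(args: Array[String]): Unit = {
    // Hypothetical sample record (an assumption, not taken from the real dataset).
    val sample = "Test(a,b,c)"
    // Prints the three fields separated by tab characters.
    println(reformat(sample))
  }
}
```

The function only does per-line string replacement, so each output record is
no larger than its input record.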

Any idea whether this is a parameter issue, and whether there is something
I should try?

Thanks! 

- jerry 



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/saveAsTextFile-makes-no-progress-without-caching-RDD-tp7949.html