Posted to user@spark.apache.org by mathewvinoj <vi...@hotmail.com> on 2015/07/15 07:03:07 UTC

spark cache issue while doing saveAsTextFile and saveAsParquetFile

Hi There,

I am using mapPartitions to do some processing and then caching the result.

I save the output in both formats (Parquet and text file), but the
computation is rerun for each save. Even though I call cache(), it does not
work as expected.

The code snippet is below. Any help is really appreciated.

  import scala.collection.mutable.ListBuffer

  val record = sql(sqlString)
  val outputRecords = record.repartition(1).mapPartitions { rows =>
    val finalList1 = ListBuffer[Row]()
    while (rows.hasNext) {
      .
      .
      finalList1 += xyz  // append the processed row
    }
    finalList1.iterator
  }.cache()

  val l = applySchema(outputRecords, schemaName).cache()
  l.saveAsTextFile(filename + ".txt")
  l.saveAsParquetFile(filename + ".parquet")

Expected result: the computation should run once, when saveAsTextFile is
called, and the result should be cached. When saveAsParquetFile is then
called, it should read the result from the cache instead of recomputing it.
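
One way to check whether a cached result is reused is to count how often the
mapPartitions body runs across two actions. The following is only a minimal
sketch under stated assumptions, not the code from the job above: it runs in
local mode (so the driver and the tasks share one JVM), it uses a plain RDD
of integers instead of the SQL result, and the names InvocationCounter,
CacheCheck and invocations are hypothetical helpers made up for this
illustration.

```scala
import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper: a JVM-wide counter. It is only meaningful in local
// mode, where the driver and the tasks run in the same JVM.
object InvocationCounter {
  val calls = new AtomicInteger(0)
}

object CacheCheck {
  // Runs two actions on the same RDD and returns how many times the
  // mapPartitions body was invoked in total.
  def invocations(sc: SparkContext, useCache: Boolean): Int = {
    InvocationCounter.calls.set(0)
    val processed = sc.parallelize(1 to 100)
      .repartition(1)
      .mapPartitions { rows =>
        // Incremented once each time the single partition is computed.
        InvocationCounter.calls.incrementAndGet()
        rows.map(_ * 2)
      }
    val result = if (useCache) processed.cache() else processed
    result.count()   // first action: computes the partition
    result.collect() // second action: can read the cached partition, if any
    InvocationCounter.calls.get()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[1]").setAppName("cache-check")
    val sc = new SparkContext(conf)
    try {
      println("with cache:    " + invocations(sc, useCache = true))
      println("without cache: " + invocations(sc, useCache = false))
    } finally {
      sc.stop()
    }
  }
}
```

A count of 1 would suggest that the second action reused the cached
partition, and a count of 2 that it was recomputed. On a real cluster the
counter is not shared across executor JVMs, so the Storage tab of the Spark
UI or rdd.toDebugString is probably a better place to look.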

Thanks





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-cache-issue-while-doing-saveAsTextFile-and-saveAsParquetFile-tp23845.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
