Posted to user@spark.apache.org by gaganbm <ga...@gmail.com> on 2014/03/21 11:42:33 UTC

Persist streams to text files

Hi, 

I am trying to persist the DStreams to text files. When I use the built-in
API 'saveAsTextFiles' as:

stream.saveAsTextFiles(resultDirectory) 

this creates a number of subdirectories, one for each batch, and within each
subdirectory it creates a bunch of text files for each RDD (I assume).

I am wondering if I can have a single text file for each batch. Is there any
API for that? Or else, a single output file for the entire stream?
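One thing I considered (a sketch only, I have not verified it) is coalescing each batch's RDD down to one partition before saving, so that each batch directory contains a single part file. Here 'resultDirectory' is the same placeholder path as above:

```scala
// Sketch, untested: collapse each batch to a single partition before
// saving, so every per-batch output directory holds one part file.
stream.foreachRDD { (rdd, time) =>
  rdd.coalesce(1).saveAsTextFile(resultDirectory + "-" + time.milliseconds)
}
```

But I am not sure whether that is the recommended approach, or what it does to parallelism for large batches.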

I tried to manually write from each RDD in the stream to a text file as:

stream.foreachRDD(rdd => {
  rdd.foreach(element => {
    fileWriter.write(element)
  })
})

where 'fileWriter' simply makes use of a Java BufferedWriter to write
strings to a file. However, this fails with the following exception:

DStreamCheckpointData.writeObject used 
java.io.BufferedWriter 
java.io.NotSerializableException: java.io.BufferedWriter 
        at
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) 
        at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) 
        ..... 
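My guess is that 'fileWriter' is being captured in the closure passed to rdd.foreach, which Spark tries to serialize and ship to the executors, and BufferedWriter is not serializable. A variant I am considering (again just a sketch, untested; the output path is a placeholder) is to keep the writer on the driver and pull each batch back with collect:

```scala
import java.io.{BufferedWriter, FileWriter}

// Sketch, untested: create the writer inside foreachRDD so it stays on
// the driver and is never serialized into a task closure. Note that
// rdd.collect() brings the whole batch to the driver, so this only
// makes sense for small batches.
stream.foreachRDD { rdd =>
  val writer = new BufferedWriter(new FileWriter("/tmp/stream-output.txt", true))
  try {
    rdd.collect().foreach { element =>
      writer.write(element.toString)
      writer.newLine()
    }
  } finally {
    writer.close()
  }
}
```

Is that the right way to think about the serialization error, or is there a better pattern for this?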

Any help on how to proceed with this?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Persist-streams-to-text-files-tp2986.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.