Posted to user@spark.apache.org by SK <sk...@gmail.com> on 2014/06/11 03:34:08 UTC

output tuples in CSV format

My output is a set of tuples, and when I write it out using saveAsTextFile, my
file looks as follows:

(field1_tup1, field2_tup1, field3_tup1,...)
(field1_tup2, field2_tup2, field3_tup2,...)

In Spark, is there some way I can simply have it output in CSV format as
follows (i.e. without the parentheses):
field1_tup1, field2_tup1, field3_tup1,...
field1_tup2, field2_tup2, field3_tup2,...

I could write a script to remove the parentheses, but it would be easier if I
could omit them in the first place. I did not find a saveAsCsvFile method in Spark.

thanks



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/output-tuples-in-CSV-format-tp7363.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

RE: output tuples in CSV format

Posted by "Shao, Saisai" <sa...@intel.com>.
It would be better to add one more transformation step before saveAsTextFile, like:

rdd.map(tuple => "%s,%s,%s".format(tuple._1, tuple._2, tuple._3)).saveAsTextFile(...)

That is, manually convert each tuple to the format you want, and then write the result to HDFS.

Thanks
Jerry
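A minimal sketch of why the parentheses appear and what the extra map step changes, in plain Scala so it runs without a SparkContext. saveAsTextFile writes each element's toString as one line, and a Scala tuple's toString is the parenthesized form. The record type (String, Int, Double), the sample data and the name toCsvLine are assumptions for illustration only.

```scala
// Hypothetical per-record function you would pass to rdd.map(...) before
// saveAsTextFile. Assumes each record is a (String, Int, Double) tuple.
def toCsvLine(t: (String, Int, Double)): String =
  "%s,%s,%s".format(t._1, t._2, t._3)

// Hypothetical sample data standing in for the RDD's contents.
val records = Seq(("a", 1, 2.5), ("b", 2, 3.0))

// Default toString, i.e. what saveAsTextFile writes for each tuple:
records.foreach(r => println(r.toString)) // prints (a,1,2.5) then (b,2,3.0)

// After the extra transformation step there are no parentheses:
records.map(toCsvLine).foreach(println)   // prints a,1,2.5 then b,2,3.0

// On a cluster (not run here): rdd.map(toCsvLine).saveAsTextFile(...)
```

The format string hard-codes the number of fields, so it has to be adjusted if the tuple arity changes.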

-----Original Message-----
From: SK [mailto:skrishna.id@gmail.com] 
Sent: Wednesday, June 11, 2014 9:34 AM
To: user@spark.incubator.apache.org
Subject: output tuples in CSV format

Re: output tuples in CSV format

Posted by Mikhail Strebkov <st...@gmail.com>.
You can just use something like this:
  myRdd.map(_.productIterator.mkString(",")).saveAsTextFile(...)
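Because every Scala tuple is a Product, the productIterator/mkString approach works for any arity. A minimal sketch of it as plain functions (no SparkContext needed) follows; productToCsv, quoteField and productToQuotedCsv are hypothetical names. The quoting helper is an assumption not taken from the thread: it wraps a field in double quotes when it contains a comma, quote or line break, and doubles embedded quotes in the style of RFC 4180. Plain mkString(",") does no quoting.

```scala
// The productIterator/mkString approach as a plain function.
// Works for any tuple arity, since every tuple is a Product.
def productToCsv(p: Product): String =
  p.productIterator.mkString(",")

// Hypothetical extension (not from the thread): quote a field if it contains
// a comma, double quote or line break, doubling embedded quotes.
// Assumes fields are non-null.
def quoteField(x: Any): String = {
  val s = x.toString
  if (s.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
    "\"" + s.replace("\"", "\"\"") + "\""
  else s
}

def productToQuotedCsv(p: Product): String =
  p.productIterator.map(quoteField).mkString(",")

println(productToCsv(("a", 1)))               // a,1
println(productToCsv(("x", 2, 3.5, true)))    // x,2,3.5,true
println(productToQuotedCsv(("Smith, J", 42))) // "Smith, J",42

// On a cluster (not run here): myRdd.map(productToCsv).saveAsTextFile(...)
```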
